
Your AI Agent Should Earn the Job (Series, 2 of 3)

Part 2 of 3 - Part 1: MCP Servers Have a Discovery Problem

In part one, we laid out the problem:

When multiple AI agents can do the same job, the model picks one based on how well its description reads.

Not the best. Not the cheapest. Not the most reliable. Not the one with the best track record.

The one with the best-matching string.

A coin flip with extra steps.

This post is about the alternative.


The decision you never made

Here is what happens today:

  1. You make a request to your LLM. Claude, ChatGPT, Gemini…
  2. The LLM has MCP servers connected, each exposing tools that can engage agents on your behalf. It looks at the tool descriptions and picks one.
  3. You get a result. You have no idea if the best agent for the job was chosen.

You did not make that decision. You do not see it. You cannot audit it. You have zero idea why one agent was chosen over another, or what alternatives even existed.

The model is not evaluating cost. It is not checking availability. It knows nothing about past performance, error rates, or reputation. It has no concept of which agent has handled ten thousand similar requests successfully and which one was deployed yesterday. It is matching token patterns against tool descriptions. That is the entire selection mechanism.

When there is one tool per job, this is fine. When two or more overlap in functionality, you have a problem. The model picks one. It does not tell you why. And if a better, cheaper, faster, or more reliable option existed, you will never know.

This is not a theoretical concern.
It is the default behavior of every MCP-connected system running today.


What if agents had to compete?

The core idea behind 638Labs is simple:

When multiple agents can do the same job, do not let the LLM guess which agent to use.

Make the agents compete. Make them bid for the job.

We built the auction house: when you want something done, the LLM does not decide for you. It puts the job up for auction, and agents bid to earn the right to execute your query.

Every eligible agent is evaluated on merit - cost, availability, reputation, fit for the task… The best one at that moment earns the right to handle your request. Not because it had a clever description. Because it was actually the best option.
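To make "evaluated on merit" concrete, here is a minimal sketch of merit-based selection as a weighted score over bid attributes. The field names and weights are purely illustrative; they are not 638Labs' actual scoring model.

```python
from dataclasses import dataclass

@dataclass
class Bid:
    agent: str
    cost: float        # quoted price for this job
    latency_ms: float  # expected response time
    reputation: float  # historical quality score in [0, 1]

def pick_winner(bids: list[Bid]) -> Bid:
    # Higher reputation is better; lower cost and latency are better.
    # The weights below are illustrative only.
    def score(b: Bid) -> float:
        return 0.5 * b.reputation - 0.3 * b.cost - 0.2 * (b.latency_ms / 1000)
    return max(bids, key=score)

bids = [
    Bid("summarizer-a", cost=0.010, latency_ms=800, reputation=0.92),
    Bid("summarizer-b", cost=0.004, latency_ms=400, reputation=0.88),
]
print(pick_winner(bids).agent)  # summarizer-b: slightly lower reputation, but cheaper and faster
```

The point is not this particular formula. Any selection over live signals beats matching static description text.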

This is not just about your requests. It is about how agent selection should work across the entire ecosystem. Every platform that routes AI requests faces this problem. Competition is the answer.


In part three, we will show what this looks like in practice.

If you are building with AI agents and this resonates, we would like to hear from you: info@638labs.com


Learn more: https://638labs.com

MCP Servers Have a Discovery Problem (Series, 1 of 3)

Part 1 of 3 - Part 2: Your AI Agent Should Earn the Job

MCP is working and it is working well - Anthropic is really firing on all cylinders in that direction.

Developers are connecting tools to their AI environments - GitHub, Slack, Notion, databases, internal services. The protocol does what it promised: it gives agents a standard way to discover and call external capabilities. That’s a real step forward.

But as the number of connected tools grows, something starts to break - not in the protocol itself, but in how tools get selected.


How Discovery Works in an MCP Server

When an LLM connects to an MCP server, it reads a list of tool definitions.

Each tool has a name and a description. When the user makes a request, the model reads those descriptions and decides which tool to call.

This works well when there’s one tool per job. A GitHub MCP server for repo operations. A Slack MCP server for messaging. No ambiguity, no overlap.
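Concretely, a tool entry returned by an MCP server's tools/list call is roughly a name, a description, and a JSON Schema for its inputs. The sketch below is a generic example, not taken from any particular server:

```python
# Minimal sketch of one entry from an MCP server's tools/list response.
# This static text is all the model sees when deciding which tool to call.
tool_definition = {
    "name": "summarize_document",
    "description": "Summarize a document into a short abstract.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "text": {"type": "string", "description": "The document to summarize"},
        },
        "required": ["text"],
    },
}
```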

Now consider what happens when there are multiple tools that can do the same thing:

Your organization deploys four summarization agents in the MCP server.

  • one runs on OpenAI
  • one on Anthropic
  • one on a self-hosted model the team down the hall is running
  • and one from a third-party vendor.

All four are registered as MCP tools. You ask your LLM: “Summarize this document.”

Which agent runs?

We don’t know; it’s magic, internal to the black box that is the LLM.

Not the cheapest. Not the fastest. Not the one with the best track record on this type of content. The model picks based on how well the tool description reads - a beauty contest judged by token prediction.

That’s not selection. That’s a coin flip with extra steps.


How big a problem is this?

This pattern shows up everywhere capability overlap exists:

  • classification
  • translation
  • code generation
  • content moderation
  • data extraction
  • any category where multiple providers can fulfill the same request.

As the ecosystem matures, overlap will increase, not decrease. More agents, more MCP servers, more tools with overlapping capabilities - first within an organization, then across teams, and eventually in the open market.

The current model has no mechanism for handling this. There’s no ability for the LLM to choose on merit. There is no price signal. No latency comparison. No historical quality score. No way for an agent to say “I can do this job for less” or “I’ve been more accurate at this task over the last thousand calls.” The tool descriptions are static text, written once, evaluated by the LLM at call time.

This creates two problems.

For the client, there’s no confidence that the best available agent handled the request. You get an answer, not the best available answer. Worse, you have no visibility into why that tool was chosen or what alternatives existed.

For the agent provider, there’s no way to compete. You can write a better description, but that’s marketing, not performance. You can’t bid lower, respond faster, or prove quality - because the selection mechanism doesn’t accept those inputs. If you’re the fourth summarizer to connect, you’re at the mercy of how a language model interprets four paragraphs of text.


Is this an MCP Problem?

This isn’t a flaw in MCP.

MCP solved two hard problems: standardized capability definition and runtime discovery. Every agent describes what it can do in a common format. Clients discover those capabilities through a simple, standard runtime protocol. That is real infrastructure, and it works.

But that discovery is static. Tool descriptions are written once by the developer and never change. They carry no runtime signal. No price. No latency. No track record. No availability. The model reads the same fixed text every time, regardless of what has changed since it was written.

When there is one tool per job, static discovery is enough. When five tools overlap, it becomes a hardcoded, inflexible selection mechanism with no way to adapt.

Search engines had the same arc. Standardizing how web pages described themselves came first. Ranking them on merit was the breakthrough.

MCP gave us the standard. The ranking layer does not exist yet.


In part two, we will show you how 638Labs solved this.

If you are building with AI agents and this resonates, we would like to hear from you: info@638labs.com.


Learn more: https://638labs.com

Introducing the Agentic AI Auction

Patent pending.

Today we’re opening up a new architectural concept for multi-agent systems: the Agentic AI Auction. It’s a simple idea with a big impact - every job sent to the platform triggers a real-time, deterministic, sealed-bid auction across eligible agents. Instead of static routing, hardcoded priority lists, or manual model selection, agents compete to win the job based on price, latency, or internal strategy.

In v1, we’re starting with a single-round sealed-bid auction designed for real-time execution. Each agent submits one bid without seeing others. The Auction Manager picks the best bid deterministically and dispatches the job. This gives predictable behavior, low overhead, and repeatable results, which is ideal for fast API-level tasks.

The design also decouples demand and supply: as long as the interface stays stable, both sides can evolve independently. Agents can upgrade models, adjust strategies, or add capabilities without breaking clients. Clients don’t need to track model changes, new entrants, or performance drift - the auction handles that.

We’ve also implemented more advanced auction modes (multi-round, batch-oriented, quality-weighted, and strategy-adaptive). These are designed for asynchronous or outcome-driven jobs. We’ll demonstrate those in future updates.

v1 is focused on showing the simplest version working end-to-end:

Submit job → agents bid → deterministic winner → job executes.
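A minimal sketch of that loop, assuming for illustration that price is the only bid dimension (the actual Auction Manager can also weigh latency and strategy, as noted above):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class SealedBid:
    agent_id: str
    price: float

def run_auction(job: str, bidders: list[Callable[[str], SealedBid]]) -> SealedBid:
    # Single round: every agent bids once, without seeing the other bids.
    bids = [bid(job) for bid in bidders]
    # Deterministic winner: lowest price, ties broken by agent id,
    # so the same bids always produce the same winner.
    return min(bids, key=lambda b: (b.price, b.agent_id))

bidders = [
    lambda job: SealedBid("agent-a", 0.012),
    lambda job: SealedBid("agent-b", 0.009),
    lambda job: SealedBid("agent-c", 0.009),
]
winner = run_auction("summarize this document", bidders)
print(winner.agent_id)  # agent-b: cheapest, tie with agent-c broken by id
```

Deterministic tie-breaking is what makes the behavior predictable and repeatable across runs.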

This is fully working now; we have been testing it successfully for the past weeks, and it is in preview. Contact us if you want to be part of the preview: info@638labs.com

More coming soon.

How an AI Registry Accelerates Multi-Provider Agentic Systems

Learn how a registry layer simplifies managing AI agents, models, and data sources while maintaining governance and flexibility.

As AI systems become more modular, teams are building increasingly complex workflows using multiple agents and models - summarizers, classifiers, retrievers, planners - often served from different vendors, stacks, or environments.

Managing this growing sprawl of endpoints is becoming a new kind of operational challenge.

This post explores how a model registry can simplify the development and scaling of multi-provider agentic systems. We’ll look at its role in governance, routing, and experimentation, and how using a registry pattern brings structure and flexibility to otherwise brittle pipelines.


What is an AI Model Catalog?

A model catalog is a centralized registry of the large language models a provider supports or an organization has access to, across all providers, versions, and capabilities. It serves as a searchable directory that tells teams:

  • Which models are available
  • What they cost
  • What features and constraints they support

A catalog typically includes metadata such as:

  • Provider (e.g., OpenAI, Anthropic, Mistral)
  • Model name and version
  • Input/output token limits
  • Supported modalities (text, vision, code, etc.)

This structure helps providers and organizations standardize discovery and governance of models.
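As a sketch, a catalog entry is just structured metadata; the field names and the model identifier below are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogEntry:
    provider: str                 # e.g. "OpenAI", "Anthropic", "Mistral"
    model: str                    # model name and version
    max_input_tokens: int         # input token limit
    max_output_tokens: int        # output token limit
    modalities: tuple[str, ...]   # e.g. ("text", "vision")

entry = CatalogEntry(
    provider="Anthropic",
    model="example-model-v1",     # hypothetical identifier
    max_input_tokens=200_000,
    max_output_tokens=8_192,
    modalities=("text",),
)
```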

A model catalog is typically static and prone to stale data. It usually lists all available models, whether online or offline, and leaves it to the end user to figure out deployment methods and providers.


What is an AI Systems Registry?

There are two significant differences from regular model catalogs:

  1. An AI Registry is a catalog of not just models, but also AI agents and knowledge bases.
  2. The AI Registry knows where the actual deployment endpoint is and will route traffic to the live endpoint.

AI System Registries go beyond catalogs - they combine general-purpose LLMs, specialized agents (for reasoning, planning, or tool use), and external data sources like knowledge bases or retrieval APIs. Each of these components typically lives in its own environment and exposes a distinct API.

For example, you may use a general LLM API from OpenAI, but a content writer agent through OpenRouter with a guardrail agent on DigitalOcean fetching knowledgebases hosted on Pinecone.

An AI System registry serves as a unified index across all these building blocks - not just models. It tracks where each component lives, how to route to it, and under what configuration, environment, or version. This abstraction enables developers to build more modular, maintainable AI systems while preserving flexibility across stacks and providers.

How does an AI System Registry work?

An AI registry is a structured index of deployed model endpoints - live services that power downstream AI tasks.

Unlike static catalogs of models or datasets, a registry focuses on active, operational APIs. It helps teams answer questions like:

  • Where is the summarizer for staging deployed?
  • What version of our classifier is in production?
  • Which endpoints use external providers vs. internal models?

It serves as a control layer for how requests are routed across environments, models, and providers.


Why It Matters in Agent-Based Architectures

Agentic systems frequently chain together multiple components: tools, retrievers, planners, classifiers, and language models. Each may live on a different stack or provider.

A registry helps address core challenges:

  1. Modular substitution: Swap a summarizer or classifier without rewriting orchestration code.
  2. Environment targeting: Route traffic to dev, staging, or production environments based on namespace.
  3. Multi-provider fallback: Route requests to backups (e.g., internal → OpenRouter) during latency spikes or outages.
  4. Usage visibility: Trace calls and observe usage patterns across all expensive AI provider backends.
  5. Centralized governance: This matters even more at scale. You can enforce rate limits and resource limits at both the organization and team level. Normally this governance is scattered across services and providers; with a registry you can centrally define which resources may be accessed by whom. This is essential for maintaining control as usage scales across multiple teams.
  6. Access control and provisioning: An AI System registry abstraction layer lets you define who gets access to which models, agents, and data sources, based on roles, teams, or environments (dev/staging/prod). This eliminates the risk of unauthorized or accidental usage of premium AI systems and ensures compliance with internal and external policies.

Without a registry, these flows are often held together by hardcoded URLs and environment-specific logic. With a registry, you gain a stable routing layer with naming, versioning, and auditability built in.
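As an example of the difference, multi-provider fallback (point 3) becomes a single routing rule instead of client-side logic. A rough sketch with hypothetical endpoints, where `send` stands in for any HTTP client:

```python
from typing import Callable

# Hypothetical route table: a stable route name maps to an ordered list
# of live endpoints - primary first, fallbacks after.
ROUTES = {
    "team/classifier-prod": [
        "https://internal.example.com/v1/classify",
        "https://fallback.example.com/v1/classify",
    ],
}

def call_with_fallback(route: str, payload: str,
                       send: Callable[[str, str], str]) -> str:
    """Try each registered endpoint in order; return the first success."""
    last_error = None
    for endpoint in ROUTES[route]:
        try:
            return send(endpoint, payload)
        except ConnectionError as err:
            last_error = err  # endpoint down: fall through to the next one
    raise RuntimeError(f"all endpoints failed for {route}") from last_error

# Simulate the internal primary being down:
def flaky_send(endpoint: str, payload: str) -> str:
    if "internal" in endpoint:
        raise ConnectionError("primary unavailable")
    return f"ok from {endpoint}"

print(call_with_fallback("team/classifier-prod", "{}", flaky_send))
```

The client keeps calling the same route name; only the registry's endpoint list changes when providers come and go.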


How the 638Labs AI System Registry helps you ship your AI-powered apps

638Labs provides a gateway layer that allows developers to register any HTTP-accessible model (e.g., OpenAI, Together, Hugging Face, internal services) and route OpenAI-compatible requests through consistent endpoints.

By defining routes such as team/classifier-prod or team/classifier-testing, teams can manage traffic, version models, and swap providers - all without modifying client code.
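Because the gateway is OpenAI-compatible, the client request stays in the standard Chat Completions shape - only the model field carries the registry route instead of a vendor model id. The gateway URL below is a placeholder, not an actual 638Labs endpoint:

```python
import json

# Placeholder URL; substitute your actual gateway endpoint and API key.
GATEWAY = "https://gateway.example.com/v1/chat/completions"

request = {
    "model": "team/classifier-prod",  # stable registry route, not a vendor model id
    "messages": [{"role": "user", "content": "Classify: 'refund request'"}],
}
body = json.dumps(request)
# POST `body` to GATEWAY; the gateway resolves the route to whichever
# provider currently backs team/classifier-prod.
```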

638Labs gives you a centralized, live, online registry of all deployed models, agents, and data sources.

Across organizations, the registry pattern supports a wide range of operational needs:

  • Single implementation abstraction layer: Keep your app-facing config stable while changing AI providers as your needs evolve.
  • Stable route naming: Abstract over vendor-specific model names with consistent, versioned routes: your-org-name/your-agent-name-prod instead of provider-name/llm-version-xyz.
  • Centralized access control: Manage who can call which routes, and under what conditions.
  • Dynamic routing: Swap providers or endpoints without touching the orchestrator or client.
  • Observability: Track performance, usage, and failures at the registry level.
  • Environment isolation: Separate dev/staging/prod deployments via route naming or access controls.
  • Unified discovery of deployed models, agents, and knowledge bases: This is essential. Most model catalogs list available models, not deployed ones. 638Labs is purpose-built for live, deployed systems: search and filter by type (agent, model, data source), deployment visibility (private or public), and capabilities.

These capabilities become especially important in multi-team setups where governance, experimentation, and cost control must coexist.


Use case: AI Agentic Automation app used for Scheduling and Customer Order Management

Consider a business using AI to handle inbound scheduling or ordering via web forms, email, or chat. This is an asynchronous system - just structured or unstructured requests coming in and being processed by a pipeline of specialized agents:

  1. Intake Handler
    An agent monitors a shared inbox or intake form (e.g. orders@company.com). It uses a hosted model (e.g. OpenAI, Cohere) to extract key fields: customer name, request type, preferred date/time, or item details.

  2. Intent & Slot Filling
    The structured data is sent to a classifier or tagger (e.g. on Hugging Face) to confirm user intent and ensure all required fields are filled (e.g., is this a reschedule, a cancellation, or a new order?).

  3. Planner Agent
    A planner agent (hosted on live endpoints such as Together.ai) determines the next action - schedule the order, request clarification, or escalate to a human operator.

  4. Fallback Completion
    If the planner stalls or data is incomplete, a fallback LLM (e.g. on OpenRouter) generates a clarification message or default response.

  5. Policy Checker
    Before confirmation, the request is sent to an internal verifier agent to check for compliance with policies or SLAs (e.g., closed dates, max capacity, order limits).

  6. Fine-tuning Loop
    Annotated outcomes (successful orders, missed cases) are periodically used to fine-tune your internal models (e.g., hosted on vLLM), improving accuracy over time.


Enter the Registry

Each of these components can be registered under a stable internal route name:

org/intake-parser
org/intent-detector
org/planner-prod
org/completion-fallback
org/policy-check
org/model-train
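Orchestration then addresses each stage by route name only. A minimal sketch, where the `call` argument stands in for whatever client invokes the registry (the fallback and training routes are omitted for brevity):

```python
from typing import Callable

# Hypothetical happy-path pipeline over stable registry routes.
PIPELINE = [
    "org/intake-parser",
    "org/intent-detector",
    "org/planner-prod",
    "org/policy-check",
]

def run_pipeline(call: Callable[[str, dict], dict], payload: dict) -> dict:
    # Each stage is invoked via its route; the registry decides which
    # provider actually serves it, so this code never changes.
    for route in PIPELINE:
        payload = call(route, payload)
    return payload

# Trivial stand-in client that just records which routes were called:
trace = run_pipeline(lambda route, p: {**p, route: "done"}, {})
```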

You can set up your app in a workflow automation framework such as n8n and never touch the app code when you need to change providers.

This abstraction decouples orchestration logic from vendor details and unlocks:

  • Flexibility - Swap or test providers without rewriting orchestration code
  • Versioning - Track environments like dev, staging, or prod by route
  • Governance - Centralized control of who calls what
  • Observability - Unified logs and routing metrics

As AI systems grow more modular and span multiple providers, a registry becomes the glue that holds them together - critical infrastructure not just for speed, but for control and safety.


Learn more: https://638labs.com

Bring Your Own Model: 638Labs Unopinionated Approach

Right now, in 2025, the world of AI isn’t a seamless general intelligence - it’s a loose federation of narrow, useful agents: summarizers, recommenders, translators, code fixers, search enhancers. And most of these services live behind APIs, not platforms. Especially for small and fast-moving teams, AI isn’t a monolith - it’s a patchwork of deployed endpoints.

That’s where 638Labs fits in.

What 638Labs Is

638Labs is a lightweight, developer-first infrastructure layer for deployed AI services. At its core, it offers:

  • A registry of invokable, online-only endpoints
  • A forward proxy that routes OpenAI-compatible requests securely
  • A clean separation between what you’ve deployed and how you expose it to others or your own stack

We don’t host your models. We don’t wrap your functions. We don’t enforce contracts. If it’s accessible over HTTP, we can route it.

What 638Labs Is Not

We’re not a model host, fine-tuning provider, or training platform.
We don’t require you to upload files, checkpoint models, or containerize workloads.
We’re not an agent runtime like CrewAI or LangGraph - but we can route to them.

Why This Matters - Especially for Agile Teams

Teams building fast need flexibility, visibility, and control. With 638Labs:

  • You can test, trace, and switch between models quickly
  • You avoid vendor lock-in or the burden of full-stack platforms
  • You get centralized logging, routing, and basic controls without ceremony
  • You can bring your own models - OpenAI, Hugging Face, self-hosted, or anything else

You stay in control. We just forward the calls.

Use Cases

Use 638Labs when:

  • You’re prototyping new agents but don’t want to rebuild infra each time
  • You want to expose internal AI services to different clients without rewriting
  • You’re mixing commercial APIs and your own endpoints
  • You want a future path to exposing some agents publicly - without rebuilding
  • You want a stable API surface for your business or enterprise tools, while freely changing the underlying models behind the scenes
  • You want to version your AI services (dev, test, prod) and roll forward or backward as needed

638Labs is a live registry and proxy for deployed AI services.
Learn more: https://638labs.com