
MCP

2 posts with the tag “MCP”

Your AI Agent Should Earn the Job (Series, 2 of 3)

Part 2 of 3 - Part 1: MCP Servers Have a Discovery Problem

In part one, we laid out the problem:

When multiple AI agents can do the same job, the model picks one based on how well its description reads.

Not the best. Not the cheapest. Not the most reliable. Not the one with the best track record.

The one with the best-matching string.

A coin flip with extra steps.

This post is about the alternative.


The decision you never made

Here is what happens today:

  1. You make a request to your LLM - Claude, ChatGPT, Gemini…
  2. The LLM has MCP servers connected, each exposing tools that can engage agents on your behalf. It looks at the tool descriptions and picks one.
  3. You get a result. You have no idea if the best agent for the job was chosen.

You did not make that decision. You do not see it. You cannot audit it. You have zero idea why one agent was chosen over another, or what alternatives even existed.

The model is not evaluating cost. It is not checking availability. It knows nothing about past performance, error rates, or reputation. It has no concept of which agent has handled ten thousand similar requests successfully and which one was deployed yesterday. It is matching token patterns against tool descriptions. That is the entire selection mechanism.

When there is one tool per job, this is fine. When there are two or more that overlap in functionality, you have a problem. The model picks one. It does not tell you why. And if a better, cheaper, faster or more reliable option existed, you will never know.

This is not a theoretical concern.
It is the default behavior of every MCP-connected system running today.
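
Here is a minimal sketch of that default loop - the tool entries and the call_model()/execute_tool() stubs are illustrative stand-ins, not a real SDK. The point is that the only thing the model ever learns about each agent is its name and description string:

  # Purely illustrative sketch of the default selection loop described above.
  # Tool names, descriptions, call_model() and execute_tool() are invented
  # stand-ins, not a real MCP or model SDK API.

  overlapping_tools = [
      {"name": "summarize_fast", "description": "Summarizes documents quickly."},
      {"name": "summarize_pro", "description": "Produces concise, high-quality summaries of any document."},
  ]

  def call_model(prompt: str, tools: list[dict]) -> str:
      """Stand-in for the LLM call: the model reads only these name and
      description strings and returns the name of whichever tool reads best."""
      return tools[0]["name"]  # an opaque choice - no cost, latency, or track record involved

  def execute_tool(tool_name: str, prompt: str) -> str:
      """Stand-in for dispatching the request to whichever agent was picked."""
      return f"result from {tool_name}"

  def handle_request(user_request: str) -> str:
      chosen = call_model(user_request, overlapping_tools)
      # The decision you never made: the host simply runs whatever name came back.
      return execute_tool(chosen, user_request)

  print(handle_request("Summarize this document."))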


What if agents had to compete?

The core idea behind 638Labs is simple:

When multiple agents can do the same job, do not let the LLM guess which one to use.

Make the agents compete. Make them bid for the job.

We built the auction house: when you want something done, the LLM does not decide for you. It puts the job up for auction, and the agents bid to earn the right to execute your query.

Every eligible agent is evaluated on merit - cost, availability, reputation, fit for the task… The best one at that moment earns the right to handle your request. Not because it had a clever description. Because it was actually the best option.
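
Here is a deliberately simplified sketch of the idea - the Bid fields, weights, and agent names below are illustrative only, not our production scoring:

  from dataclasses import dataclass

  @dataclass
  class Bid:
      agent: str
      price: float          # offered cost for this job
      est_latency_s: float  # promised turnaround, in seconds
      reputation: float     # 0..1, earned from past jobs
      task_fit: float       # 0..1, how well the agent matches this request

  def score(bid: Bid) -> float:
      # Cheaper, faster, better-reputed, better-fitting bids score higher.
      return (0.35 * bid.task_fit
              + 0.30 * bid.reputation
              + 0.20 / (1.0 + bid.price)
              + 0.15 / (1.0 + bid.est_latency_s))

  def run_auction(bids: list[Bid]) -> Bid:
      # The job goes to the agent that earns it at this moment,
      # not to the one with the catchiest description.
      return max(bids, key=score)

  winner = run_auction([
      Bid("summarizer-a", price=0.004, est_latency_s=2.1, reputation=0.92, task_fit=0.80),
      Bid("summarizer-b", price=0.001, est_latency_s=3.5, reputation=0.78, task_fit=0.85),
  ])
  print(winner.agent)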

This is not just about your requests. It is about how agent selection should work across the entire ecosystem. Every platform that routes AI requests faces this problem. Competition is the answer.


In part three, we will show what this looks like in practice.

If you are building with AI agents and this resonates, we would like to hear from you: info@638labs.com


Learn more: https://638labs.com

MCP Servers Have a Discovery Problem (Series, 1 of 3)

Part 1 of 3 - Part 2: Your AI Agent Should Earn the Job

MCP is working, and it is working well - Anthropic is firing on all cylinders on this front.

Developers are connecting tools to their AI environments - GitHub, Slack, Notion, databases, internal services. The protocol does what it promised: it gives agents a standard way to discover and call external capabilities. That’s a real step forward.

But as the number of connected tools grows, something starts to break - not in the protocol itself, but in how tools get selected.


How Discovery Works in an MCP Server

When an LLM connects to an MCP server, it reads a list of tool definitions.

Each tool has a name and a description. When the user makes a request, the model reads those descriptions and decides which tool to call.

This works well when there’s one tool per job. A GitHub MCP server for repo operations. A Slack MCP server for messaging. No ambiguity, no overlap.

Now consider what happens when there are multiple tools that can do the same thing:

Your organization deploys four summarization agents in its MCP server:

  • one runs on OpenAI
  • one on Anthropic
  • one on a self-hosted model the team down the hall is running
  • and one from a third-party vendor.

All four are registered as MCP tools. You ask your LLM: “Summarize this document.”
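
For illustration, here is roughly the model's entire view of those four agents - names and descriptions invented, shaped like the static tool list an MCP server advertises:

  # Hypothetical tool registry for the four summarizers above. Names and
  # descriptions are invented for illustration; note that nothing here
  # encodes cost, speed, availability, or quality.
  summarizer_tools = [
      {"name": "summarize_openai",
       "description": "Summarize documents using a hosted OpenAI model."},
      {"name": "summarize_anthropic",
       "description": "Summarize documents using a hosted Anthropic model."},
      {"name": "summarize_selfhosted",
       "description": "Summarize documents on our internal, self-hosted model."},
      {"name": "summarize_vendor",
       "description": "Fast, accurate summaries for any document type."},
  ]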

Which agent runs?

We don’t know; it’s magic, internal to the black box that is the LLM.

Not the cheapest. Not the fastest. Not the one with the best track record on this type of content. The model picks based on how well the tool description reads - a beauty contest judged by token prediction.

That’s not selection. That’s a coin flip with extra steps.


How big a problem is this?

This pattern shows up everywhere capability overlap exists:

  • classification
  • translation
  • code generation
  • content moderation
  • data extraction
  • any category where multiple providers can fulfill the same request.

As the ecosystem matures, overlap will increase, not decrease. More agents, more MCP servers, more tools with overlapping capabilities (first within an organization, then across teams, and eventually in the open market).

The current model has no mechanism for handling this. There’s no ability for the LLM to choose on merit. There is no price signal. No latency comparison. No historical quality score. No way for an agent to say “I can do this job for less” or “I’ve been more accurate at this task over the last thousand calls.” The tool descriptions are static text, written once, evaluated by the LLM at call time.
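
To make the gap concrete, compare the static record a tool actually carries with the kind of runtime signals a merit-based choice would need - the RuntimeSignals structure below is illustrative; nothing like it exists in today's discovery flow:

  from dataclasses import dataclass

  # What discovery gives the model today: static text, written once.
  static_tool = {
      "name": "summarize_vendor",
      "description": "Fast, accurate document summaries.",  # marketing, not measurement
  }

  # Hypothetical runtime signals a merit-based selector would need.
  # None of these fields exist in the current tool-selection flow.
  @dataclass
  class RuntimeSignals:
      price_per_call: float    # "I can do this job for less"
      p50_latency_ms: float    # measured, not promised
      accuracy_last_1k: float  # "I've been more accurate over the last thousand calls"
      available: bool          # can it take the job right now?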

This creates two problems.

For the client, there’s no confidence that the best available agent handled the request. You get an answer, not the best available answer. Worse, you have no visibility into why that tool was chosen or what alternatives existed.

For the agent provider, there’s no way to compete. You can write a better description, but that’s marketing, not performance. You can’t bid lower, respond faster, or prove quality - because the selection mechanism doesn’t accept those inputs. If you’re the fourth summarizer to connect, you’re at the mercy of how a language model interprets four paragraphs of text.


Is this an MCP Problem?

This isn’t a flaw in MCP.

MCP solved two hard problems: standardized capability definition and runtime discovery. Every agent describes what it can do in a common format. Clients discover those capabilities through a simple, standard runtime protocol. That is real infrastructure, and it works.

But that discovery is static. Tool descriptions are written once by the developer and never change. They carry no runtime signal. No price. No latency. No track record. No availability. The model reads the same fixed text every time, regardless of what has changed since it was written.

When there is one tool per job, static discovery is enough. When five tools overlap, it becomes a hardcoded, inflexible selection mechanism with no way to adapt.

Search engines had the same arc. Standardizing how web pages described themselves came first. Ranking them on merit was the breakthrough.

MCP gave us the standard. The ranking layer does not exist yet.


In part two, we will show you how 638Labs solved this.

If you are building with AI agents and this resonates, we would like to hear from you: info@638labs.com.


Learn more: https://638labs.com