MCP Servers have a Discovery Problem (Series, 1 of 3)

Part 1 of 3. Next in the series - Part 2: Your AI Agent Should Earn the Job

MCP is working, and it is working well - Anthropic is firing on all cylinders on this front.

Developers are connecting tools to their AI environments - GitHub, Slack, Notion, databases, internal services. The protocol does what it promised: it gives agents a standard way to discover and call external capabilities. That’s a real step forward.

But as the number of connected tools grows, something starts to break - not in the protocol itself, but in how tools get selected.


How Discovery Works in an MCP Server

When an LLM connects to an MCP server, it reads a list of tool definitions.

Each tool has a name and a description. When the user makes a request, the model reads those descriptions and decides which tool to call.
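
Concretely, discovery is one request: the client asks the server for its tool list and gets back a set of static definitions. Here is a minimal sketch of that data in TypeScript, assuming the standard MCP tool fields (name, description, inputSchema); the summarization tool itself is made up for illustration:

```typescript
// A tool definition as an MCP client sees it after requesting the tool list.
// Field names follow the MCP specification; the example tool is illustrative.
interface ToolDefinition {
  name: string;          // unique name within this server
  description?: string;  // free-text prose the model reads at call time
  inputSchema: {         // JSON Schema describing the tool's arguments
    type: "object";
    properties?: Record<string, unknown>;
    required?: string[];
  };
}

// What discovery actually returns: a flat list of static definitions.
const toolsListResult: { tools: ToolDefinition[] } = {
  tools: [
    {
      name: "summarize_document",
      description: "Summarizes a document into a short abstract.",
      inputSchema: {
        type: "object",
        properties: { text: { type: "string" } },
        required: ["text"],
      },
    },
  ],
};
```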

This works well when there’s one tool per job. A GitHub MCP server for repo operations. A Slack MCP server for messaging. No ambiguity, no overlap.

Now consider what happens when there are multiple tools that can do the same thing:

Your organization deploys four summarization agents on the same MCP server:

  • one runs on OpenAI
  • one runs on Anthropic
  • one runs on a self-hosted model maintained by the team down the hall
  • one comes from a third-party vendor

All four are registered as MCP tools. You ask your LLM: “Summarize this document.”

Which agent runs?

We don’t know; it’s magic, internal to the black box that is the LLM.

Not the cheapest. Not the fastest. Not the one with the best track record on this type of content. The model picks based on how well the tool description reads - a beauty contest judged by token prediction.

That’s not selection. That’s a coin flip with extra steps.
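
From the model's point of view, the situation looks roughly like this: four definitions that differ only in their prose. The names and descriptions below are invented for illustration, and the input schemas are omitted for brevity:

```typescript
// Hypothetical tool list when four overlapping summarizers are registered.
// The only differentiator the model sees is the wording of each description.
const overlappingTools = [
  { name: "summarize_openai",    description: "Summarizes documents using an OpenAI model." },
  { name: "summarize_anthropic", description: "Produces concise, high-quality summaries with Anthropic." },
  { name: "summarize_internal",  description: "Team-hosted summarization for internal documents." },
  { name: "summarize_vendor",    description: "Enterprise-grade summarization from a third-party provider." },
];

// The request "Summarize this document" matches all four equally well.
// Whichever description happens to read best to the model gets the call.
```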


How big a problem is this?

This pattern shows up everywhere capability overlap exists:

  • classification
  • translation
  • code generation
  • content moderation
  • data extraction
  • any category where multiple providers can fulfill the same request.

As the ecosystem matures, overlap will increase, not decrease. More agents, more MCP servers, more tools with overlapping capabilities - first inside an organization, then across teams, and eventually in the open market.

The current model has no mechanism for handling this. There’s no ability for the LLM to choose on merit. There is no price signal. No latency comparison. No historical quality score. No way for an agent to say “I can do this job for less” or “I’ve been more accurate at this task over the last thousand calls.” The tool descriptions are static text, written once, evaluated by the LLM at call time.
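
To make the gap concrete, here is a hedged sketch contrasting what a tool definition carries today with the kind of runtime signals a merit-based choice would need. The second interface is entirely hypothetical - none of those fields exist in MCP:

```typescript
// What a tool definition carries today: static text, written once by the developer.
interface StaticToolDefinition {
  name: string;
  description: string;   // prose the LLM evaluates at call time
  inputSchema: object;   // argument shape, not cost, speed, or quality
}

// Hypothetical runtime signals a merit-based selection would need.
// None of these fields exist in MCP; they are illustrative only.
interface HypotheticalRuntimeSignals {
  pricePerCallUsd: number;   // "I can do this job for less"
  p50LatencyMs: number;      // observed latency, not promised
  qualityScore: number;      // e.g. accuracy over the last thousand calls
  availability: number;      // rolling uptime fraction
}
```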

This creates two problems.

For the client, there’s no confidence that the best available agent handled the request. You get an answer, not the best available answer. Worse, you have no visibility into why that tool was chosen or what alternatives existed.

For the agent provider, there’s no way to compete. You can write a better description, but that’s marketing, not performance. You can’t bid lower, respond faster, or prove quality - because the selection mechanism doesn’t accept those inputs. If you’re the fourth summarizer to connect, you’re at the mercy of how a language model interprets four paragraphs of text.


Is this an MCP Problem?

This isn’t a flaw in MCP.

MCP solved two hard problems: standardized capability definition and runtime discovery. Every agent describes what it can do in a common format. Clients discover those capabilities through a simple, standard runtime protocol. That is real infrastructure, and it works.

But that discovery is static. Tool descriptions are written once by the developer and never change. They carry no runtime signal. No price. No latency. No track record. No availability. The model reads the same fixed text every time, regardless of what has changed since it was written.

When there is one tool per job, static discovery is enough. When five tools overlap, it becomes a hardcoded, inflexible selection mechanism with no way to adapt.

Search engines had the same arc. Standardizing how web pages described themselves came first. Ranking them on merit was the breakthrough.

MCP gave us the standard. The ranking layer does not exist yet.


In part two, we will show you how 638Labs solves this.

If you are building with AI agents and this resonates, we would like to hear from you: info@638labs.com.


Learn more: https://638labs.com