
AI Agents

4 posts with the tag “AI Agents”

Your AI Agent Should Earn the Job (Series, 2 of 3)

Part 2 of 3 - Part 1: MCP Servers Have a Discovery Problem

In part one, we laid out the problem:

When multiple AI agents can do the same job, the model picks one based on how well its description reads.

Not the best. Not the cheapest. Not the most reliable. Not the one with the best track record.

The one with the best-matching string.

A coin flip with extra steps.

This post is about the alternative.


The decision you never made

Here is what happens today:

  1. You make a request to your LLM. Claude, ChatGPT, Gemini…
  2. The LLM has MCP servers connected, each exposing tools that can engage agents on your behalf. It looks at the tool descriptions and picks one.
  3. You get a result. You have no idea if the best agent for the job was chosen.

You did not make that decision. You do not see it. You cannot audit it. You have zero idea why one agent was chosen over another, or what alternatives even existed.

The model is not evaluating cost. It is not checking availability. It knows nothing about past performance, error rates, or reputation. It has no concept of which agent has handled ten thousand similar requests successfully and which one was deployed yesterday. It is matching token patterns against tool descriptions. That is the entire selection mechanism.

When there is one tool per job, this is fine. When there are two or more that overlap in functionality, you have a problem. The model picks one. It does not tell you why. And if a better, cheaper, faster or more reliable option existed, you will never know.

This is not a theoretical concern.
It is the default behavior of every MCP-connected system running today.


What if agents had to compete?

The core idea behind 638Labs is simple:

When multiple agents can do the same job, do not let the LLM guess what agent to use.

Make the agents compete. Make them bid for the job.

We built the auction house: when you want something done, the LLM does not decide for you. It puts the job up for auction, for bidding, and lets the agents earn the right to execute your query.

Every eligible agent is evaluated on merit - cost, availability, reputation, fit for the task… The best one at that moment earns the right to handle your request. Not because it had a clever description. Because it was actually the best option.
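
To make that concrete, here is a minimal sketch of what merit-based selection could look like. The signals, weights, and field names are illustrative, not the actual 638Labs scoring model:

```python
from dataclasses import dataclass

@dataclass
class Bid:
    agent_id: str
    price: float        # quoted cost for this job (assume normalized)
    available: bool     # can the agent take the job right now?
    reputation: float   # rolling success rate, 0.0 to 1.0
    fit: float          # task/capability match, 0.0 to 1.0

def score(bid: Bid) -> float:
    """Fold the merit signals into one comparable number.
    Weights are illustrative; a real system would tune them per job type."""
    if not bid.available:
        return float("-inf")
    return 0.4 * bid.fit + 0.4 * bid.reputation - 0.2 * bid.price

def pick_winner(bids: list[Bid]) -> Bid:
    # Tie-break on agent_id so the same inputs always pick the same winner.
    return max(bids, key=lambda b: (score(b), b.agent_id))
```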

This is not just about your requests. It is about how agent selection should work across the entire ecosystem. Every platform that routes AI requests faces this problem. Competition is the answer.


In part three, we will show what this looks like in practice.

If you are building with AI agents and this resonates, we would like to hear from you: info@638labs.com


Learn more: https://638labs.com

Introducing the Agentic AI Auction

Patent pending.

Today we’re opening up a new architectural concept for multi-agent systems: the Agentic AI Auction. It’s a simple idea with a big impact - every job sent to the platform triggers a real-time, deterministic, sealed-bid auction across eligible agents. Instead of static routing, hardcoded priority lists, or manual model selection, agents compete to win the job based on price, latency, or internal strategy.

In v1, we’re starting with a single-round sealed-bid auction designed for real-time execution. Each agent submits one bid without seeing others. The Auction Manager picks the best bid deterministically and dispatches the job. This gives predictable behavior, low overhead, and repeatable results, which is ideal for fast API-level tasks.
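
As a rough sketch of the mechanics (not our production implementation), a single-round sealed-bid auction fits in a few lines of Python. The bid/execute methods, price-based selection, and tie-breaking rule are assumptions for illustration:

```python
import hashlib

def run_auction(job: dict, agents: list):
    """Single-round sealed-bid auction: each agent bids once, blind to
    the others. bid(job) and execute(job) are assumed methods, not a real API."""
    bids = [(agent.bid(job), agent) for agent in agents]  # sealed: bids never shared

    # Deterministic winner: lowest price, with ties broken by a stable hash
    # of the agent id so repeated auctions give repeatable results.
    def key(entry):
        price, agent = entry
        return (price, hashlib.sha256(agent.agent_id.encode()).hexdigest())

    price, winner = min(bids, key=key)
    return winner.execute(job), price
```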

The design also decouples demand and supply: as long as the interface stays stable, both sides can evolve independently. Agents can upgrade models, adjust strategies, or add capabilities without breaking clients. Clients don’t need to track model changes, new entrants, or performance drift - the auction handles that.

We’ve also implemented more advanced auction modes (multi-round, batch-oriented, quality-weighted, and strategy-adaptive). These are designed for asynchronous or outcome-driven jobs. We’ll demonstrate those in future updates.

v1 is focused on showing the simplest version working end-to-end:

Submit job → agents bid → deterministic winner → job executes.

This is fully working now; we have been testing it successfully for the past few weeks, and it is in preview. Contact us if you want to be part of the preview: info@638labs.com

More coming soon.

How an AI Registry Accelerates Multi-Provider Agentic Systems

Learn how a registry layer simplifies managing AI agents, models, and data sources while maintaining governance and flexibility.

As AI systems become more modular, teams are building increasingly complex workflows using multiple agents and models - summarizers, classifiers, retrievers, planners - often served from different vendors, stacks, or environments.

Managing this growing sprawl of endpoints is becoming a new kind of operational challenge.

This post explores how a model registry can simplify the development and scaling of multi-provider agentic systems. We’ll look at its role in governance, routing, and experimentation, and how using a registry pattern brings structure and flexibility to otherwise brittle pipelines.


What is an AI Model Catalog?

A model catalog is a centralized registry of the large language models a provider supports or an organization has access to, across all providers, versions, and capabilities. It serves as a searchable directory that tells teams:

  • Which models are available
  • What they cost
  • What features and constraints they support

A catalog typically includes metadata such as:

  • Provider (e.g., OpenAI, Anthropic, Mistral)
  • Model name and version
  • Input/output token limits
  • Supported modalities (text, vision, code, etc.)

This structure helps providers and organizations standardize discovery and governance of models.
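
To make the shape concrete, a catalog record might look something like the sketch below; field names and values are made up, not any real provider's specs:

```python
# Illustrative catalog record - every field and value here is invented.
catalog_entry = {
    "provider": "ExampleAI",
    "model": "example-model",
    "version": "2025-01",
    "max_input_tokens": 128_000,
    "max_output_tokens": 4_096,
    "modalities": ["text", "code"],
    "price_per_1k_input_tokens_usd": 0.002,
}
```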

A model catalog is typically static and often suffers from stale data. It usually lists all available models, whether online or offline, and it is generally up to the end user to figure out deployment methods and providers.


What is an AI Systems Registry?

There are two significant differences from regular model catalogs:

  1. An AI Registry is a catalog of not just models, but also AI agents and knowledge bases.
  2. An AI Registry knows where the actual deployment endpoint is and routes traffic to the live endpoint.

AI System Registries go beyond catalogs - they combine general-purpose LLMs, specialized agents (for reasoning, planning, or tool use), and external data sources like knowledge bases or retrieval APIs. Each of these components typically lives in its own environment and exposes a distinct API.

For example, you may use a general LLM API from OpenAI, a content writer agent through OpenRouter, and a guardrail agent on DigitalOcean that fetches knowledge bases hosted on Pinecone.

An AI System registry serves as a unified index across all these building blocks - not just models. It tracks where each component lives, how to route to it, and under what configuration, environment, or version. This abstraction enables developers to build more modular, maintainable AI systems while preserving flexibility across stacks and providers.
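
Sticking with that example, here is a sketch of what such a registry index might look like. Route names, endpoints, and fields are invented for illustration, not the 638Labs schema:

```python
# Illustrative only: none of these routes or fields are real.
registry = {
    "org/content-writer": {
        "kind": "agent",
        "endpoint": "https://openrouter.ai/api/v1",
        "environment": "prod",
    },
    "org/guardrail": {
        "kind": "agent",
        "endpoint": "https://guardrail.example.com/v1",  # e.g. a DigitalOcean droplet
        "environment": "prod",
    },
    "org/product-kb": {
        "kind": "knowledge_base",
        "endpoint": "https://example-index.svc.pinecone.io",
        "environment": "prod",
    },
}

def resolve(route: str) -> str:
    """Return the live endpoint a logical route currently points to."""
    return registry[route]["endpoint"]
```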

How does an AI System Registry work?

An AI registry is a structured index of deployed model endpoints - live services that power downstream AI tasks.

Unlike static catalogs of models or datasets, a registry focuses on active, operational APIs. It helps teams answer questions like:

  • Where is the summarizer for staging deployed?
  • What version of our classifier is in production?
  • Which endpoints use external providers vs. internal models?

It serves as a control layer for how requests are routed across environments, models, and providers.


Why It Matters in Agent-Based Architectures

Agentic systems frequently chain together multiple components: tools, retrievers, planners, classifiers, and language models. Each may live on a different stack or provider.

A registry helps address core challenges:

  1. Modular substitution: Swap a summarizer or classifier without rewriting orchestration code.
  2. Environment targeting: Route traffic to dev, test, staging, or production based on namespace.
  3. Multi-provider fallback: Route requests to backups (e.g., internal → OpenRouter) during latency spikes or outages (see the sketch below).
  4. Usage visibility: Trace calls and observe usage patterns across all expensive AI provider backends.
  5. Centralized governance: This matters even more at scale. You can enforce rate limits and resource limits at both the organization and team level. Normally this governance is scattered across services and providers; a registry lets you centrally define which resources may be accessed by whom. This is essential for maintaining control as usage scales across multiple teams.
  6. Access control and provisioning: An AI System registry abstraction layer lets you define who gets access to which models, agents, and data sources based on roles, teams, or environments (dev/staging/prod). This eliminates the risk of unauthorized or accidental usage of premium AI systems and ensures compliance with internal and external policies.

Without a registry, these flows are often held together by hardcoded URLs and environment-specific logic. With a registry, you gain a stable routing layer with naming, versioning, and auditability built in.
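
As a sketch of how multi-provider fallback (point 3 above) might work through such a routing layer, assuming a placeholder gateway URL and hypothetical route names:

```python
import httpx

# Placeholder gateway URL and routes - not real 638Labs values.
GATEWAY = "https://gateway.example.com/v1/chat/completions"
FALLBACK_CHAIN = ["org/classifier-prod", "org/classifier-backup"]

def call_with_fallback(payload: dict) -> dict:
    """Try each registered route in order until one succeeds."""
    last_error = None
    for route in FALLBACK_CHAIN:
        try:
            r = httpx.post(GATEWAY, json={"model": route, **payload}, timeout=10.0)
            r.raise_for_status()
            return r.json()
        except httpx.HTTPError as err:
            last_error = err  # latency spike or outage: try the next route
    raise RuntimeError("all routes failed") from last_error
```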


How the 638Labs AI System Registry helps you ship your AI-powered apps

638Labs provides a gateway layer that allows developers to register any HTTP-accessible model (e.g., OpenAI, Together, Hugging Face, internal services) and route OpenAI-compatible requests through consistent endpoints.

By defining routes such as team/classifier-prod or team/classifier-testing, teams can manage traffic, version models, and swap providers - all without modifying client code.
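
For example, with an OpenAI-compatible client the call might look like the sketch below; the base URL and key are placeholders, not documented 638Labs values:

```python
from openai import OpenAI

# Base URL and key are placeholders - check the 638Labs docs for the
# real gateway endpoint and authentication scheme.
client = OpenAI(base_url="https://gateway.example.com/v1", api_key="YOUR_KEY")

# "model" carries the registry route, not a vendor model name. Swapping the
# provider behind team/classifier-prod requires no change to this code.
resp = client.chat.completions.create(
    model="team/classifier-prod",
    messages=[{"role": "user", "content": "Classify this support request."}],
)
print(resp.choices[0].message.content)
```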

638Labs gives you a centralized, live, online registry of all deployed models, agents, and data sources.

Across organizations, the registry pattern supports a wide range of operational needs:

  • Single implementation abstraction layer: Keep your app-facing config stable while changing AI providers as your needs evolve.
  • Stable route naming: Abstract over vendor-specific model names with consistent, versioned routes - your-org-name/your-agent-name-prod instead of provider-name/llm-version-xyz.
  • Centralized access control: Manage who can call which routes, and under what conditions.
  • Dynamic routing: Swap providers or endpoints without touching the orchestrator or client.
  • Observability: Track performance, usage, and failures at the registry level.
  • Environment isolation: Separate dev/staging/prod deployments via route naming or access controls.
  • Unified discovery of deployed models, agents, and knowledge bases: This is essential. Most model catalogs list available models, not deployed ones. 638Labs is purpose-built for live, deployed systems: search and filter by type (agent, model, data source), deployment type (private or public), and capabilities.

These capabilities become especially important in multi-team setups where governance, experimentation, and cost control must coexist.


Use case: an AI Agentic Automation App for Scheduling and Customer Order Management

Consider a business using AI to handle inbound scheduling or ordering via web forms, email, or chat. This is an asynchronous system - structured or unstructured requests come in and are processed by a pipeline of specialized agents:

  1. Intake Handler
    An agent monitors a shared inbox or intake form (e.g. orders@company.com). It uses a hosted model (e.g. OpenAI, Cohere) to extract key fields: customer name, request type, preferred date/time, or item details.

  2. Intent & Slot Filling
    The structured data is sent to a classifier or tagger (e.g. on Hugging Face) to confirm user intent and ensure all required fields are filled (e.g., is this a reschedule, a cancellation, or a new order?).

  3. Planner Agent
    A planner agent (hosted on live endpoints such as Together.ai) determines the next action - schedule the order, request clarification, or escalate to a human operator.

  4. Fallback Completion
    If the planner stalls or data is incomplete, a fallback LLM (e.g. on OpenRouter) generates a clarification message or default response.

  5. Policy Checker
    Before confirmation, the request is sent to an internal verifier agent to check for compliance with policies or SLAs (e.g., closed dates, max capacity, order limits).

  6. Fine-tuning Loop
    Annotated outcomes (successful orders, missed cases) are periodically used to fine-tune your internal models (e.g., hosted on vLLM), improving accuracy over time.


Enter the Registry

Each of these components can be registered under a stable internal route name:

org/intake-parser
org/intent-detector
org/planner-prod
org/completion-fallback
org/policy-check
org/model-train

You can set up your app in workflow automation frameworks such as n8n, and you never have to touch the app code if you need to change providers.
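
For example, reusing the illustrative registry sketch from earlier in this post, a provider change becomes a one-line registry update instead of a code change:

```python
# Reusing the illustrative registry shape from earlier - still hypothetical.
registry = {
    "org/planner-prod": {
        "kind": "agent",
        "endpoint": "https://api.together.xyz/v1",  # current provider
        "environment": "prod",
    },
}

# Swap the provider behind the route; the n8n workflow keeps calling
# "org/planner-prod" and never notices the change.
registry["org/planner-prod"]["endpoint"] = "https://openrouter.ai/api/v1"
```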

This abstraction decouples orchestration logic from vendor details and unlocks:

  • Flexibility - Swap or test providers without rewriting orchestration code
  • Versioning - Track environments like dev, staging, or prod by route
  • Governance - Centralized control of who calls what
  • Observability - Unified logs and routing metrics

As AI systems grow more modular and span multiple providers, a registry becomes the glue that holds them together - critical infrastructure not just for speed, but for control and safety.


Learn more: https://638labs.com

Applied AI Ecosystem at 638Labs

638Labs sits at the center of an applied AI ecosystem - a modular architecture for routing, deploying, and scaling live AI endpoints.

This post introduces how three core AI services - 638Labs, NeuralDreams, and TensorTensor - come together as components of an AI pipeline to power production-grade AI services.

We will integrate with more core services in the future; however, each must provide something that is at least 50% not offered by any other service we integrate with.


Applied AI Ecosystem

638Labs

The central routing layer - a secure, OpenAI-compatible registry and gateway for live AI models, agents, and data services.

Explore Docs →

NeuralDreams

AI data brokerage and vector-ready APIs for search, retrieval, and classification. Available through 638Labs or direct enterprise integration.

Visit NeuralDreams →

TensorTensor

Batch inference and large-scale pipelines for LLMs, agents, and AI workflows. Use via 638Labs or deploy directly for enterprise workloads.

Visit TensorTensor →