Agents Don't Need Better Models. They Need Better Context.

[[divider]]
The enterprise AI debate is about which model to use: GPT-5 or Claude or Gemini. Open source or proprietary. Fine-tuned or off-the-shelf. Companies spend months evaluating models, running benchmarks, negotiating contracts, and arguing about capability differences that amount to single-digit percentage points on standardized tests.
Meanwhile, the actual bottleneck sits untouched.
The bottleneck is not model capability. Every major foundation model released in 2025 is "good enough" for the vast majority of enterprise use cases. They can summarize documents, answer questions, write content, analyze data, and follow multi-step instructions. The differences between them matter at the frontier of research. They do not matter much for business applications.
What matters is context. And context is where almost every enterprise AI deployment falls apart.
[[divider]]
The Context Gap
Watch what happens when you deploy an AI agent into an actual enterprise workflow.
The agent receives a query. It needs to help a sales rep prepare for a customer meeting. Simple enough.
The agent can access the CRM. It pulls the customer record, the opportunity details, the contact information. It retrieves recent emails and maybe some meeting notes if they were logged.
Now ask the agent: What should the sales rep know about this customer that is not obvious from the record?
The agent cannot answer this. Not because the model is incapable of reasoning, but because the information does not exist in any system the agent can query.
The fact that this customer almost churned last year, and the save involved a VP-level conversation about roadmap commitments? That lives in someone's memory and a Slack thread that was never linked to the account.
The fact that the customer's procurement process is unusually slow, so the rep should start the renewal conversation two months earlier than normal? That is tribal knowledge passed between sales reps but never documented.
The fact that the customer's champion just changed roles internally, and the new stakeholder has different priorities? The role change might be visible on LinkedIn, but what it means for this specific deal is not captured anywhere.
The model could reason about all of this if it had the information. It does not have the information. The information was never treated as data.
This is the context gap. The distance between what the AI can access and what a knowledgeable human would know.
[[divider]]
Why Better Models Do Not Fix This
The instinct when AI underperforms is to upgrade the model. Get a more capable model. Fine-tune on domain-specific data. Add more retrieval sources.
None of this addresses the context gap.
A more capable model is still limited by what it can see. If the relevant context does not exist in queryable form, a smarter model just means you have a smarter system that is equally blind.
Fine-tuning teaches the model patterns from historical data. It does not give the model access to the specific context of a specific situation in real time. A fine-tuned model might know that customers in this segment tend to have long procurement cycles. It still does not know that this specific customer's procurement lead is on parental leave and everything is delayed by six weeks.
Adding more retrieval sources expands what the model can search. But if the information was never captured in any source, there is nothing to retrieve. You cannot RAG your way to context that does not exist as data.
The context gap is not a model problem. It is a data architecture problem. You are asking AI to reason from incomplete information, then blaming the AI when its reasoning is incomplete.
[[divider]]
What Knowledgeable Humans Actually Know
Consider what an experienced employee actually brings to a task.
When a senior sales rep prepares for a customer meeting, they do not just read the CRM record. They remember the history. They know the personalities involved. They recall similar situations with similar customers and what worked. They have a mental model of how this customer makes decisions, which objections are real and which are negotiating tactics, what the customer actually cares about versus what they say they care about.
This knowledge accumulates over years. It is built from direct experience, from conversations with colleagues, from observing outcomes, from pattern matching across hundreds of similar situations.
None of this is captured in your systems. Your CRM captures the outcome of the deal. It does not capture the reasoning that got you there. Your support system captures the resolution of the ticket. It does not capture the judgment calls that determined how to prioritize and handle it.
When you deploy an AI agent, you are asking it to perform at the level of a knowledgeable employee while giving it access only to the information a new hire would see on their first day.
The new hire would fail. The AI fails for the same reason.
[[divider]]
The Real Capability Ceiling
Here is what this means practically.
Your AI agents have a capability ceiling, and the ceiling is not set by model intelligence. It is set by context availability.
An agent with access to complete context and a mediocre model will outperform an agent with incomplete context and a state-of-the-art model. The context-rich agent can reason effectively because it has the information needed to reason. The context-poor agent cannot reason effectively no matter how capable its underlying model.
This is why model benchmarks are poor predictors of enterprise deployment success. The benchmarks measure model capability in isolation. Enterprise success depends on model capability multiplied by context availability. If context availability is low, it does not matter how high model capability is. The product is still low.
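To make that multiplication concrete, here is a toy sketch in Python. The numbers are invented purely for illustration; they are not measurements of any real model or deployment.

```python
# Toy illustration of "model capability multiplied by context availability".
# The specific numbers are invented; only the shape of the argument matters.

def deployment_score(model_capability: float, context_availability: float) -> float:
    """Both inputs are on a 0-1 scale; enterprise outcomes track the product."""
    return model_capability * context_availability

# A frontier model starved of context:
print(deployment_score(0.95, 0.20))  # ~0.19

# A merely decent model with rich context:
print(deployment_score(0.70, 0.90))  # ~0.63
```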
Most enterprises have extremely low context availability. Their systems capture states, not reasoning. Outcomes, not decisions. Current information, not historical patterns. The information that would allow AI to reason effectively was never treated as data worth capturing.
So they keep upgrading models, expecting different results. They keep running benchmarks that tell them the new model is 12% better on some evaluation suite. They keep deploying and failing because the bottleneck was never the model.
[[divider]]
What Context-Rich Looks Like
The difference between context-poor and context-rich is not subtle. It changes what AI can actually do.
Context-Poor Agent Handling a Support Escalation:
The agent sees the current ticket. It sees the customer's tier. It can look up the SLA terms. It retrieves some documentation about the product issue.
It recommends following the standard escalation process. It estimates resolution time based on average metrics. It drafts a response using templates.
The recommendation is generic. It does not account for the fact that this customer has escalated twice this quarter and is evaluating competitors. It does not know that the product team pushed a fix last week that might address this issue. It does not factor in that the assigned engineer just handled a similar case and knows exactly what to check.
Context-Rich Agent Handling the Same Escalation:
The agent sees the current ticket. It also sees the customer's full interaction history, including the churn risk flag from the CSM's notes last month. It sees that this is the third escalation in the quarter and the pattern of previous resolutions. It knows the customer's contract renewal is in six weeks.
It sees that a related fix was deployed recently and can check whether this customer's environment has been updated. It sees which engineers have successfully resolved similar issues and their availability. It sees the precedent from a similar situation with a similar customer and what resolution approach worked.
It recommends a specific approach: Assign to the engineer with relevant experience. Verify the recent fix was applied. Given the renewal timeline and escalation history, flag this for the CSM and consider whether a proactive executive touchpoint would help.
The recommendation is specific, contextual, and actionable. It reflects the actual situation, not a generic category of situations.
Same model. Different context. Completely different outcomes.
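One way to see the difference is as a data-shape problem rather than a prompt problem. The sketch below contrasts the two payloads the same agent might receive for the escalation above; every field name is hypothetical and not drawn from any particular product.

```python
from dataclasses import dataclass

# Hypothetical context payloads for the escalation example above.
# All field names are illustrative, not references to any real system.

@dataclass
class ContextPoorPayload:
    ticket_id: str
    customer_tier: str
    sla_terms: str
    product_docs: list[str]              # a few retrieved documentation snippets

@dataclass
class ContextRichPayload:
    ticket_id: str
    customer_tier: str
    sla_terms: str
    product_docs: list[str]
    interaction_history: list[str]       # prior tickets, calls, CSM notes
    churn_risk_flag: bool                # from the CSM's notes last month
    escalations_this_quarter: int        # the pattern, not just this incident
    renewal_date: str                    # contract renewal in six weeks
    recent_related_fixes: list[str]      # deploys that might already address the issue
    engineers_with_precedent: list[str]  # who has resolved similar cases before
    similar_case_precedents: list[str]   # what resolution approach worked elsewhere
```

The second payload is not a cleverer prompt. It is a different data architecture feeding the same model.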
[[divider]]
Building Context Infrastructure
The question is how to move from context-poor to context-rich.
The answer is not complicated conceptually. It is hard operationally. You have to start capturing the information that was never captured before.
Capture decisions, not just outcomes. When an exception gets approved, capture why. When an escalation gets resolved, capture the reasoning. When a deal closes, capture the factors that mattered. The outcome is already in your systems. The reasoning is what is missing.
Capture cross-system context. The support ticket connects to the customer record, which connects to the contract, which connects to the product usage data, which connects to the sales conversation. These connections exist in reality but not in your data. Map them explicitly.
Capture organizational knowledge. The tribal knowledge that experienced employees carry. The patterns that only become visible after handling hundreds of similar cases. The implicit rules that govern how work actually gets done. This has to move from human heads to queryable infrastructure.
Capture decision traces. When an agent or human makes a decision, capture the inputs they considered, the reasoning they applied, and the outcome that resulted. Over time, this creates a precedent base that future decisions can reference.
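Here is a minimal sketch of what a captured decision trace might look like, assuming a simple record store. Every field and identifier is hypothetical; the point is the shape of what gets captured, not any particular storage technology.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DecisionTrace:
    decision: str                    # what was decided, e.g. "approve the pricing exception"
    reasoning: str                   # why, in the decision-maker's own words
    inputs_considered: list[str]     # documents, metrics, conversations consulted
    linked_records: dict[str, str]   # explicit cross-system links, e.g.
                                     # {"crm_account": "...", "support_ticket": "...",
                                     #  "contract": "...", "slack_thread": "..."}
    outcome: str = ""                # filled in later, once the result is known
    decided_at: datetime = field(default_factory=datetime.now)

# Accumulated over time, these records become the precedent base described above:
# queryable reasoning, not just final states.
```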
This is not a small project. It is not something you bolt onto existing systems in a sprint. It is a fundamental change in how you think about organizational data.
But it is the change that determines whether AI actually works.
[[divider]]
The Compound Effect
There is a compounding dynamic here that most organizations miss.
When AI operates with rich context, it performs better. Better performance means more deployment. More deployment means more decisions being made through AI-mediated workflows. More decisions mean more opportunities to capture decision traces and reasoning. More captured reasoning means richer context for future AI operations.
The flywheel spins. Context begets capability begets deployment begets context.
The inverse is also true. When AI operates with poor context, it performs badly. Bad performance means failed pilots and abandoned projects. Failed projects mean no decision traces captured, no reasoning documented, no context accumulated. The next project starts from the same impoverished context baseline. It fails for the same reasons.
This is why some organizations are pulling ahead while others keep failing. The leaders are building context infrastructure. The laggards are switching models.
[[divider]]
The Model Debate Is a Distraction
Back to where we started.
Companies are spending enormous energy debating which model to use. Which vendor to partner with. Whether to build or buy. Whether to fine-tune or prompt-engineer.
These questions matter, but they are second-order. The first-order question is whether you have the context infrastructure that allows any model to perform well.
If you do not have that infrastructure, it does not matter which model you choose. GPT-4 without context will fail. Claude without context will fail. Any model you can name without context will fail. The failure mode will be the same: the AI will not know things it needs to know to do its job well.
If you do have that infrastructure, the model choice becomes much less critical. A reasonably capable model with excellent context will outperform a highly capable model with poor context. You can iterate on models later. You cannot iterate your way out of a missing context layer.
The model debate is a distraction from the context problem. And the context problem is where enterprise AI success is actually determined.
[[divider]]
What This Means For You
If your AI projects are underperforming, run a diagnostic.
Take a task where AI is failing and trace the failure. Is the model unable to perform the reasoning required? Or is the model missing information it would need to reason well?
In almost every enterprise case, the answer is the latter. The model could do the task if it had the right information. It does not have the right information.
This means your investment priority is wrong. You are optimizing the model when you should be building the context layer. You are upgrading the engine when the problem is the fuel.
The organizations that figure this out will pull ahead. The organizations that keep chasing better models will keep failing for reasons they do not understand.
[[divider]]
RLTX builds context infrastructure for enterprise AI.
We focus on the layer that actually determines success: the unified data architecture that gives agents the organizational context they need to reason effectively.
Better models are available to everyone. Better context is a competitive advantage.
















