The Agentic AI Security Category Is Converging on the Wrong Answer
In Part 1 of this piece, I described what agentic AI attacks actually look like in practice: the digital factory model in which agents commit fraud, and the three properties that make agentic AI attackers categorically different from traditional bot tooling. Those properties are autonomous iteration, session-to-session learning, and identity spoofing at the interaction layer.
Now I want to make the architecture argument. Because the security industry’s response to this threat is converging on a frame that I think has a specific, observable failure mode against exactly those three properties.
The Dominant Answer Is Identity. It Is Not Enough
The vendors with the most momentum in the agentic AI security space right now are the ones who’ve claimed “trust” as the answer. Classify agents. Verify intent. Establish a trust layer between your platform and the automation hitting it. At RSAC 2026, this translated into a wave of agent identity frameworks: systems that verify who the agent is, where it comes from, and what credentials it carries.
Post-conference analysis has since noted what was missing: every agent identity framework launched at RSAC verified who the agent was. None of them tracked what the agent did.
Agent identity is not agent behavior. This distinction matters enormously when you understand how agentic AI attackers actually operate. It is the distinction the industry needs to internalize before the category hardens around the wrong architecture.
I understand the appeal of the identity-first frame. It is clean, it maps to how security teams think about access control, and it is an easy story to tell: know what is in your traffic, verify its identity, decide what to trust, act accordingly.
But here is what the data shows happening when an identity-verified system encounters an agentic AI attacker with infinite patience and machine-speed iteration.
The first thing the attacker does is not launch an attack. The first thing they do is probe.
They send sessions designed to look legitimate and observe what passes. They vary timing, credential patterns, behavioral signals, iterating toward the profile of an authorized agent. Not once. Thousands of times, autonomously, learning from each response. A trust-based classification system is, for an agentic AI attacker, a target to be learned rather than a barrier to be respected. Given enough probe sessions, the decision boundary becomes legible. And once it is legible, it is exploitable.
This is the failure mode. Not a dramatic bypass. A systematic, patient process of mapping your classification model until its edges are found and its assumptions can be met. The attacker doesn’t need to defeat your verification. They need to pass it. With session-to-session learning and autonomous iteration, they will eventually figure out how.
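To make that probing loop concrete, here is a deliberately minimal sketch in Python. The classifier, its single feature, and its threshold are all invented for illustration; the only point is that an attacker who receives pass/fail feedback can map a decision boundary by iteration alone, without ever “defeating” the model.

```python
# Hypothetical defender: a fixed (and secret) decision boundary over one
# behavioral feature, e.g. the delay between actions in seconds. Anything at
# or above the threshold passes as "human-like".
SECRET_THRESHOLD = 2.37

def classifier_accepts(delay_seconds: float) -> bool:
    """Stand-in for a trust/identity classifier the attacker cannot inspect."""
    return delay_seconds >= SECRET_THRESHOLD

def probe_boundary(low: float = 0.0, high: float = 10.0, budget: int = 40) -> float:
    """Binary-search the accept/reject boundary using only pass/fail feedback."""
    for _ in range(budget):
        guess = (low + high) / 2
        if classifier_accepts(guess):
            high = guess   # passed: the boundary is at or below this value
        else:
            low = guess    # rejected: the boundary is above this value
    return high            # cheapest delay found that still passes

if __name__ == "__main__":
    learned = probe_boundary()
    print(f"Boundary learned to ~{learned:.3f}s in 40 probe sessions")
    print(f"True threshold was {SECRET_THRESHOLD}s")
```

A real attacker varies many signals at once, but the asymmetry is the same: every rejected session is cheap for them and informative.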
Classification is not useless. We do it, and it matters. What I’m saying is that classification alone, without an economic layer enforcing consequences, is a single point of failure with no backstop.
Why We Built Around Economics, Not Verification
Economic deterrence in fraud prevention means imposing enough cost per attack attempt that running the campaign becomes unprofitable. Arkose Labs’ approach to agentic AI security is built on economic deterrence at the interaction layer — making attacks more expensive the more the attacker probes.
When we designed the challenge infrastructure at Arkose Labs, we started with a question most vendors in this space don’t ask: what happens when the attacker gets the classification right?
Because they will, eventually. Every model has an error rate. Every boundary can be found. If your entire security posture rests on classifying correctly, the question isn’t whether an attacker can defeat your classification; it is when, and at what cost.
This cost is the lever. It is the only lever that doesn’t have a failure mode.
If circumventing your security layer costs more, in compute time, API calls burned, labeling investment, and per-session friction, than the value of a successful breach, the attack stops. Not because it was caught. Not because it was classified correctly. Because the math does not work for the attacker anymore.
This is what economic disruption means in practice. Not blocking. Not verifying. Making the attack unprofitable.
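A back-of-the-envelope model shows the lever. The numbers below are invented, not measured; they exist only to show how per-attempt cost flips a campaign from profitable to unprofitable.

```python
def campaign_profit(attempts: int, success_rate: float,
                    value_per_success: float, cost_per_attempt: float) -> float:
    """Expected attacker profit for a campaign: revenue minus total attempt cost."""
    revenue = attempts * success_rate * value_per_success
    cost = attempts * cost_per_attempt
    return revenue - cost

# Illustrative numbers only.
attempts = 100_000
success_rate = 0.002          # 0.2% of attempts convert into a successful breach
value_per_success = 50.0      # dollars extracted per success

for cost in (0.001, 0.05, 0.25):   # per-attempt cost imposed by the defense
    profit = campaign_profit(attempts, success_rate, value_per_success, cost)
    print(f"cost/attempt ${cost:.3f} -> campaign profit ${profit:,.0f}")
```

At a fraction of a cent per attempt the campaign clears roughly $9,900; at twenty-five cents per attempt it loses $15,000, and the attacker has no reason to continue.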
Our research, spanning billions of challenge sessions on some of the world’s largest B2C platforms, consistently shows that economic deterrence is what ends campaigns, not detection alone. Attackers who are detected but not economically deterred adapt and return. Attackers who face escalating cost per attempt don’t.
The Arkose Labs 2026 Agentic AI Security Report makes the gap visible: 97% of enterprise leaders expect a material AI-agent-driven incident within 12 months, yet only 6% of security budgets are dedicated to tackling it. This is not a resourcing problem. It is an architectural one. Organizations are still betting on classification accuracy holding — and underinvesting in the economic layer that makes classification failures survivable.
The Interaction Layer Is Where This Gets Decided
The industry now broadly agrees that agent identity does not equal agent behavior. What it hasn’t answered is: where do you observe the behavior? How do you generate the signal that tells you what an agent is actually doing, not just who it claims to be?
As I described in my previous blog, The Attack Runs Itself: What Agentic AI Fraud Actually Looks Like, agentic AI attacks don’t happen at the network layer. They happen at the interaction layer: account creation, login, checkout, API endpoints. This is where the digital factory executes its attack chain. It is also where network-level classification and browser fingerprinting run out of signal.
Network-level signals tell you what something is. They don’t tell you what it is trying to do, how it behaves when pressured, or whether its interaction patterns are consistent with legitimate intent. Those signals only exist at the interaction layer, and they only exist if you have a mechanism that generates them.
The challenge layer is that mechanism. Every interaction with it produces behavioral signals: solve timing, answer patterns, failure signatures, interaction consistency, response to escalating pressure. This is a signal no passive detection approach can produce, because passive detection has nothing to observe until after the fact. It is the answer to the gap the industry identified at RSAC: not just knowing who the agent is, but observing what it does and imposing consequences when the behavior doesn’t match the claim.
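As a rough illustration of what interaction-layer signal can look like in data terms (the field names and thresholds below are hypothetical, not Arkose Labs’ schema), each challenge interaction yields observations that passive detection never produces, and those observations can drive escalating pressure within the session:

```python
from dataclasses import dataclass, field

@dataclass
class ChallengeObservation:
    """Signals produced by a single challenge interaction (hypothetical fields)."""
    solve_time_ms: int
    attempts_before_solve: int
    answer_pattern_entropy: float   # low entropy ~ scripted, repetitive answers

@dataclass
class SessionPressure:
    """Escalates challenge difficulty as a session's behavior looks less legitimate."""
    difficulty: int = 1
    history: list = field(default_factory=list)

    def record(self, obs: ChallengeObservation) -> int:
        self.history.append(obs)
        suspicious = (
            obs.solve_time_ms < 300              # solved faster than a person plausibly could
            or obs.answer_pattern_entropy < 0.2  # answers look machine-generated
        )
        if suspicious:
            self.difficulty = min(self.difficulty * 2, 64)  # each probe costs more than the last
        return self.difficulty

session = SessionPressure()
print(session.record(ChallengeObservation(solve_time_ms=120,
                                          attempts_before_solve=1,
                                          answer_pattern_entropy=0.05)))  # -> 2
print(session.record(ChallengeObservation(solve_time_ms=90,
                                          attempts_before_solve=1,
                                          answer_pattern_entropy=0.04)))  # -> 4
```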
This is a claim about architecture. The platforms that will handle agentic AI attacks durably are the ones with a mechanism at the interaction layer, not just identity verification or classification ahead of it.
The economic pressure that mechanism generates works even when classification is uncertain. This is the critical difference. A trust model needs to be right. An economic model needs to be expensive enough that being wrong doesn’t matter.
Putting It Into Practice
Understanding that the right frame is economic deterrence at the interaction layer raises a practical question: what does this look like in operation?
The first requirement is visibility. Security and fraud teams need to see agentic traffic as a distinct category, not as noise buried inside broader automated traffic metrics. This means knowing which agents are active, across which flows, in what volumes, and how that changes over time. It also means applying a three-tier view: the good agents that self-identify and generate value, the bad agents masquerading as legitimate while running fraud at machine speed, and the gray-area agents that are helpful to end users but present ambiguous intent to the platform. Visibility without this distinction is not actionable.
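A minimal sketch of what that tiered visibility could look like as data, with flow names and counts invented for the example; the point is that agentic traffic is counted per tier, per flow, over time rather than lumped into one automation metric:

```python
from collections import Counter
from enum import Enum

class AgentTier(Enum):
    GOOD = "good"    # self-identifying, value-generating agents
    BAD = "bad"      # agents masquerading as legitimate while running fraud
    GRAY = "gray"    # useful to end users, ambiguous intent toward the platform

# Hypothetical snapshot: sessions observed per (flow, tier) over one hour.
traffic = Counter({
    ("login", AgentTier.GOOD): 1_200,
    ("login", AgentTier.BAD): 4_800,
    ("login", AgentTier.GRAY): 300,
    ("checkout", AgentTier.GOOD): 950,
    ("checkout", AgentTier.GRAY): 75,
})

def tier_share(flow: str, tier: AgentTier) -> float:
    """Share of a flow's agentic sessions attributed to one tier."""
    total = sum(v for (f, _), v in traffic.items() if f == flow)
    return traffic[(flow, tier)] / total if total else 0.0

print(f"BAD share of login traffic: {tier_share('login', AgentTier.BAD):.0%}")
```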
The second requirement is governance without engineering dependency. The ability to define what “authorized” means per endpoint and act on that definition directly, choosing to allow, monitor, challenge, or block traffic by agent type, risk score, and geography, needs to sit with security and fraud teams. Every deployment cycle and engineering ticket is time the campaign has to run. Mastercard research puts the cost of inaction in concrete terms: organizations lost an average of $60 million to payment fraud in the past year. The cost asymmetry is clear — if the economics of an attack favor the attacker, the campaign continues regardless of whether detection eventually catches up.
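One way to picture governance without engineering dependency is policy as data: rules a fraud team edits directly and the platform evaluates at request time. The schema below is hypothetical, not Arkose Labs’ actual policy format; it only illustrates allow, monitor, challenge, and block decisions keyed to agent tier, risk score, and geography per endpoint.

```python
# Hypothetical, declarative per-endpoint policy a fraud team could edit directly,
# with no code deployment: the first matching rule decides the action.
POLICY = {
    "checkout": [
        {"tier": "good", "action": "allow"},
        {"tier": "gray", "max_risk": 0.4, "action": "monitor"},
        {"tier": "gray", "action": "challenge"},
        {"tier": "bad", "action": "block"},
    ],
    "login": [
        {"tier": "bad", "geo": {"high-risk-region"}, "action": "block"},
        {"tier": "bad", "action": "challenge"},
        {"action": "allow"},   # default
    ],
}

def decide(endpoint: str, tier: str, risk: float, geo: str) -> str:
    """Return the action for a session: allow, monitor, challenge, or block."""
    for rule in POLICY.get(endpoint, []):
        if "tier" in rule and rule["tier"] != tier:
            continue
        if "max_risk" in rule and risk > rule["max_risk"]:
            continue
        if "geo" in rule and geo not in rule["geo"]:
            continue
        return rule["action"]
    return "challenge"   # fail toward imposing cost when nothing matches

print(decide("checkout", tier="gray", risk=0.7, geo="anywhere"))  # -> challenge
```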
Arkose Labs’ platform provides both. The challenge infrastructure generates the interaction-layer signal that makes classification meaningful. The visibility and policy layers built on top of it translate this signal into decisions teams can act on without waiting for a code change.
A Word on the Dual-Use Reality
It is easy to misread an economics-first argument as “block everything that looks suspicious.” This is not what it means.
The vast majority of agentic AI traffic on most platforms is legitimate: payment processors, customer service agents, booking assistants, accessibility tools operating on behalf of real users. The goal is not to create friction for all automation. The goal is a three-tier agent classification framework: good agents, bad agents, and gray-area agents. Visibility into all three is the prerequisite for intelligent policy. Economic deterrence is what makes this policy durable.
Classification tells you where to apply economic pressure. The economic layer is what makes this pressure meaningful. You need both. The right architecture is one where they work together, not one where classification is the only line of defense.
The Architecture Question
If you’re evaluating agentic AI security solutions, the question I would encourage you to ask any vendor is not “how do you classify agents?” It is:
“What happens when your classification is wrong?”
If the answer is confidence in accuracy, that is not an answer. Every classification system fails. What matters is what the system does when it fails.
The right answer: when classification is uncertain, we impose cost. When an agent’s behavior deviates from its claimed authorization, we impose cost. When session signals are inconsistent with legitimate activity, we impose cost in a way that gets more expensive for the attacker the more they probe, rather than less expensive as they learn the system.
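One way to express “more expensive the more they probe” is a cost schedule that rises with observed probing, in contrast to a static defense whose effective cost falls as the attacker tunes against it. A toy comparison, with invented numbers:

```python
def escalating_cost(probe_count: int, base_cost: float = 0.01) -> float:
    """Per-attempt cost that grows with how much the session has already probed."""
    return base_cost * (2 ** min(probe_count, 12))   # cap the exponent to keep it bounded

def static_cost_after_learning(probe_count: int, base_cost: float = 0.01) -> float:
    """A fixed challenge gets cheaper to solve as the attacker tunes against it."""
    return max(base_cost * (0.8 ** probe_count), 0.0005)

for probes in (0, 5, 10):
    print(f"{probes:>2} probes  escalating: ${escalating_cost(probes):.2f}  "
          f"static-after-learning: ${static_cost_after_learning(probes):.4f}")
```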
This is the architecture question. Not trust or no trust. Cost or no cost.
If the industry builds the agentic AI security category around trust-classification as the primary frame, we will be having the same conversation in three years, except the attacks will be more capable, the tooling will be freely available to anyone with a laptop, and the classification models will be further behind.
The durable answer is economics. It is what we build around. And it is what the category needs to converge on.
For the full picture on where agentic AI security budgets are falling short, download the 2026 Agentic AI Security Report.