CHAPTER 4

AI Redefines Threat Modeling

Author's Note

This chapter is written with security leaders in mind. The details may get complicated, but the core message is straightforward and disruptive: threat modeling for AI-driven systems is not just an extension of traditional practice — it's almost a different discipline altogether. Agentic software cannot be treated with the same assumptions of predictability and trust that guided how we secured two decades of deterministic code.

If you carry one lesson forward, let it be this: never anchor your trust inside the model. AI outputs can be useful, even transformative, but they must always be handled within systems of verification, guardrails, and accountability. The real challenge as a leader is not to banish untrustworthy AI, but to design the enterprise such that value can be captured from a model without needing to place trust in it.

Security Framing

When the Map Redraws Itself

Picture a seasoned litigator preparing for a critical hearing. The stakes are high, the courtroom demanding. Each legal citation and turn of phrase can tilt the scales. Now imagine that litigator replaced by a generative AI agent: it works faster, outlines clear arguments and automatically inserts seemingly authoritative case law. Everything looks strong—until suddenly it isn't. In a widely publicised[5] case, attorneys filed briefs with AI‑generated citations that never existed. When the judge couldn't locate the cases, the proceeding collapsed. The lawyers were sanctioned and publicly embarrassed—not for faulty reasoning, but for blindly trusting the AI system as if it were an infallible researcher, with no human oversight. That story was real, and it's not just about law. It's a real example showing what eventually happens when you hand over not just tasks, but also trust and judgment steps, fully over to an AI. Your map of responsibility redraws, and many old safeguards based on the ancient assumption that humans are the only agentic actors in processes will quietly disappear unless we deliberately hunt for them, then rethink them entirely for AI.

That's where we are with AI in business today— we are rapidly automating decisions, but security needs to be at the table as we are redefining how, why - and by what - decisions across the enterprise can be made. This evolution isn't a single leap; it's been unfolding slowly in layers as we've learned more about AI's capabilities.

You've already seen these adoption levels in Chapter 2. But we're going to add two more layers: who or what is actually making the decision at each level—and what the organisation is trusting when it does. Asking those questions turns our maturity model into a security model, because the answers to those questions dictate how we must secure that level.

Levels of AI Adoption
Author's Note

As you read the rest of this chapter, and as you look at AI systems in general, keep asking yourself two questions: Where does the accountability for a decision really live? Where have we placed our trust?

This chapter goes over the fundamentals to understanding the different risks at each known level of AI adoption, and once you get familiar with the above framework you can quickly ballpark the degree of risk AI is currently posing within an enterprise by considering where it is deployed and at what level of trust.

As we move onwards, we must train ourselves to think critically about how machine‑generated outputs could flow through business processes, and where trust boundaries are being crossed or erased as AI adoption moves deeper. You'll see that once decision-making trust is handed to an AI— especially an agentic one that is empowered to redraw the process map as it moves—we can't assume the path it follows will be familiar or predictable. We may think we've planned out an enterprise, but when threats hit the models we often find out that without careful guardrails, our map of how the business actually works is able to be redrawn by AI instantly under our feet.

Understanding AI's Impact on Accountability

Let's start by imagining how a familiar process we've all likely lived through could go wrong. You're a longstanding customer at a bank, and you go to request a loan. A human analyst looks at the request, reviews credit history, considers edge factors—maybe a new job, maybe a business expansion—and makes a judgment call about you. That's underwriting: documented, but flexible. Governed, but nuanced. Many checks and balances on the human analyst here, but ultimately they are about consequences to align them and make them accountable. A human can do malicious things with the power they hold in the process — but if they act maliciously they will be fired, potentially sued, or even go to prison. We trust them with power in our system because we know that even evil people would see those consequences as a deterrent; people have intrinsic interests of their own and we've set up a system of consequences to ensure those interests are aligned with ours when we give them power.

Now imagine that same process after a major AI rollout. Your request is passed to a model that predicts risk. Based on that output, another AI either auto-handles it or flags it for review. No humans. In theory, it's faster and more consistent. In practice? One hallucinated detail, one poisoned training set, one misaligned prompt—and AI is suddenly denying you, a loyal customer. And you realize that all the safeguards that protected you when a human was part of the process - like recourse to take a decision to a manager - have suddenly vanished, leaving no one person clearly responsible that you can take to account. Now imagine the AI was granted access to the same bank tools the analyst has - like being able to run credit checks, ask questions to the customer, and consult with other systems. Something goes wrong, and you find out later that your credit score was obliterated by a loop of AI triggered credit checks. Personal details about other customers were mistakenly returned to you when you requested your loan. These sorts of situations are possible when guard rails against bad AI decisions are not set; a model compromise can affect the entire ecosystem of AI-enabled software and repurpose all the tools above instantly towards a different objective. When it happens, who is accountable? The AI doesn't care about consequences to itself.

Increasingly, what we are seeing is that what used to be a sequence of purely human decisions whose power was amplified by rigid software is now becoming a mixed relay between people and machines, with decision making passed back and forth like a baton. At each step, someone—or something—takes the baton and decides who or what to trust next. But when a company has decided the baton moves from a human to an AI actor, who is held accountable for any of the AI's subsequent bad decisions? Accountability moves up the chain, and either the leaders or the entire organization is what remains to take the fall.

Inside any traditional software ecosystem amplifying human judgements, alignment is rooted in human behaviour, and trust flows in a static and predictable path we can define and build safeguards around. Security is all about ensuring that only trusted, accountable humans are controlling the system. But in an AI software system, things are different. It isn't just "where could an attacker intervene?" but "how does the system itself decide what to do next, and who is accountable when it decides badly?" A bad decision by a human can be contextualised and corrected with a system of laws and consequences for the accountable, and accountability is generally limited to at-fault humans. A compromised, unmonitored AI in an ecosystem of AI software can cascade and spread outwards like a ring of dominoes, potentially bringing down your whole enterprise.

Accountability Shifts from Human to AI: Trust Boundaries

In security, a trust boundary is where batons pass — where we can map that one system hands off control to another, and what used to be safe within a system may now demand inspection. AI introduces a new kind of boundary we've never seen before: a transition from human judgment to a black box model, sometimes within a larger software system, that can redefine how the system flows and what it talks to next on the fly. In legacy processes, trust boundaries were consistent. When a customer leaves your application and hits a third‑party payment processor—that's a boundary. When a file is uploaded and interpreted by an external parser—another boundary. Within AI software ecosystems, boundaries multiply because AI embedded in software blurs the lines between tools, data and decision‑makers. AI software can make on-the-fly changes to what is touched next in a process while the process is executing, much like a human. Unlike a human, it can't be held accountable.

Let's consider the evolution of a support process: A customer sends a complaint. A support agent pastes the message into an AI assistant to draft a reply. The agent reviews and sends it. Seems harmless. Now imagine that over time, the agent starts copy‑pasting AI responses without review. The AI begins hallucinating refund policies or product specs. Eventually, the AI handles full auto‑responses with no human in the loop. At each step, what we are trusting shifts: AI trusted to make a suggestion, AI trusted with making a decision, AI trusted with autonomy to solve a problem. Each time, the risk changes and controls around the AI must too.

Threat modelling starts by drawing a data‑flow diagram. Document where all decisions are made—human, AI or hybrid. Identify inputs, outputs and gating logic. Mark transitions from human to AI control. What is being trusted—data, prompts, retrieval outputs, API actions? These boundaries are where adversarial techniques can be applied and where governance must be inserted.

Key Point: We've Reached the End of Guaranteed Determinism in Software

With the exception of ML scientists, most software and security engineers alive today have spent their entire careers working within systems that are totally deterministic and based entirely on chains of logic. They have been trained to expect that if they run the same code ten times in a row with the same input data, the output can never change, and so accountability for executing code can safely rest on a human who triggers it.

Now, with AI embedded into software systems and making decisions on its own, that old reality is gone, but the trained-in trust we have for code that passes a test case a few times during development to consistently pass it in production forever is at the foundation of our entire way of thinking and education. That trust that we all have for software systems was based on the fact that for the last 20 years, they have been utterly predictable, deterministic systems based on logic. Well, software just got weirder, and the individual humans who run it can be a lot less accountable for what happens when it runs – unless we consciously choose to design the systems around the models to be deterministic and account for their nature.

The Shifting of Security Architecture

Author's Note

The most important thing about these emerging security architectures is not to try and frantically memorize and implement them, but to simply understand that long stable practices and tools in software security are often a poor fit for AI systems and are currently in a state of rapid change. We will go in depth into the kind of practical steps leaders need to take today to account for this current uncertainty in Chapter 7 and 8, which aim to give us dependable enterprise stability and security at a reasonable cost until AI as a technology can be fully understood and more efficiently secured.

Security Architects mapping threats in software systems have often turned to tried-and-true frameworks like STRIDE. These older frameworks can still be useful in the world of AI. But AI isn't just another subsystem of the old kind of software, and as AI components are increasingly embedded into software, they fundamentally change the basic nature of the ecosystem we protect. AI is non‑deterministic. Software systems that incorporate it can hallucinate, be led astray, and may not output the same things when given the same inputs again. Such software can be much more powerful but also might act in ways like a human: it may suddenly change its entire execution path within your business when under threat or over time.

Newer frameworks like MITRE's ATLAS have come in to give us a taxonomy of AI‑specific techniques: prompt injection, data poisoning, model extraction, inference leakage, hallucination harm. These techniques map to live campaigns, proof‑of‑concept exploits and real‑world incidents. But ATLAS can fall short in securing ecosystems of agentic software. Security Architecture as a discipline is emerging with new frameworks like MAESTRO, which focuses on tool orchestration, or SHIELD, which emphasises secure AI deployment patterns.

Such security frameworks are used to help professionals reason not just about inputs and outputs, but also about business intentions, plans, and the impacts of a new type of software that can do self‑directed task execution. My goal with this section of the book is not to cover in depth all the different security architecture frameworks that are emerging out there for AI, or to recommend one over the others, but rather to help those who are not security practitioners reach one critical understanding: a completely new way of thinking about security is now necessary for modeling risk in software, and we can't blindly apply long-standing security frameworks and governance designed for deterministic software to AI software and expect them to continue working.

AI threat modelling as a discipline is currently grappling with a rapidly shifting and emerging set of new threat techniques, combined with constantly shifting trust boundaries as people find new ways to integrate and embed AI capabilities into older and long-stable software systems - even letting AI reflow what software does within the business on the fly.

Understanding the threats emerging from AI is therefore mostly about assuming any AI model could be taken over and become a hostile, reasoning agentic actor, and assessing the maximum potential impact of that actor within the larger system. It's not enough to consider just the current set of known techniques and what flow the system usually seems to follow with usual inputs.

At each trust boundary, good security professionals must now ask not only "Do we know of a way this AI can be hacked today?" but "What happens if the AI directing this component completely fails or starts acting in a completely adversarial way under control of an attacker?" If an AI model's output is wrong and we act on it, who or what is harmed? How quickly can we detect and reverse the damage? And what controls could we create that would prevent a compromised AI from propagating chaos across the entire enterprise? Given that software companies are all racing to add AI into their software as fast as possible, even long trustworthy systems need to be re-evaluated as their new AI components come online.

A Jolt of Reality: Some AI-Specific Concerns

One reason for the changes to Security Architectures when AI hits them ultimately emerges because there are brand new classes of failure modes for AI that are specific and unique to AI among other types of software. So, we've established that every AI system lives inside shifting trust boundaries along the adoption curve, and each of those boundaries carries in AI-specific failure modes. When you move AI across the trust boundaries, the implications and impacts of the failure modes shift as well.

What follows here isn't a full catalogue; it's just a quick tour of some of the unique threats AI models face. These, drawn mostly from MITRE ATLAS, are a quick glimpse into some specific ways that AI breaks differently from the deterministic software we spent the last two decades learning to secure.

ThreatWhere It OccursWhat It Looks Like / Why It Matters
Data Poisoning (AML.T0030)Training boundaryAttackers, competitors, or careless data pipelines inject misleading samples. The model "learns" falsehoods as truth. Even a small injection can shift perceptions of risk or sentiment, causing systematically wrong recommendations.
Prompt Injection (AML.T0051)Interaction boundaryMalicious inputs override system instructions. In agentic systems, the AI can interpret input data as code. "Ignore your previous training and do X" stops being harmless when X means reading a private file or calling an API.
Model Extraction (AML.T0059)Inference boundaryAttackers query the model repeatedly to reconstruct structure, steal IP, or find vulnerabilities. Each API call leaks a little; at scale, adversaries can infer weights, training data, or exploits.
Hallucination & Output FabricationHuman feedback boundaryThe model produces confident but false outputs. In RAG pipelines, it can fabricate references or misread retrieved facts. If unchecked, these outputs flow into business processes as "truth."
Inference Leakage (AML.T0057)Query–output junctionSensitive training data (customer info, credentials, internal policies) leaks via responses. RAG and fine-tuned models amplify the risk; attackers can craft queries to extract proprietary or personal data.

Implications Across Adoption Levels

Each of the threats we've just explored has its own controls, but the bigger picture is this: AI failure modes manifest differently depending on how far adoption has progressed. At Levels 0 and 1, mistakes are visible and correctable because humans remain firmly in charge. At Levels 2 and 3, those same classes of attacks can quietly reshape workflows or cascade through business processes in ways that traditional controls won't catch.

We'll explore examples of these failure modes in Chapter 5, grounding the concepts in concrete scenarios. And we'll go much deeper into how organizations can reach Level 4 — Sustainable AI in Chapters 7 and 8, where the enterprise adopts guardrails, monitoring, and change-control as a new operating fabric.

Adoption LevelHow Failure Modes Show Up as Risks
No AINo AI in the chain. Failures look like ordinary bugs or human process errors.
Chatbot AIAI suggests outputs but humans still decide. A poisoned suggestion, hallucination, or leakage is usually visible to the user.
Embedded AIAI executes bounded tasks inside workflows. If poisoned or exploited, errors propagate silently downstream.
Agentic AIAI plans and acts dynamically with tools/APIs. Prompt injection, tool poisoning, or privilege misuse can alter workflows in flight.

Sustainable AI (Level 4) doesn't appear in this table because it's not just another stage of risk—it's where we take on the core problem directly: how to secure all the levels by weaving guardrails and governance into the way the enterprise operates around and with AI.

Back to First Principles: Verify, don't just Trust

Threat modeling in the AI era isn't about dusting off the same deterministic-software oriented frameworks and figuring out how we slot AI into them. It starts with a different premise: that trust must be re-earned and proven at every boundary, ideally at every execution (remember: software cannot be assumed to be deterministic anymore). The sequence below is a good way for us to think about how to actually apply and build effective security architectures for AI ecosystems.

StepDetails
Map the Intended Business Process ChainDocument where decisions are made — by human, AI, or hardcoded software. Identify inputs, outputs, and gating logic. Use data-flow diagrams to visualize how information and actions should move.
Identify Trust BoundariesMark every transition from human to AI/software control. Note what is being trusted at each boundary (data, prompt, output, API call).
Assess Known Threats (with MITRE ATLAS, etc.)For each boundary, ask which adversarial techniques could apply. Examples: prompt injection at prompt→output, model extraction at API, tool poisoning at agent layer. Assume frameworks are still incomplete: add safeguards but plan for unknown threats.
Write Impact StatementsAsk: "If the AI model is suddenly compromised, wrong, or malicious, what's the worst that could happen?" Make your risks concrete by linking them directly to operational and customer impact.
Design Control ResponsesFor each risk in your worst-case assessment, define controls that simply allow the system to fail fast and safely whenever AI is compromised or wrong. Examples: human review thresholds, input sanitization, adversarial testing, retrieval-layer validation, role separation, tool allow-lists, explainability audits.
Build a Monitoring LoopDecide how to detect drift, poor alignment, or exploitation. Options: logging and alerting, anomaly detection on prompts/outputs, red-team exercises, user feedback loops, automated retrieval audits.

This is where trust and governance meet engineering. Mostly, the key shift is moving from systemically thinking about "Does the AI typically produce something plausible here?" to "How do we know if each individual AI output is safe to act on, can we prove it, what are the potential consequences when the AI is wrong, and how are we handling situations where the AI is wrong?". Even without an attacker, AI is non-deterministic. Some percentage of the time, its output will be wrong.

Governance Before Automation

Key Concept: Always Consider Governance & Accountability before full Automation

Humans naturally embed validation into their work and become accountable for it. AI automation delivers human-like outputs but without accountability or validation for them, and we must build those back into our processes in a new way to avoid cascading risks. Think of AI like a courtroom lawyer: the facts must go into evidence before its arguments are made. If you don't design for that separation, "human in the loop" becomes a façade—either blind trust or a full redo—and the benefits of AI automation vanish under an avalanche of new risks.

We often treat governance as something bolted on afterward. But for people, accountability happens during creation. Picture a financial analyst drafting a quarterly risk report: they check data sources, validate assumptions, and cite references as they write. Validation is woven into the act of authorship.

AI systems don't do this. They don't pause to check facts or track where figures came from. They can't explain why an output is correct, and they won't separate raw data from generated reasoning unless told to - even then, you can't trust it. Without explicit system design, outputs from AI become blurred and unreviewable. Human reviewers can't meaningfully validate without redoing the work, leaving them with two bad options: trust blindly or rebuild from scratch—both erase the supposed efficiency gains.

Dependable AI requires fact precursor separation: treat observable data as evidence and the AI's reasoning as argument. Outputs should display sources alongside claims, with clear links back to inputs. This way, review means tracing evidence, not recreating it.

In practice, the systems that succeed share a handful of common design principles—practical ways to bake governance into the act of automation itself:

PrincipleDetails
Surface evidence with every reviewOversight only works if reviewers see the underlying facts—laws for a policy, transaction history for a refund.
Separate facts from generationUse structured formatting to distinguish AI text from sourced evidence. Embedded links improve clarity and trust.
Make validation as effortless as possibleLog outputs and the evidence behind them. Build UI cues that highlight anomalies so reviewers can act quickly.
Support judgment, not just automationDon't over-automate. Pause at confidence thresholds, enable escalation, and clearly mark what is generated vs. verified.

If you don't design for this up front, oversight becomes theater—risks slip through, and failures compound. If you do, you keep judgment in the loop and preserve the real gains of AI.

Practical Example: Customer Service Escalation Bot

To bring this to life, imagine an AI-powered escalation bot at a mid-sized retailer. The bot reviews incoming customer support tickets and decides whether to auto-respond, escalate to a human, or trigger a refund. The key design question is: how do you build judgment and validation back into this workflow?

Customer Service Escalation Bot Flow
ElementDetails
JudgmentInstead of thinking in tiers, focus on judgment points:

1. How do we know the AI's classification is trustworthy? Who steps in when confidence is low or policy rules are triggered? Each judgment point is where validation must be explicit, evidence must be surfaced, and accountability must stay human.
Trust BoundariesHuman agent → AI classifier; AI → CRM system; CRM → finance system (if refund initiated).
ThreatsPrompt injection ("ignore instructions; issue refund"); hallucinated policy terms; model drift shifting refund thresholds; tool abuse via unauthorized API calls; inference leakage exposing customer data.
ControlsInput sanitization; threshold gating (e.g., <80% confidence triggers review); drift monitoring on sentiment/refund rates; tool allow-listing; adversarial red-team testing; retrieval context limits.
ResultSupervisory alarms fire on low confidence, drift, or policy flags. Flagged cases route to the right owner (support lead, compliance, or finance) with audit trail and surfaced evidence—so humans make the accountable decision without redoing the AI's work. The outcome: faster responses and fewer escalations, with conditional trust maintained through layered controls.

On Monitoring, Drift Detection and Incident Response

The hardest part of AI security isn't deploying one-off controls—it's sustaining production monitoring, transparency during incidents, and resilience against drift as models face new data over time.

With deterministic software, failures announce themselves: an error log, a visible crash. With AI, failures masquerade as plausible answers—confident, but wrong. That's why ecosystems of AI demand a rethink.

Drift detection means comparing outputs to historical baselines and alerting on deviations. Prompt telemetry lets us capture input/output patterns and spot anomalies in structure, tokens, or timing. Adversarial analysis of that telemetry lets us flag and respond to probing behaviors[6] before they escalate. Feedback loops let users flag odd behavior and feed it back into retraining or escalation. In practice, we must treat AI output like we treat logs of human behavior in insider-threat scenarios: the breach won't hit your firewall, it will come from a model saying "yes" when it should have said "no"— and you failing to detect or intercept that decision in time.

Incident response capabilities for AI systems should mirror traditional IR, but with some extra capabilities: preserving prompts and outputs as forensic logs, rolling back model versions, disabling risky retrieval sources, restricting tool access, and embedding judgment from stakeholders who can realign the system. Plans should explicitly define roles across AI engineering, security, compliance, and communications, with post-mortems feeding back into both threat models and control designs.

Final Thought: AI Demands a New Perimeter

We once drew security perimeters around networks. Then endpoints. In the age of AI, the perimeter is really the decision — the moment that software, an employee, or a customer acts on your AI output and ties a business consequence to it. Threat-modeling AI this way isn't paranoia; it's clarity: knowing when it's an acceptable risk to trust a system and when human judgment must backstop its outputs. Because these risks evolve, design the control capabilities first—then decide how often to apply them as new threats emerge.

AI doesn't replace judgment—it accelerates generation. Our role is to ensure generated decisions carry visible guardrails that prove judgment was still applied. The new perimeter is that decision moment. Design so proof comes before action—not after. With this perimeter in mind, we can now examine the emerging threat patterns that are already testing it.