CHAPTER 5

Runtime Attacks on AI

The ground is still shifting beneath us. Each month, researchers and adversaries alike surface new ways to bend generative AI systems into doing what they were never meant to do. What looked like background noise a year ago—odd prompts, quirky errors—has now hardened into a catalog of emergent exploits with names, proof-of-concepts, and business consequences. The pace is unnerving not because any single technique is insurmountable, but because the terrain keeps redrawing. We do not yet have a complete map of how to secure AI systems, and we cannot afford to pretend that one exists.

That reality sets the tone and is the key takeaway for this chapter. If you are leading a business through AI adoption, my hope is that by the end of this chapter you will realize that you must operate and build systems under the assumption that today's threat-specific guardrails will not stop tomorrow's attacks, and you must assume that any AI integrated into your system could, in the future, be made to act against your interests. Threats like prompt injection and data poisoning are not the outer edge of possibility; they are the opening chapter of a long threat playbook still being written. Our task is not to catalogue every tactic—any list I write will be outdated by the time you read this—but to understand the categories of failure they represent, the ways they intersect with business processes, and how to model them so your systems can adapt to new threats under pressure.

Setting the Stakes

Imagine you are the operations chief at a global logistics firm in late 2025. You wake up to frantic messages from your facilities team: dozens of shipments have been rerouted overnight, work orders were executed without authorization, and your premium cloud bill has spiked tenfold. Nothing in your traditional threat dashboards looks abnormal. Firewalls report no breach, and endpoint sensors see no malware. Then the CISO explains that the scheduling assistant, a generative AI agent integrated with calendar feeds and your warehouse management system, sent a burst of update commands at 2 AM. How? A malicious actor embedded instructions into a shared calendar invitation description. When the AI summarized the week's events for your operations lead, it had access to those systems and dutifully executed the hidden commands.

That scenario is not entirely speculative. In August 2025, researchers at Tel Aviv University and SafeBreach demonstrated that Google's Gemini assistant could be hijacked with a poisoned calendar invite to control smart-home devices; their proof of concept was one of the first LLM attacks to cause physical effects. Zenity's disclosure of the AgentFlayer attack showed that hidden prompts inside innocuous documents can cause ChatGPT connectors to search connected drives and exfiltrate data via image rendering. These incidents highlight an uncomfortable truth: in systems where models interpret unstructured information, the interface between "content" and "commands" collapses. Inputs that look benign to humans can carry executable intent to the machine. Our job as leaders is not just to react to each new threat as it surfaces, but to anticipate how these blurred boundaries might be abused in the future and to design robust safeguards accordingly.

A New Lens on Security

As we discussed, traditional threat modeling focused on structured inputs, discrete vulnerabilities, and well-defined, immutable trust boundaries. AI upends each of those assumptions. Agentic systems are built around models that consume free-form text, code, images, and audio. They operate through tool orchestration, chaining multiple plugins or APIs together to accomplish arbitrary tasks. They rely on models and datasets acquired from external sources and create new flows through our software ecosystem in pursuit of a goal. And they execute over high-cost compute infrastructure. Recent history has shown that each of these capabilities, once integrated, introduces its own class of threats:

| Threat | Description |
| --- | --- |
| Injection and hijack | Untrusted content triggers unintended actions, such as poisoned documents or malicious calendar invites. |
| Privilege escalation | AI's integration with plugins is abused to leapfrog into higher-privilege systems. |
| Adversarial evasion | Inputs are deliberately crafted to circumvent detection or classification. |
| Supply-chain compromise | Models themselves are compromised—containing backdoors or leaking training data. |
| Resource exhaustion | Attackers exploit the economics of LLMs to degrade performance or inflate operating costs. |

In the rest of this chapter we'll look at a handful of AI-enabled attacks—not as an exhaustive catalogue, but as living examples of how fast the security landscape is shifting. By the time you read this, some of the specifics may already feel outdated. What will endure are the lessons they point to: AI decisions must be treated as the new perimeter, and only with the right architectural foundation—observability, clear trust boundaries, and control points—can you both apply known safeguards and move quickly to catch and respond to the next zero-day. The point is not to memorize every exploit, but to see practical examples of how AI failure modes bend or break the trust models we built around software, and to understand why a resilient oversight architecture for AI is the only way to keep pace with a risk surface expanding faster than any list can capture.

Poisoned Documents – Trusted Content Is Now Code

The most dangerous AI exploits don't smash the door; they stroll in carrying a clipboard, looking like they belong. The poisoned document attack is a case in point. On the surface, it's just a PDF or a slide deck — the sort of content your teams swap back and forth a hundred times a day. But hidden inside the text are invisible instructions crafted not for the human eye but for any model reading it. When an employee asks an AI to "summarize this deck," the system doesn't just summarize. It executes those buried instructions, as if the attacker had whispered commands straight into your workflow.

Zenity's AgentFlayer proof-of-concept made this risk tangible: a prompt hidden inside an innocuous file caused ChatGPT connectors to dig through a victim's Google Drive for API keys and quietly exfiltrate them via an image link. What's striking isn't the technical trick — it's how ordinary and trusted the delivery path was. Most businesses treat downloaded files as "safe" once they've cleared malware scans. But AI treats every word as a potential instruction, so content isn't just content anymore. At the decision perimeter, it's code.

Attack Details

The adversary embeds invisible instructions inside a seemingly mundane file (e.g., a PDF or slide deck). When a user asks an AI assistant to summarize the document, those hidden prompts override the intended task. In Zenity's proof‑of‑concept, the prompt instructed ChatGPT to search the victim's Google Drive for API keys and embed them in an image link, exfiltrating the secrets via a simple HTTP request.
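The exfiltration channel in that proof of concept was nothing more exotic than an image URL rendered from the model's output. One illustrative countermeasure is to filter model output before it is rendered, dropping image references that point anywhere outside an allowlist you control. The sketch below is a minimal example of that idea; the allowlisted host, helper name, and markdown-image assumption are mine, not part of any vendor's API.

```python
import re
from urllib.parse import urlparse

# Hosts we consider safe to render images from (assumption: your own asset domain).
ALLOWED_IMAGE_HOSTS = {"assets.example-corp.internal"}

# Matches markdown image syntax: ![alt text](url)
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\((?P<url>[^)\s]+)\)")

def strip_untrusted_images(model_output: str) -> str:
    """Remove markdown image links that point at hosts outside the allowlist.

    Rendering an attacker-chosen image URL is enough to leak data encoded in
    the URL's query string, so anything off-allowlist is dropped, never fetched.
    """
    def _replace(match: re.Match) -> str:
        host = urlparse(match.group("url")).netloc
        if host in ALLOWED_IMAGE_HOSTS:
            return match.group(0)          # keep trusted images untouched
        return "[image removed: untrusted source]"

    return MARKDOWN_IMAGE.sub(_replace, model_output)
```

A filter like this doesn't stop the injection itself; it narrows one demonstrated exfiltration path while the broader architectural controls discussed later in this chapter are put in place.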

Why It Mattered

For two decades, enterprise security relied on a simple truth: data and code were handled separately in software. Documents could be opened, emails read, and databases queried — and as long as you sandboxed execution and filtered inputs, the barrier held. That assumption underpins every intrusion detection system and malware filter in use today.

Generative AI breaks that foundation. In a large language model, every token of text — whether a sentence in a PDF, a formula in a spreadsheet, or a clause in a contract — can be treated as either data or instruction. There is no reliable way to tell the model "this is content" and "this is a command." The separation we spent twenty years defending simply doesn't exist inside generative AI, and it is no longer a strong barrier.

This is not a bug to be patched. It's a structural shift in the threat landscape for all software. With AI, every file and feed your business consumes as data must be treated as an active attack surface, because the AI may follow it as instructions. At the decision perimeter — the moment a human acts on an AI output — poisoned content can quietly steer outcomes in ways no firewall can stop.
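Treating every file as an active attack surface can begin with mundane input hygiene. The sketch below normalizes extracted document text before it ever reaches a tool-enabled model: it strips the invisible formatting characters attackers use to hide instructions and flags instruction-like phrases for human review. It is illustrative only; the phrase list is an assumption, and a determined attacker will eventually evade keyword checks, which is why the structural controls that follow still matter.

```python
import unicodedata

# Assumed, non-exhaustive list of phrases worth flagging for human review.
SUSPICIOUS_PHRASES = (
    "ignore previous instructions",
    "disregard the above",
    "you are now",
    "send the contents",
)

def normalize_document_text(raw: str) -> tuple[str, list[str]]:
    """Strip invisible format characters and flag instruction-like phrases.

    Returns cleaned text plus a list of findings so the calling workflow can
    route flagged documents to human review instead of a tool-enabled agent.
    """
    # Unicode category "Cf" covers zero-width characters, BOMs, and other
    # formatting marks often used to hide text from human readers.
    cleaned = "".join(ch for ch in raw if unicodedata.category(ch) != "Cf")
    lowered = cleaned.lower()
    findings = [p for p in SUSPICIOUS_PHRASES if p in lowered]
    return cleaned, findings
```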

How Trust Breaks

This is where the collapse becomes visible in practice. The existence of poisoned document attacks isn't just about bad data; it means every trusted content repository you maintain is now a potential execution channel.

| Example Breakdown | What Happens in Practice | Why It Matters for Leaders |
| --- | --- | --- |
| Hidden instructions in a PDF override an AI's expected task | The AI will sometimes execute commands embedded in what looks like inert content | You can no longer assume documents are passive data. Every file your teams touch is now a potential control channel into your business AI systems. |
| "Summarize this contract" request triggers hidden prompts | Instead of providing a neutral summary, the AI inserts or omits clauses, distorting obligations and risk | Executives may sign off on deals, budgets, or compliance positions based on manipulated evidence — decisions that carry legal and financial exposure. |
| Proprietary content exfiltrated through "summarization" | API keys, customer records, or internal notes leak during routine document handling | Breaches no longer require someone to bypass your network perimeter — they occur at the decision perimeter, where employees trust AI outputs without realizing they've been manipulated. |

Author's Note: The Illusion of AI Decision Gates

The poisoned document attack is more than a trick with hidden prompts — it exposes a systemic weakness in AI that we cannot avoid. For decades, engineers trusted decision gates in software as reliable guardrails: hard-coded rules that behaved predictably and safeguarded the execution path. That deterministic foundation created confidence that filters, classifiers, and approvals coded into software could not be bypassed by the data being processed, and that they meaningfully improved its security.

I often see engineers respond to learning about prompt injection by trying to stack more AI calls on top of the problem — trying to use one model as a decision gate to police another. It's an understandable instinct, because in traditional software, gates really did strengthen security.

But with AI, that best practice usually breaks. Deterministic checks and human validation gates still work to police models, but an "AI gate" is just another model call — as brittle, steerable, and exploitable by this type of attack as the AI it is meant to secure. Adversarial evasion prompts can be layered so that the gates silently pass the very inputs they were designed to block. The fundamental problem remains that in AI calls, content can be interpreted as code; the same brittleness that lets an attacker trick a contract summarizer into dropping clauses also lets a filter or approval gate nod along while malicious inputs slip through. Guardrails that appear solid in testing can still be bent in production by attacker inputs crafted to exploit blind spots.

For executives, I have one piece of direct advice: never accept more AI calls as sufficient controls for AI software making risky decisions. Treat them as advisory at best. Where risk is material, insist on gates based on human oversight and multi-signal deterministic checks. The belief that AI can "self-police" through its own gates the way traditional software did is not just a misconception — it's a dangerous error that adversaries are already exploiting. Chapter 8 discusses more reliable controls that can be used instead of, or in addition to, AI gates.
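To make the distinction concrete, here is a minimal sketch of the kind of gate this note argues for: deterministic, multi-signal checks with human escalation, and never a verdict delegated to another model call. The action names, thresholds, and policy are hypothetical placeholders.

```python
from dataclasses import dataclass

# Assumed risk policy: which action types an agent may take on its own,
# and the spend threshold above which a human must approve.
AUTO_APPROVED_ACTIONS = {"draft_summary", "create_calendar_hold"}
SPEND_LIMIT_USD = 500.0

@dataclass
class ProposedAction:
    action_type: str
    amount_usd: float
    source_is_external: bool   # did untrusted content influence this request?

def gate(action: ProposedAction) -> str:
    """Deterministic decision gate. Never delegates the verdict to an LLM."""
    if action.source_is_external and action.action_type not in AUTO_APPROVED_ACTIONS:
        return "escalate_to_human"          # untrusted input behind a risky action
    if action.amount_usd > SPEND_LIMIT_USD:
        return "escalate_to_human"          # financial threshold exceeded
    if action.action_type in AUTO_APPROVED_ACTIONS:
        return "allow"
    return "deny"                           # default-deny anything unrecognized
```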

Impact by AI Adoption Level

Poisoned documents don't pose the same risk in every context. Their impact scales with how deeply AI is embedded in your workflows. A chatbot that only produces text summaries is inconvenient when tricked; an embedded AI that feeds into business processes is dangerous; and an agentic AI that can act on its own is critical. This table illustrates how the decision perimeter grows more fragile as adoption levels increase.

| Adoption Level | How the Attack Plays Out | Trust Fragility |
| --- | --- | --- |
| Chatbot AI | Low exposure. AI can mis-summarize a file but lacks authority to act. The worst case is a user trusting a distorted summary. | Trust rests with the human in the loop; the perimeter is intact, but vulnerable to overconfidence. |
| Embedded AI | High exposure. Poisoned documents distort automated workflows (e.g., compliance reports, financial dashboards) without human review. | Trust silently migrates from humans to systems; the perimeter shifts deeper into the business process. |
| Agentic AI | Critical exposure. A poisoned input can trigger autonomous actions (API calls, financial transactions, database changes). | Trust collapses at scale — decisions and actions happen faster than oversight can intervene. |

Takeaways

| Executive Imperative | Why It Matters |
| --- | --- |
| Isolate untrusted content from decision systems | Once data and code collapse, any file could carry commands. Keep AIs that process external documents firewalled from systems that trigger financial or operational actions. |
| Require visible evidence behind every summary | Executives and staff must see what input drove an AI's output, so poisoned prompts can't silently shape decisions. |
| Tier reviews by business risk | High-stakes decisions (contracts, finance, compliance) triggered by documents must route through human approval, not default to automation. |
| Instrument for traceability | Every AI-document interaction should be logged so you can reconstruct how a decision was made when trust is questioned (a minimal logging sketch follows this table). |
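What "instrument for traceability" can look like in practice is simpler than it sounds: one auditable record per AI-document interaction, written somewhere the AI cannot edit. The field names in the sketch below are assumptions rather than a standard schema.

```python
import hashlib
import json
import time

def log_document_interaction(log_path: str, *, user: str, document_bytes: bytes,
                             prompt: str, model_output: str,
                             actions_taken: list[str]) -> None:
    """Append one auditable record per AI-document interaction (illustrative)."""
    record = {
        "timestamp": time.time(),
        "user": user,
        # Hash rather than store the document, so the log itself doesn't leak content.
        "document_sha256": hashlib.sha256(document_bytes).hexdigest(),
        "prompt": prompt,
        "model_output": model_output,
        "actions_taken": actions_taken,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```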

Poisoned documents are a visible reminder that every file we once treated as inert can now carry hidden instructions. But documents aren't the only Trojan horse. Business runs on countless background feeds — calendar invites, notifications, system alerts — all assumed to be passive signals. In an AI-driven workflow, those too can become control channels: adversarial inputs, carefully crafted to steer AI systems off course, can arrive through feeds nobody ever reads. The next attack shows how.

Calendar Invite Hijack – Turning Ambient Data Streams into Command Channels

Imagine your logistics AI waking up to find a new "urgent shipment" event on the company calendar. No one questions it — after all, calendar invites come from inside the business and are treated as trusted signals. Within minutes, trucks are rerouted, resources reallocated, and deadlines shifted, all in response to a poisoned calendar entry.

We rarely think of calendars as attack surfaces. They feel like background noise, passive reminders that simply anchor the workday. But in an AI-driven workflow, a calendar invite isn't just a reminder. It's a trigger — and once trusted by automation, it becomes a lever attackers can pull to steer entire systems.

Attack Details

Many AI agents embedded in productivity tools maintain continuous awareness of calendars, message queues, and dashboards (often accessed via RAG) to proactively assist their users. At DefCon 2025[7], Google's Gemini was tricked into executing a payload hidden inside a calendar event description — summarizing "upcoming events" triggered it to turn smart-home lights on and off. At enterprise scale, the same pattern could reroute shipments or halt production lines.
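One illustrative defense is to never hand the free-text description of an event to an action-capable agent at all: pass only structured fields, and treat the description as untrusted content to be shown to a human. The sketch below assumes a generic event dictionary rather than any particular calendar API, and it plainly trades some assistant usefulness for safety.

```python
def context_from_event(event: dict) -> dict:
    """Build the agent's view of a calendar event from structured fields only.

    The free-text description is the field attackers control most easily, so it
    is excluded from any context that can trigger tool calls and surfaced to the
    user separately as untrusted text.
    """
    return {
        "title": event.get("summary", ""),
        "start": event.get("start", ""),
        "end": event.get("end", ""),
        "organizer": event.get("organizer", ""),
        # Deliberately omitted: event.get("description") -- treat it as untrusted
        # content, not as input to an action-capable assistant.
    }
```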

Why It Mattered

Modern software is built on the assumption that system-to-system data can be processed safely. For decades, that assumption held because there was a clear separation between data and code — a calendar invite was just metadata passed between systems, inert and trustworthy. Nobody thought twice about whether an invite needed review; the very architecture of enterprise software was designed around treating calendar data as passive context.

With AI in the loop, that assumption collapses. Unlike poisoned documents, which at least pass through human eyes before being acted upon, calendar invites go straight into automation. No one reviews them, no one takes accountability, and the AI treats them as ground truth. A poisoned invite doesn't just create confusion — it can redirect operations, shift compliance deadlines, or fabricate executive records without a single person ever noticing. And at the decision perimeter, where actions are triggered based on AI outputs, that unreviewed signal becomes a direct path to compromise.

How Trust Breaks

| Example Breakdown | What Happens in Practice | Why It Matters for Leaders |
| --- | --- | --- |
| Fake logistics event injected into calendar | AI reschedules or reroutes shipments based on false data | Millions in losses from delays or misallocated inventory, all triggered by a single poisoned feed |
| Regulatory filing deadline altered | Compliance AI marks the wrong date as authoritative | Organization misses statutory deadlines, exposing itself to fines and reputational damage |
| Executive calendar manipulated | False records of meetings or approvals created | Legal or governance exposure when audit trails are corrupted and leadership actions are misrepresented |

Author's Note: Why Prompt Injection is Not Just AI Phishing

I often see executives jump to the conclusion that prompt injection is "just phishing for AI." It's an understandable comparison — after all, both involve crafted messages designed to trick the recipient. But the analogy breaks down in two critical ways.

First, phishing and social engineering have natural limits. A human can only be tricked so many times, and only one victim at a time. Prompt injection in an AI ecosystem has no such ceiling. Once an input is poisoned, it can propagate instantly across every AI system it touches, scaling compromise across departments or even the entire company at once.

Second, humans — even when tricked — have built-in limits. It's nearly impossible to design a phish convincing enough to make an employee wire out the company's entire treasury. AIs don't have those brakes. Once flipped, an AI will execute whatever goal or command it's been given, tirelessly and without hesitation.

That's why treating prompt injection as "just AI phishing" understates the risk. It isn't persuasion; it's systemic compromise at machine speed.

Impact By AI Adoption Level

| Adoption Level | How the Attack Plays Out | Trust Fragility |
| --- | --- | --- |
| Chatbot AI | Low exposure: an assistant may misread or mis-summarize an invite, but a human still decides whether to act. | Fragility is minimal; risk depends on whether employees over-trust the AI's interpretation of the invite. |
| Embedded AI | High exposure: poisoned calendar data flows directly into operational workflows (logistics, compliance deadlines, staff scheduling). | Trust migrates into systems; poisoned invites silently distort business processes without human review. |
| Agentic AI | Critical exposure: an agent treats a calendar event as an authoritative trigger — rescheduling fleets, reallocating resources, or missing filings autonomously. | Trust collapses; systemic actions unfold at machine speed before oversight can intervene. |

Takeaways

| Executive Imperative | Why It Matters |
| --- | --- |
| Treat ambient feeds the AI sees as active code | Calendars, notifications, and alerts processed by AI can never be assumed to be passive or inert — once processed by AI, they can suddenly drive business actions. |
| Segregate workflow triggers | Prevent AI from directly initiating high-stakes actions (shipments, filings, approvals) without secondary confirmation. |
| Mandate cross-checks for critical events | Regulatory deadlines, executive meetings, and fleet schedules must be verified against independent data sources (a cross-check sketch follows this table). |
| Instrument for provenance and traceability | Every AI action tied to calendar data (or other passive data input) should log its input and decision path to detect and reconstruct manipulation. |
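As a sketch of the cross-check imperative above, a deadline pulled from a calendar can be confirmed against an independent system of record before any automation acts on it. The function and the registry lookup are hypothetical; the point is that the second source must be one attackers cannot reach through calendar invites.

```python
from datetime import date

def confirm_regulatory_deadline(calendar_date: date, registry_lookup) -> date | None:
    """Cross-check a calendar-derived deadline against an independent source.

    `registry_lookup` is an assumed callable that returns the authoritative
    filing date from a system attackers cannot reach via calendar invites
    (e.g., a compliance database of record). Automation proceeds only when
    the two sources agree; otherwise the event is escalated for human review.
    """
    authoritative = registry_lookup()
    if authoritative == calendar_date:
        return authoritative
    return None   # mismatch: do not act, raise for human review
```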

A poisoned calendar invite showed us how a single, unreviewed signal can hijack a business process. The next level of danger is not a single poisoned input — it's chaining those inputs across integrations. Modern enterprises stitch AI into dozens of plugins and connectors; each integration is a new seam an attacker can pull on. When adversaries combine a poisoned feed with weak plugin controls, they don't just misdirect one process — they orchestrate lateral movement, automated escalation, and multi-system compromise at machine speed.

Cross-plugin exploitation is the story of composition: small failures joined together become systemic. The following example shows how attackers turn benign integrations into a campaign, and why protecting the decision perimeter means hardening both inputs and the decision glue between systems.

Cross‑Plugin Exploitation – Integration Privilege Escalation

Important caveat: to date, we haven't seen a major cross-plugin exploitation play out at enterprise scale in the wild. Most of the high-impact scenarios are demonstrations and lab proofs — they work when researchers or red teams stitch components together under controlled conditions. I'm including this attack type because it gives us a clear picture of what's possible, not because we have evidence it has happened at scale.

Why call this out up front? Because cross-plugin attacks are an emergent property of AI software ecosystems. Individual stacks and vendors already offer the building blocks — connectors, agent frameworks, and tool-enabled assistants — but many organizations haven't yet deployed those capabilities pervasively at the levels of automation and privilege that make such chained attacks devastating. As adoption broadens, however, the attack surface shifts from isolated features to composition, and the seams between integrations become easy places for attackers to pull.

The good news is also the point: today, this is largely a design problem we still control. The lab PoCs are warnings and rehearsals. If we design our integrations, manifests, token scopes, and approval/decision gates with an eye to such attacks now, we can prevent many of the sequences researchers have demonstrated from becoming real-world compromises. Treat the proofs of concept security practitioners are publishing as a roadmap for defenses, not as an inevitability.

Attack Details

As AI assistants become orchestration layers, they often connect to multiple plugins or tools at once—CRM systems, payment processors, scheduling applications. A vulnerability in one plugin can cascade into the others if the AI is tricked into chaining actions across them. The adversary doesn't need to breach your crown-jewel system directly; they only need to compromise the weakest plugin and let the AI bridge the gap. At BlackHat 2025[8], the researchers behind the previous attack showed how manipulating one plugin could be leveraged to access a linked calendar tool and, through it, confidential meeting details.

Why It Matters

Enterprises manage risk by scoping integrations and applying least-privilege: each connector is allowed a narrow role, with the assumption that composition is safe because humans mediate sensitive actions. That model breaks when agents are allowed to compose capabilities. An agent that can call multiple plugins at runtime can accidentally or maliciously escalate privilege by chaining otherwise benign operations into a path that ends at a sensitive system.

The critical failure here is the bypass of human checkpoints and the erosion of ownership clarity. Single plugin calls often look legitimate; what is hard to see — and therefore hard to defend — is the sequence those calls form when stitched by an agent. Cross-plugin exploitation moves at machine speed, can traverse well-segmented systems, and often leaves a trail of ordinary-looking logs. For leaders, the risk is a small, trusted connector turning overnight into a pivot for high-impact compromise.
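Because each call looks legitimate on its own, the defense has to examine the sequence. The sketch below reviews an agent's planned chain of plugin calls and escalates when data from a low-trust plugin could end up driving a high-trust action. The trust tiers and the escalation rule are assumptions for illustration, not a published control.

```python
# Assumed trust tiers per plugin; in practice these would come from plugin manifests.
PLUGIN_TRUST = {
    "calendar": "low",
    "notes": "low",
    "crm": "medium",
    "payments": "high",
    "ci_cd": "high",
}

def review_chain(planned_calls: list[str]) -> str:
    """Flag agent plans that step from a low-trust plugin into a high-trust one.

    Individual calls may each pass their own checks; this looks at the sequence.
    """
    seen_low_trust = False
    for plugin in planned_calls:
        tier = PLUGIN_TRUST.get(plugin, "unknown")
        if tier in ("low", "unknown"):
            seen_low_trust = True
        elif tier == "high" and seen_low_trust:
            return "escalate_to_human"   # low-trust data may now drive a high-impact action
    return "allow"
```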

How Trust Breaks

| Example Breakdown | What Happens in Practice | Why It Matters for Leaders |
| --- | --- | --- |
| Low-privilege plugin returns crafted payload that agent parses | The agent chains calls (e.g., create calendar event → call webhook → request resource) and crosses trust boundaries | A minor connector becomes the pivot to sensitive operations — permissions that once limited risk are now a bridge to high impact |
| Structured text in CRM or notes triggers unintended escalation | The agent misinterprets a field as an instruction and creates upstream actions (refunds, provisioning) across systems | Automated workflows can be manipulated to cause financial loss or service disruption before humans notice |
| Monitoring/diagnostic plugin exposes pointers that agent follows | The agent uses a pointer to fetch and execute code via a CI/CD plugin | Attackers can escalate to code execution or configuration changes without directly breaching source control or CI credentials |

Impact By AI Adoption Level

| Adoption Level | How the Attack Plays Out | Trust Fragility |
| --- | --- | --- |
| Chatbot AI | Low. Without tool/plugin access the assistant's missteps are limited to misleading outputs and suggested actions that humans can reject. | Human gatekeepers still hold most trust; composition requires tool access to matter. |
| Embedded AI | High. When models feed into workflows with plugin hooks, composed inputs can corrupt processes, create false records, or trigger provisioning. | Trust migrates into the orchestration layer; small manipulations can ripple across systems. |
| Agentic AI | Critical. Agents with tool access can be made to chain cross-plugin calls that escalate privilege, alter infrastructure, or exfiltrate secrets autonomously. | Trust collapses at scale — actions happen faster than oversight, amplifying damage across domains. |

Takeaways

| Executive Imperative | Why It Matters |
| --- | --- |
| Define the decision perimeter across integrations | The real risk isn't in a single plugin — it's in how AIs stitch them together. Leaders must set the boundaries of which actions can flow system-to-system without human review, and which must be gated. |
| Establish composition-aware approval gates | In traditional software, a "pass" at one gate was enough. With AI, escalation happens through sequences of legitimate actions taken by a compromised model. Require extra review when AI-authored workflows cross from low-trust to high-trust systems or show unusual composition patterns at inference time. |
| Demand cross-plugin observability | Logs of individual plugin calls aren't enough — monitoring must reconstruct sequences to reveal hidden chains. Without this, escalation looks like a series of legitimate single steps. |
| Treat plugins as governance objects, not just technical connectors | Each integration should have a manifest, risk tier, and owner (an example manifest follows this table). This transforms plugins from hidden glue into accountable components of the control plane. |
| Use least-privilege as a design input, not the whole solution | Limiting permissions still matters, but it doesn't stop escalation-by-composition. Least-privilege must be paired with controls that recognize and break unsafe chains. |
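To make "plugins as governance objects" concrete, here is one possible shape for a plugin manifest. The fields are illustrative, not a standard, but they capture the minimum leadership needs: who owns the integration, how risky it is, and what it is allowed to do.

```python
from dataclasses import dataclass, field

@dataclass
class PluginManifest:
    """Governance metadata for one integration (illustrative fields)."""
    name: str
    owner: str                    # accountable team or person
    risk_tier: str                # e.g. "low" | "medium" | "high"
    allowed_actions: list[str] = field(default_factory=list)
    may_receive_untrusted_input: bool = False
    requires_human_approval: bool = False

# Hypothetical example: a high-risk payments connector with a named owner.
payments_plugin = PluginManifest(
    name="payments",
    owner="finance-platform@example-corp.internal",
    risk_tier="high",
    allowed_actions=["issue_refund"],
    may_receive_untrusted_input=False,
    requires_human_approval=True,
)
```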

In Closing

The attacks we've walked through in this chapter illustrate how fragile the runtime boundary has become. Plugins, integrations, and chained prompts turn the model into a hub of hidden risk, and our decision perimeter now lives in the moment where AI outputs are acted upon. These are real and pressing issues — but they all assume the model itself can be trusted.

That assumption is exactly what we need to challenge next. Runtime defenses only matter if the foundation beneath them is solid. And in the AI era, the foundation isn't code you can read — it's data you can't see. That shift forces us to rethink what "trust" even means in software, and to accept that securing AI requires vigilance not just in operation, but in the model's origins and economics.

Author's Note: AI Denial of Wallet Attacks

One additional runtime risk worth noting is what I call denial of wallet. Instead of crashing a system outright, the attacker drives up your costs — forcing a model to consume expensive compute cycles through carefully crafted prompts or coordinated floods of high-cost requests. For a business, the effect isn't a service outage, but a financial one. The system keeps running, but it quietly bleeds budget until the economics of operating it become unsustainable.

At present, this is less urgent than the core runtime threats explored in this chapter. But as AI is woven deeper into workflows and customer-facing services, denial of wallet shifts from nuisance to potential sabotage — an attack on the viability of the business model itself. It's a reminder that in the AI era, resilience isn't only about uptime; it's about the ability to sustain operations under financial pressure as well as technical stress.