CHAPTER 2

Making Sense of the Language of AI

The Illusion of Understanding

The meeting should have been routine. We were reviewing an AI pilot that auto-categorized support tickets to route them to the right teams. As I walked through the architecture, an executive nodded confidently at "zero-shot learning" and furrowed a brow at "embedding vectors". The pattern was familiar. On the surface, everyone was fluent in the language of AI. But underneath, there was a deep disconnect between the terms and any real understanding.

That disconnect is dangerous. AI has crashed into our business lexicon with startling force. It's spoken of like a utility—always on, abstracted away, and presumed reliable. But it's not like other features of software. It's an entire spectrum of different capabilities. When leaders can't distinguish between a simple rule-based AI and a generative agentic system that writes marketing copy, they frequently adopt tools without appreciating the differences in power, risk, and governance.

The consequence of this shallow grasp of the terminology is an illusion of mastery: a false sense that because we can repeat the buzzwords, we've internalised the concepts. In practice, that illusion leads to over-trust in vendor solutions, under-investment in oversight, and strategic decisions built on shaky foundations. This chapter exists to puncture that illusion and establish a common frame of reference we can use throughout the book. We'll travel quickly from the familiar world of deterministic software to the new frontier of agentic software, and equip you with the vocabulary to ask good questions about how your AI investments work and to spot when risk profiles change. The aim here isn't low-level technical depth; it's a conceptual grasp of the impacts and risks associated with each of the common terms we hear in AI conversations. That way, when we reach threat modelling in Chapter 4 and a discussion of trust frameworks in Chapter 7, you'll have a solid conceptual backbone to hang the details on.

From Rules to Learning: A Quick Primer

Before we can appreciate how revolutionary today's generative models are, it's useful to recall what came before. For decades, enterprise automation was dominated by deterministic rule engines. These systems followed explicit "if/then" logic flows that describe processes: if the invoice date is past due, mark it for escalation; if the email comes from a blocked domain, flag it as spam. Rules in this type of AI are transparent and auditable but brittle. They break when the world changes or when the edge cases multiply beyond human capacity to anticipate. Software was built around scaled investments to formalize this kind of logic, until machine learning (ML) and later its generative subdomain (GenAI) took us down a different path, replacing deterministic logic systems with models that infer patterns and make probabilistic predictions.
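
To make that deterministic style tangible, here is a minimal sketch in Python; the field names, blocked domains, and routing labels are invented for illustration, not drawn from any real system:

    from datetime import date

    BLOCKED_DOMAINS = {"spam-sender.example"}  # illustrative blocklist

    def route_invoice(invoice):
        # Explicit, auditable rule: past-due invoices get escalated.
        if invoice["due_date"] < date.today():
            return "escalate"
        return "standard-queue"

    def route_email(email):
        # Explicit rule: mail from a blocked domain is flagged as spam.
        sender_domain = email["from"].split("@")[-1]
        if sender_domain in BLOCKED_DOMAINS:
            return "spam"
        return "inbox"

Every branch is explicit and testable, which is exactly why these systems are easy to audit, and exactly why they buckle once the edge cases multiply.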

ML was the next big step. Instead of writing every rule by hand, we feed algorithms lots of examples. The ML system learns patterns—correlations between inputs and desired outputs—and then generalises an ability to predict within that context. Fraud detection, recommendation engines, predictive maintenance: all rely on this pattern recognition. The models "learn" weights to minimise error, but they don't understand motivation or context the way a person does. They emulate reasoning through pattern recognition, capturing statistical relationships rather than intuitive leaps. They adapt better than rule engines but remain narrow. A good analogy for this type of system is learning how to throw a baseball: you're not solving the problem by computing the physics of every throw as a parabolic arc; you're learning to predict through practice and exposure to thousands of examples. And just as your throwing would suffer if you switched from baseball to football, ML performance degrades when you change the data domain or distribution.
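
To see the shift from hand-written rules to learned weights, here is a tiny sketch using scikit-learn with made-up invoice features; a real system would use far richer data:

    from sklearn.linear_model import LogisticRegression

    # Each row describes an invoice with simple numeric features:
    # [days_past_due, amount_in_thousands]. Labels: 1 = escalate, 0 = don't.
    X_train = [[0, 1.2], [45, 3.5], [2, 0.8], [90, 12.0], [10, 2.1], [60, 7.5]]
    y_train = [0, 1, 0, 1, 0, 1]

    model = LogisticRegression()
    model.fit(X_train, y_train)        # the model "learns" weights that minimise error

    # The learned pattern generalises to unseen examples within the same domain...
    print(model.predict([[75, 5.0]]))  # likely [1]: escalate
    # ...but nothing guarantees it holds if the data distribution changes.

The point isn't the library; it's that nobody wrote the escalation rule. The model inferred it from examples, and it will keep applying that inference even when the new inputs stop resembling the old ones.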

Deep learning pushed this further. Neural networks with many layers could recognize complex patterns in images, audio, and text, and make more capable predictions than earlier systems. Even so, before Generative AI, both ML and deep learning were largely limited to analyzing existing data and making task-specific predictions. Their strength was in classification and detection, not in producing new material or ideas.

Generative AI: The Big Leap from Prediction to Creation

The rest of this book focuses on the implications of generative AI, not what came before. The big leap we experienced with ChatGPT came when we could use models not simply to classify or predict but to generate entirely new content that was reasonable and plausible rather than incoherent. Generative AI models can write code from natural-language instructions. They can draft a personalized email campaign from a handful of keywords. This is fundamentally different because, for the first time, software can create something that looks completely original and, in many business cases, is highly useful. Until recently, this type of task was something only humans could do, and we built the entire ecosystem of software and governance in modern enterprises around the fact that software could only go so far and no further. Every UI and governance policy was created around the need to introduce humans into your processes to handle the tasks that software couldn't, and around the cost structures dictated by those limits.

What makes this leap so intoxicating for business leaders is the ease of access. Anyone can now drop any question imaginable into a state-of-the-art model and get back a plausible, confident answer within seconds. Anyone can upload a corporate knowledge base to a model and create a chatbot that largely sounds like a trained employee. That democratization and ease of creation blasts apart the traditional investment moats around ML, and around software more broadly.

You no longer need a software engineering team to build a proof of concept; you can ask an AI agent to generate one, and a few minutes later you can be playing with working code. The immediacy of value is both empowering and perilous. When AI is accessible to all, your governance must be completely rethought. And when the models are powerful enough to generate entire codebases or hold your entire corpus of regulatory documents in memory, the scope of AI's influence on your organization—and the blast radius of its mistakes—expands accordingly.

Vocabulary That Matters: Tokens, Models, Agents

To navigate this landscape, you need a few core terms. Tokens are the units of text a model processes during training and inference; they determine how much context you can work with in a model. Large Language Models (LLMs) don't operate on ideas; they operate on tokens and the statistical relationships between them. When discussing tokens, you are discussing how much context the model is working with in a single prompt, or how much it will output. Tokens are also a measure of cost: most hosted model services bill by the token. A term you'll hear often alongside tokens is context window—the number of tokens an LLM can "see" at once. Early models were limited to a few thousand tokens. Today's frontier models can process hundreds of thousands of tokens at once, in some cases a million or more, allowing for book-length analysis, but in practice result quality tends to depend more on how well the data is retrieved and segmented.
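
If tokens feel abstract, a tokenizer library such as OpenAI's tiktoken will show you how text actually gets chunked; the encoding name and the per-thousand-token price below are illustrative assumptions, not quotes from any vendor:

    import tiktoken

    # "cl100k_base" is one widely used encoding; other models use different ones.
    enc = tiktoken.get_encoding("cl100k_base")

    text = "Route this support ticket to the billing team."
    tokens = enc.encode(text)

    print(len(tokens), "tokens")       # how much of the context window this prompt uses
    print(enc.decode(tokens[:4]))      # tokens map back to chunks of text, not ideas

    # Rough cost estimate under an assumed (hypothetical) price per 1,000 tokens.
    price_per_1k_tokens = 0.01
    print(f"~${len(tokens) / 1000 * price_per_1k_tokens:.5f} for this prompt")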

Parameters are the learned weights that determine how input tokens map to output tokens. Frontier models at the leading edge of AI development now have trillions of parameters, which is why they require immense compute to train. Though occasional exceptions and workarounds exist[2], the more parameters a model has, the higher its cost per token and the slower its generation speed.

Now enter the concept of an agent. In the AI sense, an agent is a system that combines a model with traditional software to plan and execute multi-step tasks autonomously, much like a human actor. Unlike a vanilla LLM that returns one output and stops, an agent can decide, "I need to call an API to get up-to-date stock prices before answering this question," or "I will use a summarisation tool to condense this report before drafting an email." Agents always blend language models with one or more other technologies—retrieval systems, databases, task planners, and the like—to emulate reasoning and take action. This orchestration is what gives frontier AI its human-like feel.
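
To make tool use concrete, here is a deliberately stripped-down sketch; the model call is a stand-in, and the tool and ticker names are invented:

    import json

    def get_stock_price(ticker: str) -> float:
        # Placeholder for a real market-data API call.
        return 123.45

    TOOLS = {"get_stock_price": get_stock_price}

    def call_llm(prompt: str) -> str:
        # Placeholder: a real agent would call a hosted model here.
        # Assume the model decides it needs fresh data and asks for a tool.
        return json.dumps({"tool": "get_stock_price", "args": {"ticker": "ACME"}})

    decision = json.loads(call_llm("What is ACME trading at right now?"))
    result = TOOLS[decision["tool"]](**decision["args"])
    print(f"Model requested {decision['tool']}, tool returned {result}")

A production agent framework adds planning, retries, and guardrails, but the core pattern is this simple: the model proposes an action, and traditional software executes it.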

RAG: Bridging Memory and Knowledge

One of the most important evolutions in integrating generative AI was the concept of retrieval-augmented generation (RAG). RAG essentially combines an LLM's language fluency with a search engine's ability to retrieve relevant information from a body of knowledge to inject as context on the fly. LLMs are trained on vast datasets but have no access to your proprietary documents, up-to-date analytics, or a regulated knowledge base. Fine-tuning a model on those documents is expensive, time-consuming, and static. Once fine-tuned, the model's "knowledge" is frozen until you train again.

RAG takes a different approach. Whenever you prompt the model, the system first retrieves relevant passages from a database or search index. Those passages are fed into the context window of the LLM along with your question. The model then generates an answer grounded in this retrieved information. The retrieval step can be powered by a simple keyword search or by semantic similarity captured through vector embeddings. The result is a system that can answer questions based on your specific data without altering the underlying model.
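
Here is a heavily simplified sketch of that flow in Python; the documents are invented, and the "embedding" is just word overlap standing in for a real vector model:

    # A toy RAG pipeline: retrieve the most relevant passage, then ground the prompt in it.
    DOCS = [
        "Refunds over $500 require director approval.",
        "Support tickets are triaged within four business hours.",
        "Invoices unpaid after 60 days are escalated to collections.",
    ]

    def embed(text: str) -> set:
        # Stand-in for a real embedding model: here, just the set of lowercase words.
        return set(text.lower().split())

    def retrieve(question: str, docs: list, k: int = 1) -> list:
        # Rank documents by naive word overlap (real systems use vector similarity).
        q = embed(question)
        ranked = sorted(docs, key=lambda d: len(q & embed(d)), reverse=True)
        return ranked[:k]

    question = "When do unpaid invoices get escalated?"
    context = retrieve(question, DOCS)

    # The retrieved passages are injected into the prompt; the model itself never changes.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    print(prompt)  # this prompt would be sent to the LLM

Swap the toy retriever for a vector database and the print statement for a model call, and you have the basic skeleton of a RAG deployment.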

The distinction between training and retrieval matters a great deal for risk and governance. Training changes the weights of your model, altering its base behaviour, and permanently embeds the training information in the model itself. Cyberattacks exist that can extract training data back out of models, so training can make your model riskier to expose in production. Once you've trained a model on some data, exposing that model to someone who can freely prompt it, without taking safeguard steps[3], can be equivalent to exposing the training data to them as well.

Retrieval simply changes what information the model sees at inference time. If a RAG system produces a bad answer, the problem isn't embedded in the model; you can usually fix it by curating the documents or improving the search. But if a fine-tuned model produces a bad answer, you may have to retrain it from scratch. In Chapter 4, when we examine attack surfaces, you'll see that each approach introduces its own distinct vulnerabilities[4]. Understanding the difference now prepares you to mitigate and govern against those threats later.

Agentic Systems and Tool Use: From Single Shot to Orchestration

The real frontier isn't just generative models; it's agentic systems—AI that can use other tools, chain its reasoning across multiple steps, and remember context from earlier interactions. Imagine a chatbot that doesn't just answer your question but notices an error in your CRM, updates the record, and emails the customer to apologise. Or a risk-assessment agent that reads incoming regulatory changes, checks your policies, identifies gaps, and drafts remediation plans.

Agents are made possible by three ingredients: powerful language models, tool interfaces, and feedback loops. The model interprets user intent. The agent decides which tools to call—a database query, a CRM update, a scheduling API, etc. It evaluates the results and decides what to do next. This is where tool use comes in. An LLM alone is like a brilliant intern with no access to your systems. Give it tools, and it becomes a collaborator that can take real action.
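
A stripped-down version of that loop might look like the sketch below; the planner is a placeholder for a real model call, and the tools, customer record, and step limit are all invented for illustration:

    MAX_STEPS = 5  # bound execution so the agent cannot loop forever

    def lookup_crm(customer_id: str) -> dict:
        return {"customer_id": customer_id, "email": "pat@example.com", "status": "error"}

    def send_email(to: str, body: str) -> str:
        return f"email sent to {to}"

    TOOLS = {"lookup_crm": lookup_crm, "send_email": send_email}

    def plan_next_step(goal: str, history: list) -> dict:
        # Placeholder for the model's reasoning: pick the next tool call, or stop.
        if not history:
            return {"tool": "lookup_crm", "args": {"customer_id": "C-42"}}
        if len(history) == 1:
            record = history[-1]
            return {"tool": "send_email",
                    "args": {"to": record["email"], "body": "Apologies for the error."}}
        return {"tool": None}  # goal complete

    history = []
    for _ in range(MAX_STEPS):
        step = plan_next_step("Fix the CRM error and apologise", history)
        if step["tool"] is None:
            break
        result = TOOLS[step["tool"]](**step["args"])
        history.append(result)   # feedback loop: each result informs the next decision
        print(step["tool"], "->", result)

Note the step limit and the history: bounding execution and feeding results back into the next decision are what separate an agent from a script.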

This shift has profound implications for both opportunity and risk. Opportunities include automating complex workflows, stitching together disjointed systems, and creating personalised, dynamic experiences for customers. Risks include unbounded execution, escalation of privilege, and new attack surfaces. An agent that can call an external application programming interface (API) can also be tricked into calling the wrong one. A system that reads data from multiple sources can be fed poisoned content through any of those sources. Researchers in 2025 are also experimenting with reinforcement-learning approaches in which systems refine their behaviour through feedback, or in which one AI is used to train another. This kind of recursive learning can accelerate progress, but it also adds new layers of uncertainty and increases the need for oversight and control.

Hype, Reality, and Workforce Impact

Given the pace of progress, it's no surprise that generative AI and agents attract both hype and backlash. We've already been clear on two points: LLMs simulate understanding rather than possess it, and they mirror back inferences drawn from their training data and applied to incoming inputs, rather than deliver objective truth. New misconceptions are emerging anyway. Product demos and investor decks sometimes frame agents as instant department replacements. Reliability engineers, operations leaders, and safety researchers, on the other hand, describe agents as brittle systems that fail outside carefully staged tasks. The truth lies somewhere between the two.

Here is that middle ground. Agents are powerful because they connect language generation to real actions. They coordinate tools, call external APIs, and reason over changing data. That same coordination introduces variance: even with identical inputs, small differences in intermediate steps can produce different paths to the same goal—or different outcomes altogether. Agents excel in bounded workflows where inputs are well-defined, actions are observable, and rollback is cheap. They are fragile in open-ended environments where goals, data quality, and permissions are loose. RAG systems can sound authoritative, but their accuracy tracks the freshness and neutrality of what you retrieve; stale or biased sources produce stale or biased answers. The practical playbook is straightforward: constrain scope, instrument everything, separate read from write, and give humans explicit decision rights at the points of highest consequence.
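
That playbook translates directly into code as well as policy. The sketch below shows one way to separate read from write tools and force a human decision on the highest-consequence actions; the tool names and categories are invented:

    # Invented tool names; the split between read and write permissions is the point.
    READ_ONLY_TOOLS = {"search_tickets", "lookup_invoice"}
    WRITE_TOOLS = {"update_crm", "send_email", "issue_refund"}
    HIGH_CONSEQUENCE = {"issue_refund"}   # actions that always require a person

    AUDIT_LOG = []

    def execute_tool(agent_id: str, tool: str, args: dict, human_approved: bool = False):
        AUDIT_LOG.append((agent_id, tool, args))             # instrument everything
        if tool in READ_ONLY_TOOLS:
            return f"{tool} executed"                        # reads run freely
        if tool in WRITE_TOOLS:
            if tool in HIGH_CONSEQUENCE and not human_approved:
                return f"{tool} queued for human approval"   # explicit decision rights
            return f"{tool} executed"
        raise PermissionError(f"{tool} is outside this agent's allowed scope")

    print(execute_tool("agent-7", "lookup_invoice", {"id": "INV-19"}))
    print(execute_tool("agent-7", "issue_refund", {"id": "INV-19", "amount": 900}))

None of this is sophisticated, and that's the point: the guardrails that matter most are boring, explicit, and sit outside the model.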

The labour market is already adjusting to these dynamics. Leaders are consolidating functions where agentic workflows can triage, draft, and hand off reliably, while moving human effort up the chain to oversight, escalation, and policy enforcement. Roles closest to repeatable content and routine coordination are changing fastest; roles anchored in judgment, exception handling, and accountability are becoming the control plane. Entire departments are not vanishing, but organizational charts are being redrawn around where agents are reliable, where they are merely helpful, and where they should not operate at all.

Punchline: Agents don't end the need for human work; they redraw its boundaries. The organizations that thrive will be the ones that understand where those boundaries belong.

Essential AI Vocabulary

Now that we've covered the big concepts—rules, learning, generation, agents, RAG—it's worth codifying a vocabulary that leaders can use to interrogate projects and vendors. Here's a quick reference of terms you'll encounter when dealing with AI, and how you might use them. The questions here are all ones you might ask when an unreviewed, AI-generated report lands on your desk:

Base model: The pre-trained AI (e.g., GPT-5) before any domain-specific customization. Questions to ask: What base model produced the report, and what are its training data sources and known failure modes?

Fine-tuning: Adjusting the base model's weights using your own data. Questions to ask: Was this model fine-tuned on our research notes or client materials? Who curated that dataset, and how often is it retrained?

Embedding: A numerical representation of data that captures meaning for similarity and retrieval, based on the model that created it. Questions to ask: Are we generating our own embeddings or relying on a vendor? Where are they stored, and how are they access-controlled and rotated?

Context window: The amount of input the model can consider at once (tokens). Questions to ask: Did the draft exceed the context window (causing truncation)? How do we handle overflow and ensure key sources aren't dropped?

Agent: A system that can plan and act by invoking tools/APIs. Questions to ask: Did an agent fetch data, format the report, or email it onward? What permissions and allow-lists constrain those actions?

RAG (Retrieval-Augmented Generation): Pulling documents at run-time to ground outputs. Questions to ask: Which sources were retrieved (internal DBs, analyst notes, external feeds)? How do we prevent poisoning or prompt-injection in that layer?

Parameter count: The scale of the model's learned weights (often tied to cost/latency). Questions to ask: Do we need a frontier-scale model here, or would a smaller one meet quality needs with better cost and latency?

Equipping yourself and your teams with these terms fosters precision. Precision, in turn, fosters accountability. When everyone in the room can distinguish between a model that's fine-tuned and one that uses RAG, or between a 200-million-parameter mini-model and a trillion-parameter frontier model, you can at least have a meaningful conversation about cost, performance, and risk.

Levels of AI Adoption

We've named the components of the new GenAI technology (tokens, RAG, agents). Now we need a clear map of how far the work of AI has moved within the business, from "tooling" to "decision-making." The level of AI adoption is a critical concept that we will revisit in much greater detail in later chapters, because the necessary security architecture, risk, and value-add of AI change drastically at each level.

Level 0 (No AI): Manual workflows and human-only decisions.

Level 1 (Chatbot AI): AI is a tool for humans. Chatbots suggest responses and copilots generate content, but every decision still flows through a person. The business process is unchanged, and trust remains anchored on the human.

Level 2 (Embedded AI): Humans still define the business process, but AI replaces people in specific steps. The AI makes decisions inside a known framework. We trust the output because we trust the process we gave it, and we trust the AI to act only within those boundaries.

Level 3 (Agentic AI): The AI no longer follows a fixed script. Given a goal and a set of tools, it plans steps, calls APIs, and reorders workflows on the fly. Trust is centered largely on the AI's behavior within its tool scope—it decides not just what to do but how to do it.

Level 4 (Sustainable AI): All capabilities of Level 3, but with all areas of the business rethought end-to-end in light of AI and all its implications: processes, roles, and software are wrapped with complex new guardrails, validation, and monitoring tasks to allow the organization to realize benefits without increasing risk.

As we will discuss in later chapters, each level raises both AI capability and risk dramatically, and together, until you achieve the Sustainable AI stage. Level 1 represents AI bounded by human review. Level 2 inherits fixed business logic and allows us to put gates at known points. Level 3 introduces autonomy—creative, sometimes unstable, and increasingly opaque. And at Level 3, every tool handed to an AI agent must be considered in terms of how it could intersect with, and multiply, the risks from every other tool you've already given that agent, and from every other agent it talks to.

Summary & Transition

We've covered a lot of ground in this chapter, but one thing is now clear: modern AI is not a monolith. It's a continuum of different concepts from deterministic rules to adaptive learning, from static models to generative engines, from single-shot prompts to agentic orchestration. Understanding where your system sits on that continuum shapes everything else—how you secure it, how you govern it, how you extract value from it, and how you communicate about it with your board.

In particular, we've highlighted the emerging importance of RAG, agentic systems, and reinforcement-style AI training. RAG allows models to tap into dynamic, proprietary knowledge without retraining. Agents allow models to do more than generate text; they can plan and act. Reinforcement-learning frameworks are enabling agents to improve other agents. All three unlock enormous potential, but they also introduce new risk surfaces: prompt injections into retrieval pipelines, cross-plugin exploits in agent toolchains, and feedback loops that amplify biases or errors. We'll explore those vulnerabilities in Chapter 4's threat-modelling section and the implications for fairness and trust in Chapter 7.

In the next chapter, we move from concepts to process. Now that you understand the building blocks of modern AI, we'll explore ways these technologies actually get embedded into the enterprise. We'll map the lifecycle of an AI project—ideation, planning, development, deployment, and operations—and examine where things go wrong. Armed with the vocabulary from this chapter and the framework we've introduced to assess the progression of AI in your enterprise, you'll be better equipped to spot patterns and apply the right controls at each stage.