
Context is all you need

This article is part 2 of a series of articles about AI. If you landed on this page, please read On AI-assisted software engineering first.
In this article we will analyze how LLM context can be built, starting from its key components.

What is this “context”?

Context for an LLM is just… numbers. Specifically, tokens. We pass text to an LLM (and images, if it’s multimodal), and this gets converted and processed as tokens. From a semantic point of view, we can divide the context into multiple components, which we may or may not include. The interesting thing about context is how we retrieve it!
One of the biggest challenges when working with agents today is how to retrieve the right context and pass it to the LLM, while staying cautious about the limited window size and the accuracy loss that comes as we add more and more context.
 
The key thing to understand is that, when interacting with an LLM, the only mandatory component is the user query. Everything else is optional, and its main goal is to give the LLM more information so it can (hopefully) produce a better answer.

System prompt

The system prompt is one of the most important parts of context engineering. It defines the identity, behavior, and boundaries of the LLM or agent. You can think of it as the foundation layer of the conversation: everything else (user input, memory, and tools) builds on top of it.
A good system prompt often includes several key sections:
  • Role definition – who the model is
    • Example: “You are a technical assistant specialized in software engineering.”
  • Goals – what the model should achieve
    • Example: “Your goal is to help users write clean, efficient TypeScript code.”
  • Tone and style – how the model should communicate
    • Example: “Use clear and simple English. Be concise and professional.”
  • Behavioral rules – what to do and what not to do
    • Example: “Always explain your reasoning briefly before giving the answer. Do not write unsafe code.”
This means the system prompt directly influences the model’s reasoning and style throughout the conversation. When we design context for an LLM, the system prompt is the first and most stable part. It helps with:
  • Consistency: all outputs follow the same logic, tone, and goals.
  • Safety: prevents the model from performing unwanted actions.
  • Efficiency: reduces the need to repeat instructions in every user prompt.
  • Alignment: keeps the model focused on the task or role we expect.
In short, a well-written system prompt reduces confusion, improves quality, and helps the model stay “in character.” Usually, the system prompt is static: it is written in a config file and loaded into the agent every time we start a conversation.
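For illustration, here is a minimal sketch of what such a static system prompt could look like when kept in a config file; the file name and structure below are hypothetical, not tied to any specific framework:

```typescript
// system-prompt.ts (hypothetical): a static system prompt loaded at the start
// of every conversation, covering role, goals, tone, and behavioral rules.
export const SYSTEM_PROMPT = `
You are a technical assistant specialized in software engineering.

Goals:
- Help users write clean, efficient TypeScript code.

Tone and style:
- Use clear and simple English. Be concise and professional.

Behavioral rules:
- Always explain your reasoning briefly before giving the answer.
- Do not write unsafe code.
`.trim();
```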
 

Available tools

Tools are the external capabilities that the model (or the agent wrapper around the model) can call upon. These tools expand what the model can do beyond just generating text.
A tool is a function or interface that:
  • has a clear name summarising its purpose
  • has a description that explains what it does
  • requires a set of parameters to work
  • produces a defined output
  • has a schema (often in JSON) that defines what a valid call looks like
For example, in one agent framework, a tool might be a “web search” API, or a “file system read” function. Using well-defined schemas ensures that the LLM can reliably call tools and interpret their outputs. Proper tooling helps keep the context size manageable: instead of stuffing everything into the prompt, we can rely on tools and retrieve information when needed.
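To make the shape of a tool definition concrete, here is a hedged sketch of a “web search” tool; the field names follow the common name/description/JSON Schema pattern, but exact shapes vary by provider and framework, so treat them as assumptions:

```typescript
// A hypothetical "web_search" tool definition the agent runtime would inject
// into the context so the model knows the tool exists and how to call it.
const webSearchTool = {
  name: "web_search",
  description: "Search the web and return the most relevant results for a query.",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string", description: "The search query." },
      maxResults: { type: "number", description: "How many results to return." },
    },
    required: ["query"],
  },
};

// The output is also well defined, so the model can reliably interpret results.
type WebSearchResult = { title: string; url: string; snippet: string };
```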
Tools are both part of the context (since we need to inject the tool definitions) and a way for the agent to retrieve more context dynamically!
 

User Input & User-provided context

User input is the immediate request or command from the user. It is the piece of context that triggers the agent’s action. It tells the agent what the user wants now.
User input can take many forms, such as:
  • A natural-language question (“Generate unit tests for this function.”)
  • A command (“Search the codebase for occurrences of TODO.”)
  • A specification (“Refactor the module auth.ts to follow the new architecture.”)
  • A parameterised request (“Use library X version 5.2 to implement feature Y.”)
The key point is: user input is the latest turn in the conversation or workflow, and it tells the agent what now needs to be done.
When we design the context for an agent, user input matters because:
  • It defines the task boundary: it tells the agent what to focus on.
  • It shapes the retrieval of relevant context: the agent must pick the right tools, memory, and documents based on what the user asked.
  • It is a dynamic input: unlike the static environment or user profile, it changes turn by turn and must be processed correctly to maintain coherence and relevance.
 
In some cases, you might end up working repeatedly on the same type of task. When that happens, the user prompt will most likely be very similar each time, with just a few things changing (like parameters in a function).
Therefore, tooling has evolved to support prompt templates, which work like helper functions: you recall the prompt template, it gets injected into the context, and then you add your customizations.
Examples of this are Commands in Cursor or Prompts in MCP.
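As a rough sketch (the helper below is hypothetical and not the actual Cursor or MCP mechanism), a prompt template is essentially a parameterized piece of text that gets injected into the context before your customizations:

```typescript
// A hypothetical prompt template: a reusable prompt with a couple of parameters.
const generateTestsTemplate = (params: { filePath: string; framework: string }) => `
Generate unit tests for the code in ${params.filePath}.
Use ${params.framework} as the testing framework.
Cover edge cases and failure paths, not just the happy path.
`.trim();

// Recalling the template injects this text into the context; the user then
// adds any task-specific details on top.
const userPrompt = generateTestsTemplate({ filePath: "src/auth.ts", framework: "vitest" });
```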
 
Apart from defining what we want to achieve, we can also pass extra context that explains how we want to achieve it. Rules are a nice example of this. While writing the user input, we can explicitly recall and add one or more rules to the context just by tagging them with @ruleName. A rule is usually just a Markdown file containing style guides, restrictions, and similar guidance.
Depending on the task you’re performing, injecting the right rules can make all the difference! In some cases it’s also possible to recall a rule directly from a prompt template.
Emerging standards such as AGENTS.md aim to do something similar.
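For illustration, a rule file is often just a short Markdown document like the hypothetical one below, which could be recalled with @typescript-style while writing the user input:

```markdown
<!-- typescript-style.md (hypothetical): style guides and restrictions -->
# TypeScript style rules

- Prefer async/await over raw promise chains.
- Never use `any`; use `unknown` plus type narrowing instead.
- All exported functions must have JSDoc comments.
- Do not introduce new dependencies without asking first.
```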
 
Thanks to user-provided context, the human interacting with the agent can manually supply additional information to better guide the agent in the right direction.
 
After the user starts the interaction, the agent takes over and begins the process of reasoning, planning, and acting based on the given context.
At this point, the LLM has a full view of the context it can access, including the system prompt, environment, available tools, and user input, and it uses all this information to decide what to do next.

From request to action: how the flow works

Let’s break down the typical flow of information and execution inside an agent-based system:
  1. User input arrives
      The user issues a request, such as: “Refactor the authentication service and add logging for failed login attempts.” The agent receives this as text, which is part of the current context.
  2. Context assembly
      Before the model starts reasoning, the orchestration layer (the “agent runtime”, such as Cursor, Claude Code, or GitHub Copilot) assembles all the relevant context (a minimal sketch of this step is shown after this list):
      • The system prompt defines the agent’s identity and behavior.
      • The environment provides static and dynamic information about the system (repo, architecture, OS, date, etc).
      • Rules, skills, and commands are loaded from static files if relevant.
      • The available tools (declared in JSON schemas) are included so the model knows what actions it can perform.
      • The conversation history and memory (if any) are added for continuity.
      This assembled context is then passed to the model as the “input window.”
      The LLM reads all of it and reasons about how to satisfy the user’s request.
  3. Planning phase (context discovery)
      Once the model has all the context, it starts by creating an internal plan.
      This plan might include:
      • Understanding what additional information it needs (for example, “What does the auth service currently look like?”).
      • Identifying which tools to use to retrieve that information.
      • Deciding the logical order of operations (e.g., inspect → edit → test → summarize).
      This process is sometimes referred to as context discovery.
      The model uses reasoning techniques (like Chain-of-Thought) to figure out what it needs to know, and how to gather it efficiently.
  4. Tool execution and external calls
      After building the plan, the model starts using the tools defined in its context. Each tool execution is mediated by a protocol or API layer.
      One emerging standard for this interaction is the Model Context Protocol (MCP), which defines how LLMs and agents can discover, call, and exchange data with external tools or services in a structured and secure way.
      Using MCP (or similar interfaces), an agent can, for example:
      • Call a file system tool to read code.
      • Query a database or internal API to fetch relevant data.
      • Run commands like grep, build, or test.
      • Query external services via HTTP or RPC.
      Each tool call returns structured output, typically in JSON, which is then added back into the context for the next reasoning step.
      This way, the model gradually collects more context until it has enough information to proceed with the user’s request.
  5. Iterative reasoning loop
      After every tool call, the agent evaluates the results:
      • Did the tool return what was expected?
      • Is more data needed?
      • Has the task been completed?
      This forms an iterative loop of reasoning and action: Reason → Act → Observe → Adjust
      This loop continues until the agent determines that the task is complete, or that no further progress can be made.
      Some frameworks add a feedback mechanism (either from the user or automatically based on validation rules) to check if the output is correct before proceeding.
  6. Producing the final answer
      Once the agent has gathered all required information and executed all necessary actions, it produces a final output.
      Depending on the design, the output might include:
      • The final artifact (for example, the refactored code or a generated file).
      • A summary of the steps executed (useful for audit or debugging).
      • Logs or reports about tool calls, test results, or actions performed.
      • Next-step suggestions or validation notes.
      This final message is what the user sees as the result of the interaction.
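To make the context assembly step more tangible, here is a minimal, framework-agnostic sketch of the “input window” a runtime might build before each reasoning step; the field names mirror the common chat-completion shape but are assumptions, not a specific provider API:

```typescript
// Hypothetical shapes for the assembled context passed to the model.
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };
type ToolDefinition = { name: string; description: string; parameters: object };

function assembleContext(
  systemPrompt: string,        // static identity, goals, and rules
  environment: string,         // repo, architecture, OS, date, ...
  tools: ToolDefinition[],     // JSON-schema tool declarations
  history: Message[],          // previous turns and tool results, if any
  userInput: string,           // the latest request
): { messages: Message[]; tools: ToolDefinition[] } {
  return {
    messages: [
      { role: "system", content: `${systemPrompt}\n\n${environment}` },
      ...history,
      { role: "user", content: userInput },
    ],
    tools,
  };
}
```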
 
Here’s a simplified example of what this flow might look like inside a coding agent:
  1. User: “Add logging to failed login attempts in the auth service.”
  2. Agent:
      • Loads the system prompt, environment via AGENTS.md (Node.js v18, Express, PostgreSQL), and tool definitions.
      • Analyzes user input and decides to read auth.ts.
      • Calls the read_file tool through MCP.
      • Parses the result and identifies where to insert logging.
      • Generates code for the new logging statement.
      • Writes changes using the write_file tool.
      • Runs tests with the run_tests tool.
      • Summarizes the result and returns it to the user.
      Each step includes a tool call, a reasoning phase, and a feedback check.
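Under the hood, this walkthrough is the Reason → Act → Observe → Adjust loop described earlier. The sketch below is a hypothetical, stripped-down version of that loop, with callModel and executeTool standing in for the real model API and tool layer (for example, MCP):

```typescript
// Hypothetical interfaces standing in for the real model and tool layer.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelResponse = { toolCalls: ToolCall[]; finalAnswer?: string };
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };

async function runAgent(
  messages: Message[],                                     // assembled context
  callModel: (msgs: Message[]) => Promise<ModelResponse>,  // the LLM call
  executeTool: (call: ToolCall) => Promise<string>,        // e.g. via MCP
  maxSteps = 10,                                           // safety limit
): Promise<string> {
  for (let step = 0; step < maxSteps; step++) {
    const response = await callModel(messages);            // Reason
    if (response.finalAnswer !== undefined) {
      return response.finalAnswer;                         // task complete
    }
    for (const call of response.toolCalls) {               // Act
      const result = await executeTool(call);              // Observe
      // Feed the tool output back into the context for the next step (Adjust).
      messages.push({ role: "tool", content: result });
    }
  }
  return "Stopped: step limit reached before the task was completed.";
}
```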
 
Retrieving context dynamically at runtime through the use of tools is therefore a big part of how modern agents operate.
Static context (like system prompts, rules, or environment configuration) gives the agent a foundation, but most real-world tasks require fresh, situational information, something the model can only get by interacting with its environment.
To achieve this, agents use tools and protocols that allow them to fetch, explore, and query data while they run.
Let’s look at the main sources an agent can leverage to gather more context dynamically:
  • Fetch (API requests)
    • One of the most common ways to retrieve data. Agents can use a fetch tool or an HTTP client to send requests to APIs, microservices, or backend endpoints.
      For example, the agent might query an internal API to get a list of users, or an external service to fetch recent metrics. Responses are returned as structured JSON and become part of the agent’s runtime context. (A minimal sketch of such a fetch tool is shown after this list.)
  • Browser interaction
    • Through tools like a Playwright MCP server, the agent can interact with real web pages, clicking buttons, filling forms, or reading page content.
      This is especially useful when APIs are not available and the only way to access information is through a web interface.
      Browser-based interaction expands what the agent can “see” and helps it operate in more complex workflows.
  • Filesystem
    • The agent can inspect local or remote files to understand what exists in a project or repository.
      For example, it can read configuration files, check code structure, or analyze logs.
      This allows the model to retrieve domain-specific context directly from the source code or data files, improving accuracy and relevance.
  • Terminal
    • Agents can execute terminal commands in a controlled environment to gather information about the system state.
      Examples include running ls to list files, git status to see repository changes, or npm test to verify code quality.
      These commands provide live, actionable feedback that becomes part of the decision loop.
  • RAG (Retrieval-Augmented Generation)
    • RAG is used when the agent needs to retrieve information from large knowledge bases or document stores.
      The system indexes documents into vector embeddings and retrieves the most relevant chunks based on a query. RAG can range from simple document lookup to complex multi-source retrieval pipelines.
      Because of this complexity, we won’t analyze it in detail here, but it remains a key strategy for scaling an agent’s memory.
  • Web search
    • When the information is not available locally, agents can perform web searches to get public data. This is often done through specialized APIs or search tools (e.g. Tavily).
      Web search gives the agent access to the latest, up-to-date information beyond its training data.
  • Code Sandbox
    • Sometimes the agent needs to write and execute a small script to compute intermediate results, transform data, or inspect artifacts that aren’t directly accessible through other tools.
      This can be achieved by using a Code Sandbox, such as node-code-sandbox-mcp, which provides a safe, isolated runtime where the agent can run code snippets, test logic, or analyze outputs without affecting the main system.
      It’s especially useful for quick calculations, parsing data, or verifying small code segments before applying larger changes.
  • Other local or networked resources
    • Finally, agents can access any other authorized data source available on the local system or through a network. This includes internal APIs, databases, or third-party services that require authentication.
      Standards such as OAuth 2 are often used to handle secure access tokens, ensuring that agents can safely interact with restricted systems.
      The Model Context Protocol (MCP) already supports authorization and secure resource access, making it easier to standardize how agents communicate with multiple systems.
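As referenced in the fetch bullet above, here is a minimal sketch of how an API-fetching capability could be exposed to an agent as a tool; the tool name and wiring are hypothetical, and only the standard fetch call (available in Node.js 18+) is real:

```typescript
// A hypothetical "http_get" tool: the model supplies a URL, the runtime performs
// the request and returns the JSON body, which is added back into the context.
const httpGetTool = {
  name: "http_get",
  description: "Perform an HTTP GET request and return the JSON response body.",
  parameters: {
    type: "object",
    properties: { url: { type: "string", description: "The endpoint to call." } },
    required: ["url"],
  },
};

async function executeHttpGet(args: { url: string }): Promise<string> {
  const response = await fetch(args.url);  // standard fetch API (Node.js 18+)
  const body = await response.json();      // structured JSON result
  return JSON.stringify(body);             // serialized back into the context
}
```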
 

How to give agents the right context?

 
Giving an agent the right context is a continuous process of curation, documentation, and optimization. An agent works best when it has access to accurate, up-to-date, and well-structured information.
 
You can think of it as an exceptionally capable coworker who, however, always starts each day as if it were their first. Without proper documentation, clear instructions, and accessible resources, even the best model will struggle to perform effectively. Therefore, maintaining consistent and comprehensive context files (rules, style guides, and documentation) is essential. Every time something changes in your environment or workflows, take the time to update these references so the agent can stay aligned with reality.
 
Tools are another critical piece of context. They define what the agent can do and how it can interact with the environment. However, more is not always better. When too many tools are available, the model may struggle to choose the right one, leading to inefficiency or confusion. Providing only the necessary tools, accompanied by clear descriptions and examples, helps guide the agent’s reasoning. In some cases, you can even mention explicitly in your user input which tools should be used for a specific task, reducing ambiguity and improving execution speed.
 
Context cleanliness also plays a major role in ensuring high-quality results. As conversations grow longer, the LLM’s context window fills up, and the quality of reasoning can degrade. This happens because older messages are either truncated or summarized automatically by the agent runtime (depending on the agent’s capabilities). Since these summaries are typically generated by the model itself, there is no guarantee that the most important details will be preserved.
For this reason, it’s often better to start a new chat once a task is completed, or even mid-way through a complex task if responses begin to lose precision or relevance. A fresh session ensures that the model starts reasoning with a clear and focused context, free from accumulated noise.
 
When sharing large amounts of structured information, such as JSON data, logs, or configuration files, it’s wise to use compact and machine-friendly formats to optimize token usage. Compression formats such as Toon or custom JSON minifiers can help you include more context without exceeding the model’s token limits. This is particularly useful in agent workflows where large payloads (for example, code metadata or retrieved embeddings) need to be passed efficiently between reasoning steps.
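As a simple illustration of the idea (not a specific library such as Toon), even just stripping whitespace and dropping fields the agent does not need before injecting JSON into the prompt can save a meaningful number of tokens:

```typescript
// Minimal sketch: compact a JSON payload before adding it to the context.
// Dedicated formats can go further; this only shows the principle.
function compactForContext(payload: Record<string, unknown>, keep?: string[]): string {
  const filtered = keep
    ? Object.fromEntries(Object.entries(payload).filter(([key]) => keep.includes(key)))
    : payload;
  return JSON.stringify(filtered); // no indentation or extra whitespace
}

// Example: keep only the fields relevant to the current reasoning step.
const logEntry = { level: "error", msg: "login failed", userId: 42, stack: "...", ts: "2024-01-01T00:00:00Z" };
const compact = compactForContext(logEntry, ["level", "msg", "userId"]);
// → {"level":"error","msg":"login failed","userId":42}
```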
 
Finally, providing the right context is not only about quantity but also about intentionality. It means being deliberate in what you include and what you leave out. Too little context makes the agent blind; too much makes it distracted. The goal is to give just enough information for the model to reason effectively while staying within the context window. This balance, between precision, relevance, and clarity, is what ultimately determines how well an agent can understand and execute a user’s intent.
 