
Human-in-the-Loop (HITL)

 
Have you ever been in this situation? Your manager rushes to your desk with a vague request like: “We need a customer presentation… something about the quarterly results of our platform, value, impact… I’m super busy, so just make it good.”
Then he disappears into another meeting.
You try your best. You spend five days polishing slides, adding diagrams, rewriting the story. You even add a few animations for extra flavor. But when your manager finally reviews it, his reaction is… painful.
“Hmm… this is not really what I had in mind.”
The issue wasn’t your skills. The issue was the missing feedback. You had some context, but it was incomplete. The project moved forward without alignment, and the final result became something completely different from what your manager imagined.
 
Now imagine a second version of the story.
This time, your manager still gives only a short request, but instead of working for five days in silence, you take a different approach. You spend 20 minutes preparing a simple list of bullet points: the agenda, the key messages and the story flow. You send it to your manager. He replies quickly:
“Good start. Remove part 3, focus more on value, and include a customer case study.”
You adjust, send it again, get another round of feedback and keep iterating.
After just one day, the full presentation is done and it is exactly what he needs for the customer.
 
This story is the perfect metaphor for how feedback transforms AI performance!
Without feedback, an AI system behaves like the first version of you: it tries its best, but if the initial instructions are vague or incomplete, it may end up very far from your expectations. The model cannot read your mind; it only sees what you type.
With feedback, AI becomes more like the second version: fast, aligned and efficient. Each correction enriches its context. Instead of one long, risky attempt, you get many tight, controlled iterations that guide the system toward your real goal.
And this is the key idea: feedback is not just a patch, it is the steering wheel.
Without it, even a powerful AI goes off track. With it, the system becomes a collaborative partner that can deliver high-quality results in a fraction of the time.
 
 
While giving feedback early and often feels intuitive to us as humans, the first wave of AI adoption advertised something different. We tried to automate complex tasks with a single prompt: “Do everything end to end.” It looked magical in demos, but in real workflows it quickly showed its limits, especially on non-state-of-the-art models or on SLMs (Small Language Models). The output was inconsistent, missing details, or simply wrong.
 
That’s why the industry shifted from pure automation to collaboration. Instead of expecting the model to succeed alone, we place humans inside the loop: guiding, correcting and steering the AI step by step.
 
Human-in-the-loop is not simply “tell the AI it made a mistake”: it is a structured way to enrich the AI’s context in real time. As you might know from the Context Engineering pillar, LLMs behave differently depending on the context they receive.
If the context is incomplete, the model fills the gaps with assumptions. If the context is rich and continuously refined, the model becomes far more precise.
This is where HITL shines. Your feedback becomes new context. Your clarifications become constraints. Your corrections become rules the system adapts to.
Step by step, you build a dynamic knowledge layer around the model that nudges it toward your goal.
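To make this concrete, here is a minimal sketch of feedback-as-context. The call_llm function is a stand-in for whatever model client you actually use; all names here are hypothetical:

```python
# Minimal sketch: each piece of human feedback is appended to the
# conversation, so it constrains every later generation.
# call_llm is a stand-in for your real LLM client.

def call_llm(messages: list[dict]) -> str:
    # Stand-in: a real implementation would call your model here.
    return f"<draft based on {len(messages)} context messages>"

messages = [
    {"role": "system", "content": "You build customer presentations."},
    {"role": "user", "content": "Draft an agenda for the quarterly results deck."},
]

while True:
    draft = call_llm(messages)
    print(draft)
    feedback = input("Feedback (or 'approve'): ").strip()
    if feedback.lower() == "approve":
        break
    # The correction is not a throwaway patch: it becomes part of the
    # context the model sees on every subsequent iteration.
    messages.append({"role": "assistant", "content": draft})
    messages.append({"role": "user", "content": feedback})
```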
 
In other words, HITL turns every interaction into an opportunity to inject more meaning into the system. You are not just fixing mistakes: you are expanding the model’s understanding of your world.
This enriched context becomes the fuel that drives better reasoning, fewer hallucinations and more stable results. The magic of HITL is simple: the AI becomes smarter not because the model changes, but because the context becomes richer.
 

HITL in Agentic Systems

 
Agentic systems are not just chatbots that answer questions. They are AI systems that can act: run tools, read files, write code, modify documents, search the web and more. This makes them incredibly powerful, but also introduces new challenges. Actions create consequences. Consequences need supervision. And this is exactly where HITL becomes essential.
 
At the core of every agent lies a simple loop:
  1. Perceive the current state
  2. Decide what to do next
  3. Act by using tools or generating output
  4. Evaluate the results
  5. Repeat
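Here is a minimal sketch of that loop in Python. The decide, act and evaluate functions are trivial stubs standing in for a real model and real tools:

```python
def decide(state: dict) -> str:
    # 2. Decide what to do next, given everything perceived so far.
    return "write_plan" if not state["history"] else "finish"

def act(action: str) -> str:
    # 3. Act: call a tool, edit a file, generate output...
    return f"artifact produced by {action!r}"

def evaluate(result: str) -> bool:
    # 4. Evaluate: did the last action complete the goal?
    return "finish" in result

def run_agent(goal: str, max_iterations: int = 10) -> dict:
    state = {"goal": goal, "history": []}   # 1. Perceive the current state
    for _ in range(max_iterations):
        action = decide(state)
        result = act(action)
        state["history"].append((action, result))
        if evaluate(result):
            break                            # goal reached
        # 5. Repeat with an enriched state
    return state

print(run_agent("implement the /health endpoint"))
```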
 
If this reminds you of how a junior engineer works, that’s not a coincidence. The agent tries something, observes what happened and then decides the next move. But without a human reviewing these steps, the agent might take a path that is technically correct but completely misaligned with your real goal.
 
HITL inserts you directly into this cycle, giving you the ability to guide, interrupt, refine or redirect the agent before small mistakes become big ones.
Every time an agent completes one iteration of the loop, it produces artifacts. These artifacts are the real “footprints” of AI actions. They can be:
  • Text: explanations, plans, notes, decisions
  • Files: markdown specs, configs, documentation
  • Code: new features, refactors, entire backend endpoints
  • Edits: changes to existing files in your repo
  • Logs: results from tools, API calls or tests
These artifacts are gold. They tell you what the agent understood, what it built and how it interpreted your request. An example of a process that generates multiple artifacts is spec-driven development, where the system produces specs and other text artifacts before implementing real code.
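One way to make artifacts reviewable is to model them explicitly. The Artifact type below is an illustrative sketch, not a standard schema:

```python
from dataclasses import dataclass

@dataclass
class Artifact:
    kind: str            # "text" | "file" | "code" | "edit" | "log"
    location: str        # file path, log name, PR link...
    summary: str         # one line for the human reviewer
    needs_review: bool = True

iteration_artifacts = [
    Artifact("text", "plan.md", "Proposed implementation plan"),
    Artifact("code", "api/orders.py", "New /orders endpoint"),
    Artifact("log", "pytest.log", "12 passed, 2 failed"),
]

# A reviewer scans the summaries first and drills into whatever looks off.
for a in iteration_artifacts:
    print(f"[{a.kind:4}] {a.location}: {a.summary}")
```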
But artifacts also tell you something more important: where to intervene.
If a plan is wrong, no code should be written.
If the code is right but tests fail, the issue is likely in the behavior.
Artifacts show you where feedback is needed and how to steer the next loop.
 
Once the agent generates artifacts, it’s your turn. This review phase is where HITL proves its value. You check the output and ask questions like:
  • Does this match my intent?
  • Are any steps missing?
  • Did the agent misunderstand a constraint?
  • Is the code correct? Efficient? Secure?
  • Are the tests meaningful and complete?
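A minimal sketch of this review gate might look like the following; the decision values (approve / revise / abort) are illustrative choices you would wire into your own loop:

```python
# Sketch of the human review gate between agent iterations.

def human_review(artifacts: list[str]) -> tuple[str, str]:
    """Show the artifacts, then return (decision, feedback)."""
    for a in artifacts:
        print(f"- {a}")
    decision = input("approve / revise / abort? ").strip().lower()
    feedback = "" if decision == "approve" else input("What should change? ")
    return decision, feedback

decision, feedback = human_review(["plan.md", "api/orders.py", "pytest.log"])
if decision == "revise":
    # The correction becomes context for the next iteration,
    # exactly like the presentation story at the top of this page.
    print(f"Next loop will incorporate: {feedback}")
elif decision == "abort":
    print("Stop the agent before a small mistake becomes a big one.")
```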
 
Think of it like reviewing a pull request from a high-speed junior developer who never sleeps. You are not fixing everything yourself; you are deciding whether the next loop continues or adjusts course.
This human review prevents the agent from drifting and keeps the quality consistently high.
 
The real power of HITL in agentic workflows comes from iteration.
With each loop, the agent refines its understanding, the context becomes richer, the artifacts become more accurate and the distance between “current state” and “desired state” shrinks.
And because each round includes your feedback, the agent moves in the right direction faster and with fewer mistakes.
Instead of a single high-risk attempt, you get multiple low-risk iterations.
 
This iterative alignment is what makes complex AI-driven development possible. You and the agent move forward together, step by step, until the final result is not just “acceptable,” but exactly what you envisioned.
That is the promise of HITL inside agentic systems: precision, reliability and collaboration at scale.
 

Advanced HITL Techniques

As AI systems become more capable, we are not just improving the quality of feedback: we are transforming how feedback flows. Traditional HITL meant a single user correcting a single model. Today, new patterns are emerging where AI agents can collaborate with entire teams, route questions to the right humans and even pause their execution until a human responds.
In a way, AI is starting to behave like a real software engineer: asking for clarification, requesting reviews and escalating when it gets stuck.
 
In classical setups, humans decide when to review the agent. In modern agentic workflows, the agent decides when it needs you.
For example, an AI system implementing an API endpoint might pause because:
  • it is unsure about a business rule
  • it needs approval before modifying production data
  • it has two possible interpretations of a requirement
  • it detects missing inputs only a human can provide
This “human-on-demand” pattern makes HITL smarter and more efficient.
Instead of humans monitoring everything, the agent invites humans at the exact moment their expertise is required.
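As a sketch, the pattern can be as simple as the agent raising a “needs human” signal when it hits one of those conditions. The NeedsHuman exception and the checks below are illustrative, not a standard API:

```python
class NeedsHuman(Exception):
    """Raised by the agent when only a human can unblock it."""

def implement_endpoint(spec: dict) -> str:
    if "business_rule" not in spec:
        raise NeedsHuman("Unsure about the business rule for refunds.")
    if spec.get("touches_production"):
        raise NeedsHuman("Need approval before modifying production data.")
    return "endpoint implemented"

try:
    implement_endpoint({"touches_production": True, "business_rule": "net-30"})
except NeedsHuman as question:
    # Execution pauses here; the answer re-enters the loop as new context.
    answer = input(f"Agent asks: {question} Your answer: ")
    print(f"Resuming with: {answer}")
```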
 

The A2HA approach: Agent-to-Human-Agent

One project experimenting in this space is A2HA: Agent-to-Human-Agent. This approach allows an AI agent to autonomously reach out to humans in your organization, ask for help and resume its workflow once a human replies. A full working example and implementation can be found here:
In an A2HA workflow:
  1. The agent realizes it needs human support
  2. It triggers a request through a proxy system
  3. The message appears in a human-facing tool (like Slack or Email)
  4. The human responds
  5. The response flows back into the agent asynchronously
  6. The agent continues working with the new information
It is the closest thing to having an AI coworker who taps you on the shoulder when needed.
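To make the flow tangible, here is a toy mock of those six steps using in-process queues as a stand-in for the real proxy and the Slack/Email transport. The actual A2HA implementation differs; this only illustrates the request/response shape:

```python
import queue
import threading

to_humans: queue.Queue = queue.Queue()    # agent -> human-facing tool
from_humans: queue.Queue = queue.Queue()  # human reply -> agent

def agent():
    # Steps 1-2: the agent realizes it is blocked and triggers a request.
    to_humans.put("Which currency should the invoice endpoint default to?")
    # Steps 5-6: it resumes asynchronously once the reply arrives.
    answer = from_humans.get()
    print(f"Agent resumes with: {answer}")

def human_side():
    # Steps 3-4: the message surfaces in a human tool; a human responds.
    question = to_humans.get()
    print(f"[slack] {question}")
    from_humans.put("EUR, per the EU rollout spec.")

t = threading.Thread(target=agent)
t.start()
human_side()
t.join()
```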
 
A2HA opens the door to something powerful: AI agents that can route their questions to the right human, not just any human. Just like a software engineer knows who to ask for a security review, who to involve for architectural decisions and who owns a specific part of the product, an AI agent using A2HA can leverage metadata like skills, ownership, responsibility, availability and other factors to decide which human should receive its request.
This prevents irrelevant pings to the wrong people and ensures that key decisions are reviewed by the appropriate experts. In practice, A2HA enables a form of intelligent “review routing” that boosts both team efficiency and trust: the agent becomes a respectful collaborator who asks the right person at the right time.
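Here is a sketch of such routing, with made-up people and skill tags; a real system might pull this metadata from an org directory:

```python
TEAM = [
    {"name": "Dana", "skills": {"security"},     "available": True},
    {"name": "Ravi", "skills": {"architecture"}, "available": True},
    {"name": "Mia",  "skills": {"payments"},     "available": False},
]

def route(question_topic: str) -> str | None:
    """Pick the first available human whose skills match the topic."""
    for person in TEAM:
        if question_topic in person["skills"] and person["available"]:
            return person["name"]
    return None  # escalate or queue the request if nobody matches

print(route("security"))  # -> Dana
print(route("payments"))  # -> None: Mia owns it but is unavailable
```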
 
In more advanced setups, agents don’t just receive feedback from one human: they gather feedback from multiple humans and aggregate it.
This is useful in scenarios like:
  • design reviews
  • code audits
  • risk analysis
  • product requirement refinement
The agent can combine overlapping answers, detect contradictions and even ask follow-up questions to resolve disagreements.
This mirrors real-world team decision making where insights come from different roles.
Over time, this multi-human feedback becomes a powerful form of contextual enrichment, giving the agent a more complete view of the task and reducing the risk of errors caused by ambiguity.
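As an illustrative sketch, aggregation can start as simply as counting overlapping answers and flagging dissenters for a follow-up question:

```python
from collections import Counter

# Made-up answers from three reviewers to the same question.
answers = {
    "security": "Use OAuth scopes per endpoint.",
    "architecture": "Use OAuth scopes per endpoint.",
    "product": "A single admin token is fine for v1.",
}

tally = Counter(answers.values())
consensus, votes = tally.most_common(1)[0]

if votes == len(answers):
    print(f"Unanimous: {consensus}")
else:
    dissenters = [role for role, a in answers.items() if a != consensus]
    # The agent asks a targeted follow-up instead of silently picking a side.
    print(f"Majority: {consensus}")
    print(f"Follow-up needed from: {', '.join(dissenters)}")
```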
 

Evaluating HITL Tradeoffs

Human-in-the-loop brings massive benefits, but it also comes with tradeoffs. It’s a balancing act: more feedback improves quality, but it also adds latency, cost and complexity. Understanding these tradeoffs helps you design AI workflows that are reliable without becoming slow or expensive.

Speed vs Accuracy

Adding humans to the loop naturally slows things down.
Full automation is fast, but it also comes with a higher chance of mistakes, especially in complex or ambiguous tasks.
Think of it like a code review:
  • No review: lightning fast, but risky
  • Too much review: extremely safe, but painfully slow
  • Balanced review: fast enough, accurate enough
HITL lets you decide where along this spectrum your task should sit. Mission-critical tasks lean toward accuracy. Low-risk tasks lean toward speed.

Autonomy vs Control

The more autonomy you give an AI agent, the more it can accomplish without blocking. But autonomy always reduces control.
High autonomy works well for:
  • drafting documents
  • generating early prototypes
  • brainstorming
  • transforming files or content
Low autonomy (meaning more human involvement) is better for:
  • financial decisions
  • security-sensitive workflows
  • modifying production code
  • anything where mistakes have real-world impact
HITL helps you dial autonomy up or down depending on the risk level.
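One way to implement that dial is an explicit policy mapping task types to approval requirements. The categories and defaults below are illustrative policy choices, not a standard:

```python
AUTONOMY_POLICY = {
    "draft_document":     "auto",      # high autonomy: just do it
    "generate_prototype": "auto",
    "modify_prod_code":   "approval",  # low autonomy: human must approve
    "financial_action":   "approval",
}

def execute(task: str) -> None:
    mode = AUTONOMY_POLICY.get(task, "approval")  # unknown tasks default safe
    if mode == "approval" and input(f"Approve '{task}'? (y/n) ") != "y":
        print("Blocked by human.")
        return
    print(f"Running '{task}'...")

execute("draft_document")
execute("modify_prod_code")
```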

Cost vs Quality

Human feedback costs time and money. But skipping feedback often costs even more: in rework, debugging and failed outputs.
A simple rule of thumb illustrates it:
  • Less HITL: cheaper now, more expensive later
  • More HITL: more expensive now, cheaper and safer later
The key is proportional investment: don’t spend ten hours reviewing a three-minute task, and don’t automate a mission-critical workflow without supervision.
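A back-of-the-envelope calculation with made-up numbers shows why: review time is paid up front, rework time is paid probabilistically later:

```python
review_hours = 1.0          # human review per iteration
rework_hours = 8.0          # cost of fixing an undetected failure
failure_rate_no_hitl = 0.4  # chance a fully automated run needs rework
failure_rate_hitl = 0.05    # chance of rework when a human reviews

cost_no_hitl = failure_rate_no_hitl * rework_hours
cost_hitl = review_hours + failure_rate_hitl * rework_hours

print(f"Expected cost without HITL: {cost_no_hitl:.1f}h")  # 3.2h
print(f"Expected cost with HITL:    {cost_hitl:.1f}h")     # 1.4h
```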

When to Use HITL and When to Automate

You don’t need HITL everywhere. In fact, overusing it can slow teams down.
Use automation only when the task is simple, the cost of failure is low, outputs are easy to verify automatically, or you want pure speed.
Use HITL when the task has unclear requirements, the model must follow strict constraints, errors are costly or dangerous, or quality matters more than speed.
 
Also, HITL isn't all-or-nothing. You can apply it selectively: early for alignment, lightly during execution, or heavily at final review.
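For example, a per-phase plan might look like this; the phase names and intensities are illustrative:

```python
HITL_PLAN = {
    "alignment": "heavy",  # early: validate the plan before any code exists
    "execution": "light",  # spot-check artifacts, don't block every step
    "final":     "heavy",  # full review before anything ships
}

for phase, intensity in HITL_PLAN.items():
    print(f"{phase:10} -> {intensity} human involvement")
```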
Too little HITL risks bad output. Too much HITL slows everything down. The real magic happens in the middle, where humans guide the AI just enough to keep it on track while still enjoying all the speed and power automation brings.