Artificial intelligence is usually discussed in terms of what it can do. How accurate is it? How fast? How much work can it take off our plates?
But performance is only part of the story. When AI is involved in decisions that affect people’s lives, someone has to be responsible. The question is who.
Across very different fields, from engineering labs to medical education, the same idea keeps surfacing. Keeping humans “in the loop” is not a temporary safeguard or a design choice to be optimized. It is a moral decision. It is a way of saying that efficiency does not replace accountability, and that judgment cannot be fully automated.
This tension came up repeatedly in my conversations with two researchers working at very different points along the AI pipeline.
At the technical end of this pipeline, “human-in-the-loop” is often discussed as a procedural detail. It is treated as a feature to be added early and removed later, once systems become more capable and trustworthy. But this framing misses what is actually at stake. When AI systems are designed to recognize people, interpret behavior, infer emotional or cognitive states, and personalize responses in real time, oversight is no longer just a design choice.
Dr. Ali Etemad, an Associate Professor in the Department of Electrical and Computer Engineering at Queen’s University, works on human-centred AI that draws on many types of data, including text, images, audio, and biological signals, to understand who people are, what they are doing, and how they are feeling.

In this context, his concern is not simply that AI systems make mistakes, but that their errors can carry an unwarranted sense of authority.
“Hallucinations are a big problem in language models where the model generates output that sounds real but is not real. This has happened to me many, many times where I’ve asked for a reference about a particular thing and it makes up a legitimate sounding paper that doesn’t actually exist.”
The risk here is not simply incorrect information. It is the confidence with which that information is delivered. Systems that speak fluently and persuasively can make errors feel settled and final, even when they are not. When human oversight becomes symbolic rather than active, people stop questioning outputs that sound convincing. Keeping a human in the loop is not about checking grammar or fixing small errors. It is about maintaining responsibility for accuracy. Someone must still decide whether an answer is reasonable, appropriate, and safe.
The same ethical tension shows up downstream, when AI systems move into real decision-making environments. In postgraduate education, for example, large language models (LLMs) are increasingly used to summarize resident evaluations, analyze feedback, and help promotion committees manage large volumes of data about individual medical trainees. These tools promise efficiency, and they often deliver it. But they also quietly reshape how decisions are made.
Dr. Benjamin Kwan is an Assistant Professor and Neuroradiologist at Queen’s University, Educational Innovations Lead for Postgraduate Medical Education (PGME), and Faculty Research Director for Diagnostic Radiology. He works directly at this intersection, researching how LLMs are used to support assessment and decision-making in postgraduate medical education. For him, the limits are clear.

“I don’t think we should ever replace the human in these decisions. We should always have what they call the ‘professor-in-the-loop’ or the ‘teacher-in-the-loop’ to make sure everything is appropriate.”
This perspective does not imply resistance to technology. It is an acknowledgment of responsibility. When evaluative authority shifts, even subtly, from people to systems, accountability becomes harder to pin down. If an AI-assisted recommendation disadvantages a trainee, who answers for that outcome? The algorithm? The institution? Or the human who deferred too easily to an LLM?
When decisions are delegated, responsibility does not disappear. It just becomes harder to trace.
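One way to make that traceability concrete is in the software itself. The sketch below is a hypothetical illustration of what a “teacher-in-the-loop” gate might look like in code, not a tool drawn from Dr. Kwan’s research: an AI-drafted recommendation cannot enter a trainee’s record until a named reviewer signs off, and the sign-off is logged. All names, fields, and functions here are invented for the example.

```python
# Hypothetical sketch of a human-in-the-loop approval gate. Nothing here comes
# from an existing tool; the dataclass fields and function names are invented.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class DraftRecommendation:
    trainee_id: str
    ai_summary: str                      # text drafted by the model
    approved_by: Optional[str] = None    # named human reviewer, if any
    approved_at: Optional[datetime] = None
    reviewer_notes: str = ""

def approve(draft: DraftRecommendation, reviewer: str, notes: str = "") -> DraftRecommendation:
    """A human signs off; their identity and the time of review are recorded."""
    draft.approved_by = reviewer
    draft.approved_at = datetime.now(timezone.utc)
    draft.reviewer_notes = notes
    return draft

def commit_to_record(draft: DraftRecommendation) -> None:
    """Refuse to file anything that no named person has stood behind."""
    if draft.approved_by is None:
        raise PermissionError("No human reviewer has approved this recommendation.")
    print(f"Filed for {draft.trainee_id}, approved by {draft.approved_by} "
          f"at {draft.approved_at:%Y-%m-%d %H:%M} UTC")

draft = DraftRecommendation(trainee_id="R-042", ai_summary="Meets expectations ...")
# commit_to_record(draft)  # would raise: no one has taken responsibility yet
commit_to_record(approve(draft, reviewer="Dr. Example",
                         notes="Summary matches the source evaluations."))
```

The code itself is trivial; the point is that approval is explicit and attributable, so the question of who answers for an outcome never loses its answer.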
Machines Don’t Wash Bias Clean
Automation is often framed as a solution to human bias. The logic is appealing: remove people from the process and decisions become more objective. But this logic collapses under closer scrutiny. From a technical perspective, Dr. Etemad makes clear that bias is not just a problem of flawed or unbalanced data. It can be introduced much earlier, through choices that are rarely visible to end users and are often treated as purely technical.
“We proved that certain training algorithms with the same data and the same model could make the model more biased or less biased. Using certain algorithms to train neural networks could have a huge impact on how biased these models will become when they are released.”
This matters because it undermines a widely held assumption—that holding data constant guarantees consistent outcomes. As Dr. Etemad’s work shows, that assumption fails. Models trained on the same data diverge significantly depending on design and training choices. Bias, in other words, is not just inherited. It can be built in through decisions embedded deep within the technical pipeline.
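To make the idea concrete, here is a toy stand-in, not Dr. Etemad’s experiment: the same scikit-learn model class is trained twice on the same synthetic data, once with plain empirical risk minimization and once with samples from an under-represented group reweighted. The dataset, group labels, and numbers are invented; the point is only that the accuracy gap between groups shifts with the training choice, not the data.

```python
# Toy stand-in (not Dr. Etemad's experiment): same model class, same data,
# two training choices, different group-wise accuracy gaps.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data: group B is under-represented and noisier than group A.
n_a, n_b = 4000, 400
X_a = rng.normal(0.0, 1.0, size=(n_a, 5))
X_b = rng.normal(0.3, 1.3, size=(n_b, 5))
w_true = rng.normal(size=5)
y_a = (X_a @ w_true + rng.normal(0, 0.5, n_a) > 0).astype(int)
y_b = (X_b @ w_true + rng.normal(0, 1.5, n_b) > 0).astype(int)

X = np.vstack([X_a, X_b])
y = np.concatenate([y_a, y_b])
group = np.concatenate([np.zeros(n_a, dtype=int), np.ones(n_b, dtype=int)])

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0, stratify=group
)

def accuracy_gap(model):
    """Absolute difference in held-out accuracy between the two groups."""
    preds = model.predict(X_te)
    acc = [(preds[g_te == g] == y_te[g_te == g]).mean() for g in (0, 1)]
    return abs(acc[0] - acc[1])

# Training choice 1: plain empirical risk minimization (uniform weights).
erm = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Training choice 2: identical model class and data, but minority-group
# samples are upweighted so both groups contribute equally to the loss.
weights = np.where(g_tr == 1, (g_tr == 0).sum() / (g_tr == 1).sum(), 1.0)
rew = LogisticRegression(max_iter=1000).fit(X_tr, y_tr, sample_weight=weights)

# Exact values depend on the random seed; what matters is that the gap
# changes even though the data never did.
print(f"gap with ERM:         {accuracy_gap(erm):.3f}")
print(f"gap with reweighting: {accuracy_gap(rew):.3f}")
```

In a real pipeline the “training choice” might be the optimizer, the loss, or the regularization; the lesson is the same: the data alone does not determine how biased the released model will be.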
Dr. Etemad also points out that addressing bias is rarely straightforward because improving fairness often involves trade-offs that cannot be resolved by optimization alone.
“There is a fairness–performance trade-off. One way to increase fairness is to penalize the better-performing group so that everyone performs equally poorly, but that’s not really what we want.”
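A few invented numbers show why this trade-off cannot be optimized away. In the sketch below, equalizing the two groups by “levelling down” the better-served one closes the fairness gap completely, yet no one is served better and overall accuracy falls.

```python
# Invented numbers illustrating the "levelling down" trade-off.
group_sizes = {"A": 900, "B": 100}
acc_before = {"A": 0.92, "B": 0.80}   # group A is better served
acc_after = {"A": 0.80, "B": 0.80}    # "fairness" achieved by degrading group A

def overall(acc):
    """Size-weighted overall accuracy across both groups."""
    total = sum(group_sizes.values())
    return sum(acc[g] * group_sizes[g] for g in group_sizes) / total

for label, acc in [("before", acc_before), ("after levelling down", acc_after)]:
    gap = abs(acc["A"] - acc["B"])
    print(f"{label}: overall accuracy {overall(acc):.3f}, fairness gap {gap:.2f}")
# The gap falls from 0.12 to 0.00, but overall accuracy drops from 0.908 to
# 0.800, and no individual is actually served better than before.
```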
Deciding what counts as “fair enough” is not a technical judgment. It requires values. That same tension shows up clearly in educational settings. Dr. Kwan is cautious about using AI systems to make, or strongly influence, decisions about trainee progress, not only because those systems are informed by past evaluations (which carry the judgments and constraints of the people who wrote them), but because the systems themselves encode assumptions about what should count as performance, risk, or success.
“You probably wouldn’t want to use a tool that will determine if somebody passes or fails solely because of all these potential bias problems.”
The concern is not abstract. In practice, automated systems tend to present their outputs as neutral summaries or recommendations, such as a score, a ranking, or a flagged concern. Once framed this way, decisions can feel less open to discussion. When a person makes a biased decision, it can be questioned. They can be asked to explain their reasoning. But when a system produces the same outcome, the decision often feels procedural, as though it simply followed the rules.
This shift matters. Bias does not need to be extreme to be harmful. It only needs to become harder to challenge.
Human-in-the-Loop Is About Responsibility
These perspectives are connected by a shared understanding. Insisting on human oversight is not about distrusting AI. It is about refusing to abandon our responsibility as the people who design and deploy it. For Dr. Etemad, this responsibility is most visible in the problem of alignment. In systems that interact closely with people, the central question is not simply whether an AI system performs well, but whether its outputs reflect the priorities and values it is meant to serve.
“One of the important sub-areas in this context of AI that interacts with humans is responsible AI. One of the ways to address that is alignment. Can we develop models that are aligned with a set of values that we’re interested in?”
Alignment, in this sense, cannot be achieved through optimization alone. It requires human judgment about which values matter, how they should be balanced, and when a system’s output must be constrained or overridden. From this viewpoint, human-in-the-loop systems are not a temporary safeguard. They are one of the few ways to ensure that responsibility is upheld. Someone still has to stand behind the decisions an AI makes.
In professions like medicine and education, this issue is critical. These fields depend on trust, and their decisions shape real lives. As Dr. Kwan put it, AI may become an increasingly powerful assistant, but it should never be the final authority.
“AI will be a good helper… another voice at the table.”
A voice, not a verdict.
The Question We Should Be Asking
We spend a lot of time debating what AI will eventually be capable of doing. The more important question is what we should ask it to do. Human-in-the-loop systems are slower. They require explanation. They force people to stay engaged. That is exactly why they matter.
Because automated systems don’t take responsibility. People do.