---
title: "Q&A — Full Written Answers"
event: "Scite webinar · *Academic AI: From Policy to Practice*"
date: 2026-05-14
author: Magnús Smári Smárason
status: draft for publication on smarason.is
source: post/Zoom/qa_89669342296_2026_05_14.csv (17 questions) + cleaned Q&A transcript
---

# Q&A — Full Written Answers

The webinar's live Q&A ran out of time. Seventeen questions came in; we answered seven on air. Here are written answers to all seventeen — the live-answered ones tightened from the transcript, the unanswered ones written out for the first time. Same voice. Same standard.

Where I cite a paper, the DOI is real and resolves. Where the evidence is thin, I say so.

---

## 1. "Could we have access to BORG?" — Mo

Short answer: not as a product. BORG is not a SaaS we license out. It is the **University of Akureyri's** infrastructure, running on our own servers, hosting our own data, governed by our own policy, and tuned for our own staff and students. Handing you a copy would hand you our governance surface area, our user model, and our liability — none of which transfer.

What I can give you is the methodology. The repository pattern, the eval discipline, the knowledge-graph-first design, the policy that anchors it ("the ultimate responsibility for the work remains with the human"). I publish that work in public on smarason.is. If a university wants to build its own equivalent, the recipe is open; the kitchen is yours to staff.

The deeper answer is in Question 5: _do not buy this; build your own._ An institution that consumes BORG instead of building BORG becomes a tenant in someone else's governance. That defeats the point.

---

## 2. "How was the bad DOI corrected by the Oracles?" — Mo

The Honest Oracle is an adversarial agent role inside the research swarm. After the specialist agents produce their findings, the Oracle attacks the draft for: unsupported claims, citation misrepresentation (where the source does not actually say what is attributed to it), circular citation chains, convergence bias, and absence-blindness.

The mechanism for catching a "bad DOI" has three layers.

**Layer 1 — existence.** Every DOI in the synthesis is resolved through the Crossref API. If it does not resolve, it does not stay in the document.
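The existence check in Layer 1 can be sketched as follows. This is a minimal illustration, not the Oracle's actual code: it assumes the public Crossref REST endpoint `https://api.crossref.org/works/{doi}` answers 200 for a registered DOI and 404 otherwise, and the function names (`normalise_doi`, `crossref_status`, `split_by_existence`) are hypothetical. DOIs registered with other agencies, such as DataCite, will not be found through Crossref and need a separate lookup.

```python
import urllib.error
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def normalise_doi(raw: str) -> str:
    """Strip common URL/prefix wrappers and lowercase the DOI."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    return doi.lower()


def crossref_status(doi: str, timeout: float = 10.0) -> int:
    """Return the HTTP status Crossref gives for this DOI (network call)."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")
    # The User-Agent string is a placeholder; identify your own tool here.
    request = urllib.request.Request(url, headers={"User-Agent": "doi-check-sketch/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 when Crossref does not know the DOI


def split_by_existence(dois, status_of=crossref_status):
    """Partition DOIs into (kept, dropped) according to the status lookup."""
    kept, dropped = [], []
    for raw in dois:
        doi = normalise_doi(raw)
        (kept if status_of(doi) == 200 else dropped).append(doi)
    return kept, dropped
```

The lookup is passed in as `status_of` so the partition logic can be tested offline with a stub. A network failure (as opposed to a 404) is deliberately left to raise, because "could not check" is not the same as "does not exist".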

**Layer 2 — citation context.** This is where Scite earns its place. Smart Citations show whether a paper is being cited in a _supporting_, _contrasting_, or merely _mentioning_ context. A DOI that supports our claim should be cited supportively by the literature around it. If its citations are overwhelmingly _contrasting_ or _mentioning_, that is a signal to look harder.
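The Layer 2 signal can be expressed as a small rule over citation counts. This is a sketch under assumptions: the counts are typed in by hand from whatever Scite shows for the paper, the `CitationTally` type and `needs_second_look` function are hypothetical names, and the rule itself (no supporting citations, or more contrasting than supporting) is one simple reading of "look harder", not a Scite feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationTally:
    """Counts of citation statements by context, entered manually."""
    supporting: int
    contrasting: int
    mentioning: int


def needs_second_look(tally: CitationTally) -> bool:
    """Flag a paper whose citing literature does not back it up.

    Mentions are ignored in the comparison because they say nothing
    about whether the finding held up.
    """
    if tally.supporting == 0:
        return True  # nobody cites it in support of anything
    return tally.contrasting > tally.supporting


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any real paper.
    print(needs_second_look(CitationTally(supporting=12, contrasting=1, mentioning=40)))
    print(needs_second_look(CitationTally(supporting=0, contrasting=0, mentioning=30)))
```

A flag here is a prompt for a human to read the citing passages, nothing more.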

**Layer 3 — the retraction filter.** Scite's `has_retraction: true` filter found 15 retracted papers in our GenAI-in-education corpus. The most consequential — Yu (2024), _Heliyon_ — has accumulated **50 Smart Citations across 184 citing publications _after_ retraction**. That is the citing-layer failure mode the Smart Citation infrastructure was built to address.

The concrete correction in our own sprint: an early draft cited Bastani, Bastani & Sungu (2025), _PNAS_, on the post-removal deskilling effect, without the August 2025 PNAS correction. The Oracle flagged it. The corrected version is the one I now present on stage. Reference: https://doi.org/10.1073/pnas.2422633122, correction: https://doi.org/10.1073/pnas.2518204122.

The honest version of the answer: no software catches everything. The Oracle catches what a single careful reader would catch on a second pass. That is the standard — not perfection.

---

## 3. "Are agents permitted on BORG, or is it just a chatbot interface?" — Iain MacLaren _(live answered)_

Today, BORG runs **specialised chatbots** over a **knowledge graph**. The chatbots are thin. The knowledge graph is the brain. Each chatbot has access to a curated database through retrieval-augmented generation, and each one is wrapped in an evaluation suite that we use to deliberately torture it before any user touches it.
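To make "wrapped in an evaluation suite" concrete, here is a minimal sketch of what such a suite can look like. It is not BORG's suite. It assumes a chatbot can be called as a function that returns an answer plus the identifiers of the sources it retrieved; `EvalCase`, `run_suite`, and the source identifiers are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

# A chatbot is modelled as: question -> (answer text, cited source ids).
Chatbot = Callable[[str], Tuple[str, List[str]]]


@dataclass
class EvalCase:
    question: str
    must_mention: str          # a fact the answer has to contain
    allowed_sources: Set[str]  # the curated sources this answer may cite


def run_suite(bot: Chatbot, cases: List[EvalCase]) -> List[Tuple[str, str]]:
    """Return (question, reason) for every case the bot fails."""
    failures = []
    for case in cases:
        answer, sources = bot(case.question)
        if case.must_mention.lower() not in answer.lower():
            failures.append((case.question, "missing expected fact"))
        elif not sources:
            failures.append((case.question, "no sources cited"))
        elif not set(sources) <= case.allowed_sources:
            failures.append((case.question, "cited source outside curated set"))
    return failures
```

Substring matching is crude; the point of the sketch is the shape: known questions, known answers, and a hard requirement that every answer is grounded in the curated database before a user sees the bot.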

We do not run agentic systems in production yet, for two reasons. **Manpower:** building reliable agentic systems in an enterprise environment is genuinely hard with a six-person team. **Demand:** our users do not currently need agents to do their work. They need fast, correct, source-grounded answers to specific questions.

On throttling and bottlenecks (your second point): the economic model is part of the answer. We pay only for the API calls we make. Costs are linear with use. We can swap providers without changing the user interface, because the interface sits over our own knowledge graph and not over a vendor's product. During the summer, when the campus is quiet, the bill collapses. There is no per-seat licence to absorb in the quiet months.
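The difference between the two billing shapes is simple arithmetic. The sketch below uses placeholder numbers throughout: the token volumes, the price per million tokens, the seat count, and the seat price are all invented for illustration and are not UNAK's figures or any vendor's price list.

```python
def usage_bill(tokens_used: int, price_per_million: float) -> float:
    """Pay-per-call: the bill scales linearly with tokens consumed."""
    return tokens_used / 1_000_000 * price_per_million


def seat_bill(seats: int, price_per_seat: float) -> float:
    """Per-seat licence: the bill is the same however little the seats are used."""
    return seats * price_per_seat


if __name__ == "__main__":
    # Hypothetical monthly token volumes for a busy term and a quiet summer.
    months = {"term": 40_000_000, "summer": 2_000_000}
    for label, tokens in months.items():
        print(label, usage_bill(tokens, price_per_million=5.0),
              seat_bill(seats=300, price_per_seat=20.0))
```

Whatever the real numbers are, the usage line falls with the quiet months and the seat line does not.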

When agents become genuinely useful for _our_ users — not because the industry is excited about them — we will add them, evaluate them like the chatbots, and ship them with the same discipline. The order matters: utility, then capability, then evaluation, then production. Not the reverse.

---

## 4. "What harness can you share with us to replicate? A well-tested one, please." — Mo _(live answered)_

I borrowed a metaphor from role-playing video games on air, and I will keep it here. You do not start the game at level 60. You start with a simple knife, and the game teaches you what to add as you go.

The harness our **students** have access to is two licensed tools: **Microsoft Copilot 365** and **Scite**. That is it. With those two, a serious operator can do work that, ten years ago, would have made them one of the most capable researchers alive.

The technique that turns those two tools into a harness is **context management**. A few rules:

1. **Use one tool to write prompts for the other.** Copilot is excellent at structuring a brief. Scite is excellent at finding and verifying the literature. Use Copilot to write the search; use Scite to do the search.
2. **Treat context like a recipe.** A fresh chat with a well-composed prompt outperforms a long thread that has been reprocessing the same context for an hour. Reset early. Reset often.
3. **Keep a prompt library.** I publish mine at smarason.is. The one most people start with is the "performance prompt generator" — you paste your messy thoughts above it, and it returns a structured prompt you can take into a clean chat.

The harness is something you grow into. When you have outgrown Copilot + Scite, you will know — because the work will start demanding things they cannot do. That is the moment to add the next tool, not before.

For builders rather than end users: my own working harness is **Claude Code with a personal-AI-infrastructure layer** (PAI / LEON), a knowledge graph, custom skills, and a dozen specialised agents. I am not recommending you start there. I am telling you what level-60 looks like, so you know the direction.

---

## 5. "What is your advice for a small higher-education institution making moves into AI? What size of team, what skills?" — Thomas Blennerhassett _(live answered)_

If I were starting again today, here is what I would do in order.

**First — make the institution talk to itself.** Get faculty, administrators, students, and IT in a room. Record the conversation. Speech-to-text. Combine the transcripts. The point is not to extract a strategy from these sessions; the point is to make the institution's own people the primary source for what the institution actually needs. That conversation will surface ten times the insight a vendor demo will.

**Second — find one operator.** Not a consultant. Not someone who has read about AI. Someone who has **built** with the tools, broken things with them, shipped something, watched users use it badly, and fixed it. One operator is enough to seed a small institution. Two is a luxury. Six is BORG.

**Third — empower that operator to drive both the training and the integration.** Training and integration cannot be separated. The same person who knows what the institution can run is the person who can teach colleagues what is realistic to ask for.

**The failure mode you are protecting against.** If an institution does not build its own capability, it becomes a **consumer of ready-made products**. It loses oversight of its own judgment — because the product encodes a vendor's judgment. And the culture of the institution flattens out, because everyone ends up using the same tool in the same way, shaped by the same external interest. That flattening is the quiet cost nobody puts on the invoice.

**Skills, in order of priority.** Curiosity over credentials. Systems thinking over feature knowledge. The ability to read a research paper. Comfort in a terminal. Willingness to be wrong in public. The technical skills (Python, APIs, vector databases, prompt design, evaluation harnesses) can be learned in months by the right person. The disposition cannot.

---

## 6. "I have low-to-moderate-technology-literacy students using AI. A voluntary virtual workshop launches Tuesday — small group expected. Will that be enough?" — Elisse Amstutz

Honest answer: a small workshop is the right _shape_. It will not be enough by itself, and it does not have to be.

What the small group makes possible that a lecture cannot: every student gets to put their hands on the tool and fail in front of you. That is the lesson. Reading slides about prompt design does not teach prompt design. A student who types a vague question into a chatbot, sees the disappointing answer, and then watches _you_ reformulate the prompt in front of them learns it in a minute.

Two suggestions to make a small workshop punch above its weight:

1. **Bring real assignments.** Ask students to bring something they are actually working on for another course. The exercise is to use the AI to make progress on real work, not on a contrived demo. Engagement rises when the stakes are real.
2. **Make the failure modes the lesson.** Hallucinated citations. Confident-sounding wrong answers. The model agreeing with whatever you say. Run a deliberate "trick the AI" exercise. Students with low technology literacy often _over-trust_ the output because the prose is fluent. Teaching them to distrust fluent prose is the most valuable hour you can give them.

What a small workshop cannot do on its own is build a culture. For that you need recurrence. Plan the next one before this one ends. The students who attended will recruit the ones who did not.

The evidence base behind this advice: Bastani et al. (2025, _PNAS_) showed that **unscaffolded GPT-4 access produced +48% in-session performance but −17% performance once the tool was withdrawn** — the dependency effect. Their scaffolded "GPT Tutor" prompt largely mitigated it. The lesson for your workshop: design the scaffolding into the exercise, not into the AI.

---

## 7. "Are there models where IT folks at a college/university have actually drawn on the expertise of faculty and departments?" — Joe Montibello

Yes — and the institutions where this happens visibly are the ones to watch.

The pattern that works, in my experience: **faculty domain expertise + IT operational discipline + a small joint team with budget authority.** Each of these alone produces predictable failures.

- **IT alone** produces secure, well-managed, vendor-driven tools that faculty refuse to adopt because they do not solve any real teaching or research problem.
- **Faculty alone** produces brilliant prototypes that never reach production because no one is responsible for backups, identity, monitoring, or the day a key person leaves.
- **A consultant or a vendor** produces a polished system that the institution does not understand and cannot maintain.

The joint model — and BORG is one example, though far from the only one — works because faculty bring the _what_ (what question is worth answering, what data is trustworthy, what counts as a good answer), and IT brings the _how_ (what can be operated, what is sustainable, what fails safely).

Public examples worth studying: the **University of Helsinki's** collaboration patterns around digital humanities; **Princeton's POPLI** approach to AI in teaching; the **University of Michigan's** AI strategic plan and its joint governance; and the work smaller institutions are documenting through **EDUCAUSE** working groups. The literature is uneven. The best material is in conference proceedings and institutional blog posts, not journals.

The bigger answer is structural: institutions that treat AI as **a teaching-and-research problem with technical components** outperform institutions that treat it as **a technical problem with teaching-and-research components**. The framing decides who is in the room. Whoever is in the room decides what gets built.

---

## 8. "It seems that those who are on the front line using the technology are the ones who are always missing from the conversation and the decision-making process." — Jonathan Underwood

You are right. This is not a question, but it deserves a direct answer.

I came into this work from the back of an ambulance. Sixteen years of front-line emergency response. The pattern you are describing is the same pattern that runs through every operational field I have worked in. The people closest to the work are the last to be consulted, and the first to be blamed when the policy designed without them fails.

There are three reasons it keeps happening, and one thing that helps.

**Why it happens.** First, the people on the front line are _busy_. They cannot leave the work to attend the meetings about the work. Second, the people writing the policy often cannot tell the difference between someone who _operates_ the technology and someone who _talks about_ operating the technology, so they recruit the latter because they are easier to find. Third, the institutional incentive is risk-management, not operational excellence — and risk-managers find front-line voices uncomfortable because front-line voices say things in plain language that committees cannot easily absorb.

**What helps.** Make front-line input _cheap to provide_. A short voice memo, a five-minute Loom recording, a Slack channel where staff can post a frustration without filing a ticket. Then have someone — usually the operator from Question 5 — synthesise the input into a form the committee can read. Most front-line people are happy to contribute; the failure is in the channel, not the willingness.

The deeper version: at the University of Akureyri (UNAK) we built the policy _first_ (because policy without infrastructure is theatre), but the next iteration of the policy is being shaped by what our front-line users — administrators, professors, students — actually do with BORG. The policy was the floor. The practice is what tells us where the next version needs to be.

---

## 9. "Is there an AI product you would recommend for its ethics?" — Alia _(live answered)_

I did my BA in law, with a thesis on the data-retention practices of telecommunications companies. Since then, I have worn what I called on air a "tinfoil hat" about recommending any specific big-tech product. That is half the answer — and half a deflection.

Here is the rest.

There is no AI product that is unambiguously ethical, because _ethics_ in this context is a stack of partially conflicting concerns: data sovereignty, environmental impact, labour practices in training-data annotation, model-safety culture, transparency of training data, governance of the deploying organisation, and the politics of the jurisdiction the company is incorporated in. A product that scores well on one of these often scores badly on another.

What I recommend instead of a brand is a **literacy exercise**. Take the question seriously enough to answer it for yourself. Ask the AI you are already using to compile a comparison of the major providers across the dimensions _you_ care about. Cross-check what it produces against the providers' own published reports. Sean named **Anthropic** on air as having a reputation for safety-first culture; I think that is fair, and I use Claude as my primary model, so my disclosure is on the record. But "Magnús uses Claude" is not the same as "Claude is ethical" — it is the same as "for the work I do, this is the trade-off I have chosen."

The exercise of choosing for yourself is the point. The day a colleague hands you a one-line answer to this question — _"X is the ethical AI"_ — is the day to be suspicious of the colleague.

---

## 10. "I try to use [Khoj AI / Ecosia AI] to cut back on environmental ethical concerns. How else can we address those concerns?" — Elisse Amstutz _(live answered)_

The most important thing I have to say about AI and the environment is unflashy. **If you train yourself to use the tools well, you spend fewer tokens — and tokens are energy.**

Effectiveness is downstream of prompts. If you are reinventing how you work with AI every single day, always in a long back-and-forth, always reprocessing the same context, you are destroying the environment. If you structure your work, you are not. Sean called this a really great point on air, and I think the reason it lands is that it refuses the comforting answer. The environmental cost of AI is real, and it is not separable from operator competence. The careless operator burns more of everything.

Three more concrete moves:

1. **Match the model to the task.** A frontier model burns far more energy per query than a small local model. Ask whether your task actually needs the frontier. Most "write a short email" tasks do not.
2. **Cache and reuse.** Anthropic and other providers publish prompt-caching features that meaningfully reduce the cost of re-processing a repeated context. Use them. Build your tools to use them. A workflow that re-sends the same 4,000-token context twenty times a day costs vastly more energy than one that caches.
3. **Local where local works.** A local model on your own machine has the energy footprint of running your laptop. For low-stakes, high-volume tasks, this is the cleanest option available.
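The caching example in the second move works out as follows. This is back-of-envelope arithmetic, not a measurement: the `write_weight` and `read_weight` defaults are assumptions, loosely modelled on the kind of discounted cache-read pricing some providers publish, and billing weight is only a proxy for energy. It also assumes all twenty calls land while the cached context is still alive, which depends on the provider's cache lifetime.

```python
def tokens_without_cache(context_tokens: int, calls: int) -> int:
    """Every call re-sends and re-processes the full context."""
    return context_tokens * calls


def weighted_tokens_with_cache(
    context_tokens: int,
    calls: int,
    write_weight: float = 1.25,  # assumed surcharge for writing the cache once
    read_weight: float = 0.10,   # assumed discount for each later cache read
) -> float:
    """Token-equivalents when the context is cached on the first call."""
    if calls <= 0:
        return 0.0
    first_call = context_tokens * write_weight
    later_calls = context_tokens * read_weight * (calls - 1)
    return first_call + later_calls


if __name__ == "__main__":
    # The 4,000-token context sent twenty times a day, as in the text.
    print(tokens_without_cache(4_000, 20))
    print(weighted_tokens_with_cache(4_000, 20))
```

Under these assumed weights the cached workflow processes a small fraction of the uncached total. Change the weights to your provider's published figures before drawing conclusions.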

What I deliberately do _not_ recommend is choosing tools based purely on their marketing claims about sustainability. Independent audits of AI carbon footprints are still rare and methodologically contested. The honest disclosure is that the data we need to make confident comparisons largely does not exist yet.

---

## 11. "Professor Rife, how do you handle AI materials in class — citation of sources?" — Dr. Esther Burgess _(live answered, by Sean)_

This question went to Sean. His answer on air, paraphrased: get out ahead of it. In any writing-intensive or research-methods class, the first conversation is _how_ to use the technology as a thinking partner without outsourcing your critical thinking. Model the appropriate behaviour. Make the citation conventions explicit.

I will add the operator's footnote. **Cite the AI use, not the AI as a source.** A student writing "ChatGPT (2026)" in their references list is treating the model as an author. It is not. It is an instrument. The honest entry is in the methodology section: _"I used [model name, version, date] to brainstorm structure / draft an outline / check grammar / find candidate references. All cited literature was verified against primary sources."_

Then the verification standard kicks in. **The student is responsible for every citation in the bibliography, regardless of how it was found.** A hallucinated DOI is a hallucinated DOI whether the student typed it or a model suggested it. The empirical floor here is hard: Gravel et al. (2023, _Mayo Clinic Proceedings: Digital Health_) found **69% of ChatGPT-supplied medical references were fabricated** with real author names attached (https://doi.org/10.1016/j.mcpdig.2023.05.004). Magesh et al. (2025, _Journal of Empirical Legal Studies_) found 17–33% hallucination rates in proprietary legal-AI tools that explicitly use retrieval-augmented generation (https://doi.org/10.1111/jels.12413). The enterprise wrapper does not eliminate the problem.

That is the rule I would put on the first slide of any AI-in-the-classroom briefing: **the citation is yours; verify every one.**

---

## 12. "Academic integrity — plagiarism detection, etc." — Dr. Esther Burgess

This is the one I have the strongest evidence-based position on, and the position is uncomfortable.

**AI-text detectors do not work.** Weber-Wulff et al. (2023, _International Journal for Educational Integrity_) tested 14 detectors including Turnitin and found an aggregate accuracy of around 28% on paraphrased text — "**no better than random classifiers**" is the paper's phrasing (https://doi.org/10.1007/s40979-023-00146-z). Worse, Liang et al. (2023, _Patterns_) found that GPT detectors **misclassify over 50% of TOEFL essays by non-native English writers as AI-generated**, with near-zero false positives for US-born writers (https://doi.org/10.1016/j.patter.2023.100779). A university deploying detector-based enforcement is deploying a **biased instrument that does not work**.

If detectors are out, what is in?

**Assessment redesign.** Move the cognitive load to a moment AI cannot help with. Oral examinations. In-class writing. Process portfolios that show drafts and revisions over time. Tasks where students must defend, critique, or extend their own work in real-time conversation. The literature on this — see e.g. Liu, Z. et al. (2025, _Journal of Computer Assisted Learning_) — is converging on **task design, not detection**, as the right governance response.

**Process evidence.** Require version history. Most major writing tools support it. A document with no revision history at all, for a complex 2,000-word essay, is a signal worth a conversation. Not a conviction. A conversation.

**Honest disclosure.** Students who declare their AI use openly are not the problem. Students who hide it are. A culture where disclosure is normal and expected, and the academic question is _how well did you supervise the tool_, is the culture where integrity is recoverable.

The line I keep coming back to: **citation integrity is a governance problem, not a software problem.** Detection software is a category error. The governance response is human, slow, and built into how we ask people to demonstrate what they know.

---

## 13. "Policy is important, but we're in the early experimental stage of AI — new models and features daily. Can policy keep up? Documentation cannot keep up — what do you think?" — Jungmin Byun

You have named the central design constraint of any modern AI policy.

The UNAK policy I wrote is **explicitly written for a technology that moves faster than the policy cycle**. The trick is to anchor on the _invariants_, not on the _features_.

The features change weekly. Whether the model has vision. Whether it has tools. Whether it has agentic capabilities. Whether the context window is 32k or 1M tokens. Whether the price dropped this week. A policy that names the features is obsolete on arrival.

The invariants change rarely. The student's responsibility for their own work. The faculty member's responsibility for the integrity of the curriculum. The institution's responsibility for the data it holds and the people it serves. _These_ are what the policy commits to. A new model with a new capability does not change who is responsible.

The single sentence in the UNAK policy that does the most load-bearing work is in Annex A: **"The ultimate responsibility for the work remains with the human."** That sentence holds regardless of what the model does this month. It will still hold when the model does something none of us can anticipate.

On documentation specifically: you are correct that hand-written documentation cannot keep up with weekly model releases. Three pragmatic moves:

1. **Document patterns, not products.** A guide to "how to use prompts to extract structured data from a document" outlives any specific model. A guide to "how to use Copilot's December 2026 feature X" is dead by March.
2. **Let AI write the first draft of its own documentation, then have a human edit it.** This is what Joe suggested in his follow-up question and it is correct. The AI produces documentation faster than humans can correctly document the AI. The human turn is editing for accuracy and political judgement.
3. **Accept that some documentation will always be out of date.** Mark documents with a date and a model version. Trust your users to read the date.
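The third move, stamping each document with a date and a model version, is easy to automate. A minimal sketch, assuming each guide carries a review date and a free-text model version; the `DocStamp` type, the `banner` function, and the 90-day threshold are hypothetical choices, not part of the UNAK policy.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DocStamp:
    title: str
    reviewed: date      # when a human last checked the document
    model_version: str  # free text, e.g. the model name and release it was written against


def is_stale(stamp: DocStamp, today: date, max_age_days: int = 90) -> bool:
    """True when the last review is older than the allowed age."""
    return (today - stamp.reviewed).days > max_age_days


def banner(stamp: DocStamp, today: date, max_age_days: int = 90) -> str:
    """Build the line shown at the top of the document."""
    note = f"Last reviewed {stamp.reviewed.isoformat()} against {stamp.model_version}."
    if is_stale(stamp, today, max_age_days):
        note += " This review is out of date; check against current behaviour."
    return note
```

The banner does not make the document correct. It gives the reader what they need to judge how far to trust it.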

The deeper point: policy is not documentation. Policy sets the rules of the game. Documentation tells you how to play. The game changes; the rules can remain stable if they are written at the right level of abstraction.

---

## 14. "You focused on MS Copilot 365. Have you considered using multiple models to understand biases? I cross-examine LLMs with factual questions to expose biases. There are broad implications for how the human mind is trained by LLMs." — David M. Dozor

Yes — and the practice you describe (cross-examining the model with questions where you know the answer) is one of the highest-leverage techniques an operator can develop. I do it constantly.

For our **students**, Copilot 365 is the standard because it is the licensed enterprise tool with the appropriate data-handling contracts in place. That is a procurement decision, not an aesthetic one.

For **my own work** and for **BORG's development**, I run multiple frontier models in rotation. Claude (my primary), GPT-class models, Gemini for specific vision tasks, local models for low-stakes high-volume work. The biases are different and the differences are instructive. A few patterns I have noticed:

- **Bias in safety calibration is not bias in factuality.** A model that refuses to discuss a topic is not necessarily a model that gets the topic wrong. They are different failure modes and worth tracking separately.
- **Bias in tone and bias in content are independent.** A model can write in a confident voice and be wrong; a model can write hesitantly and be right. The fluency of the prose is the most dangerous signal because it correlates with reader trust but not with accuracy.
- **Cross-model agreement is a weak signal when the training data overlaps.** If three frontier models all say the same wrong thing, it is often because they all read the same wrong source.
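Cross-examination with known-answer questions can be run as a small harness. The sketch below assumes each model is wrapped as a plain function from question to answer (the wrappers themselves are not shown and would depend on each provider's client library); `cross_examine` is a hypothetical name, and exact string matching is a deliberately crude stand-in for a proper grader.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]  # question -> answer


def cross_examine(
    models: Dict[str, Model],
    known: List[Tuple[str, str]],
) -> Tuple[Dict[str, int], List[str]]:
    """Score each model on known-answer questions.

    Returns (scores, shared_errors), where shared_errors lists the
    questions on which every model gave the same wrong answer.
    """
    scores = {name: 0 for name in models}
    shared_errors: List[str] = []
    for question, truth in known:
        expected = truth.strip().lower()
        answers = {name: ask(question).strip().lower() for name, ask in models.items()}
        for name, answer in answers.items():
            if answer == expected:
                scores[name] += 1
        distinct = set(answers.values())
        # Unanimous and wrong: agreement here says nothing about truth.
        if len(distinct) == 1 and expected not in distinct:
            shared_errors.append(question)
    return scores, shared_errors
```

The `shared_errors` list is the interesting output. It is the third pattern above made visible: questions where agreement across models would have looked like confirmation.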

On your deeper point — the implications for how the human mind is trained by long exposure to LLMs — I take this very seriously. My master's thesis was on AI governance and societal transformation under Giorgio Baruchello at UNAK. I called the central risk _cognitive debt_: the atrophy of human judgement under automation. The Bastani et al. (2025) deskilling finding is the empirical surface of it. The deeper version is a hypothesis worth investing in: that an entire generation of professionals will be _fluent_ in the AI register but progressively _less able_ to construct an argument from primary sources without it. I do not think we know yet whether that is true. I think it is a question worth funding.

I would be glad to talk more — please reach out.

---

## 15. "AI capability to write documentation is pretty good. Ask the AI to look at your data and write documentation for the appropriate audience, then correct what is wrong." — Joe Montibello

You are right, and I addressed this in the answer to Question 13. Two cautions to layer on top of your suggestion:

1. **The first draft is fast; the editorial pass is slow.** The work of catching subtle wrongness in a fluent draft is harder than people expect. Budget for it.
2. **Documentation written from the system's own behaviour is documentation that describes the current state, not the intended state.** That is fine for "how do I use this", less fine for "what is this meant to do." Keep the intended-state documents handwritten.

Used carefully, AI-drafted documentation is a force multiplier. Used carelessly, it produces beautifully formatted, plausibly worded, technically wrong manuals.

---

## 16. "At what point do you think AI will move from an LLM to a BIO LLM module in a chip in one's body or brain?" — David Z. Cantu

The honest answer is: I do not know, and I am sceptical of anyone who tells you they do.

What I can say is what the question is really asking under the surface. You are asking when AI moves from an _external instrument_ to an _internal capability_. The brain-computer interface literature (Neuralink, Synchron, BrainGate) is making meaningful progress on motor and sensory restoration for people with specific neurological conditions. The gap between "decode a small set of intended movements from motor cortex" and "embed a language model in a chip that augments cognition" is enormous — not just engineering, but the entire question of what it would mean to have an LLM _in_ a brain rather than _consulted by_ a brain.

My professional opinion as someone who built infrastructure around current LLMs: external is going to be the dominant model for a long time, because external is _correctable_. You can replace the model, audit it, log its outputs, switch providers, unplug it. Embedded is none of those things. The governance problem becomes nearly intractable the moment the instrument is welded to the user.

The question I would rather work on, which you may also be asking: what is the _interface_ between human cognition and AI cognition going to look like in five years? My best guess: voice-first, ambient, context-aware, with a much smaller cognitive cost to invoke than typing into a chat window. Already arriving. Apple's and Microsoft's directions point this way.

The chip-in-the-brain version is a real research programme. It is also a much longer horizon than the public conversation suggests, and the governance questions it raises are why I would not bet on it being a near-term consumer reality.

---

## 17. "What are the best areas of study to get a well-rounded understanding of AI, Agentic? Thoughts on AOI (Artificial Organisational Intelligence)?" — David Z. Cantu

**For a well-rounded understanding of AI:**

Five reading directions, in priority order.

1. **The original technical papers.** Not summaries. Start with _Attention Is All You Need_ (Vaswani et al., 2017). Read OpenAI's GPT-3 paper. Read Anthropic's _Constitutional AI_ paper. Read whatever is currently the most-cited paper on retrieval-augmented generation. You do not need to follow every equation; you need to absorb the conceptual moves.
2. **Practical evaluation literature.** How do we know an AI system works? Read the HELM benchmark documentation. Read the MMLU paper. Read the Bastani et al. (2025) _PNAS_ paper on deskilling. Read Magesh et al. (2025) on legal-AI hallucination. Evaluation is where most institutional decisions go wrong because nobody read this literature.
3. **A grounding in one adjacent discipline that pre-dates LLMs.** Cognitive science. Linguistics. Statistics. Information theory. Pick one. The danger of learning about AI from AI-only sources is that you inherit the field's blind spots.
4. **A grounding in governance and ethics.** Not the consulting-deck version. Read the EU AI Act in primary text. Read the NIST AI Risk Management Framework. Read Ruha Benjamin's _Race After Technology_. Read Helen Nissenbaum on contextual integrity. The technical questions and the governance questions are inseparable.
5. **Build something.** This is non-negotiable. A reading-only understanding of AI in 2026 is like a reading-only understanding of driving. Spin up an API key. Ship a small tool. Break it. Fix it. _Then_ go back to the papers and you will find they read differently.

**On AOI — Artificial Organisational Intelligence:**

This is not yet a settled term in the academic literature, but the _concept_ — that intelligence emerges at the organisational level, not just the individual level, and that AI can participate in that organisational intelligence — is a serious one.

The closest established literatures are:

- **Organisational learning** (Argyris, Schön, Senge)
- **Distributed cognition** (Hutchins, _Cognition in the Wild_)
- **Collective intelligence** (Malone, Lakhani, the MIT Center for Collective Intelligence)
- **The "human-AI team" literature**, which is emerging fast in the operations and management journals

What I would say from the operator's chair: organisations are already starting to behave as if they have an extended cognitive surface that includes AI. The institutional question is whether that extension is _governed_ — through policy, infrastructure, and the irreducible human in the loop — or _ungoverned_, in which case the organisation has effectively outsourced part of its judgement to a vendor without realising it.

The framing of AOI is useful insofar as it makes that question visible. It is dangerous insofar as it can be used to dress up "we bought a chatbot" as "we deployed organisational intelligence." Watch the framing closely.

---

## Closing note

Seventeen questions in sixty minutes is a hot room. Thank you to everyone who asked, and to Julia Heesen for moderating it with such consistency. The pattern across the seventeen — and this is what I tried to say in the blog post that accompanies this Q&A — is that the questions were the inverse of the talk. The room kept asking what to buy, and every honest answer pointed back to the operator who cannot be issued.

The acceleration is real. The irreducible human stays human.

---

_Magnús Smári Smárason · smarason.is · 2026_
_For citations and full evidence, see `REPORT_5page_Synthesis.md` and `Research_GenAI_SkillAcquisition/results/SYNTHESIS.md` in the project repository._