A complete methodology for building institutional knowledge graphs using Neo4j, JSON-LD, and SHACL validation — deployed at a small university with 1,039 nodes across 16 domains.
TL;DR / Executive Summary
Small and medium-sized organisations run on invisible architecture: governance, systems, policies, staff, and interdependencies that exist mostly in people’s heads and scattered documents. The result is fragility when staff leave, endless cross-domain questions that no one can answer quickly, and AI tools that remain blind to the institution they serve.
This guide presents a complete, reproducible methodology for solving that problem with a production institutional knowledge graph. Built and deployed at the University of Akureyri (northern Iceland, ~2,800 students, ~300 staff) by a single practitioner, the graph now contains 1,039 nodes across 16 domains, 1,648 relationships of 30 types, and is kept live and accurate through a bidirectional pipeline where Neo4j is the single source of truth.
Key innovations that make it practical for small teams:
The full paper includes architecture diagrams, every script size, exact quality metrics, Cypher query library, chatbot evaluation results, and an 11-step reproducibility guide so any small organisation or higher-ed institution can copy the approach in weeks, not years.
Small and medium-sized organisations face a structural knowledge problem: institutional knowledge about systems, governance, staff, policies, and dependencies lives in people's heads, scattered documents, and undocumented assumptions. This paper presents a complete, reproducible methodology for building an institutional knowledge graph using Neo4j, JSON-LD, and SHACL validation. The approach was developed and deployed at a small university (~2,800 students, ~300 staff) in northern Iceland, producing a graph of over 1,000 nodes across 16 domains with 1,648 relationships and 30 relationship types — maintained by a single practitioner. The pipeline was constructed using AI-assisted development (Claude Code within a structured orchestration framework), following a layered domain-by-domain strategy with validation gates between layers — blending a deterministic, repeatable tech stack with generative AI that accelerates code production without participating in runtime execution. The graph powers two production chatbots evaluated against a 410-question golden dataset with LLM-as-judge scoring. The paper describes architecture, pipeline design, data modelling, validation, a bidirectional pipeline where Neo4j serves as the single source of truth, the AI-assisted construction methodology, and a systematic evaluation framework. Every design pattern is presented with sufficient detail for practitioners at other organisations to reproduce the approach.
Keywords: knowledge graph, Neo4j, institutional architecture, JSON-LD, SHACL, organisational modelling, graph database, higher education, AI-assisted development, Claude Code
Every organisation has an architecture — not just an IT architecture, but a living mesh of governance structures, academic programmes, policy frameworks, vendor relationships, identity systems, and human expertise. In small organisations, this architecture is often undocumented. The institution functions because a handful of people carry the picture in their heads.
This creates three problems:
This paper presents a complete methodology for addressing these problems using a graph database approach. After extensive experimentation with Neo4j for mapping complex organisational architectures and interdependencies, this approach represents what has proven effective in this implementation: a combination of graph modelling, semantic web standards, and automated pipeline design that produces a living, queryable, validated representation of an entire institution.
Organisational relationships are inherently graph-shaped — hierarchies, dependencies, memberships, and integrations — and a property graph model captures these naturally. While many aspects can be modelled in relational systems, graph traversal makes multi-hop queries (e.g. dependency chains) significantly simpler and more expressive.
The methodology was developed at the University of Akureyri (UNAK), a small public university in northern Iceland. The resulting knowledge graph contains:
The contributions of this paper are:
Institutional knowledge management in higher education has traditionally relied on enterprise architecture frameworks (TOGAF, ArchiMate) or purpose-built information systems. These approaches tend toward heavyweight tooling and consultant-driven processes that are poorly suited to small organisations with limited IT staffing.
Graph-based approaches to organisational modelling have gained traction with the maturation of property graph databases. Neo4j's Cypher query language and MERGE-based idempotent ingestion make it particularly suitable for iteratively building institutional graphs. The W3C's JSON-LD and SHACL standards provide a standards-compliant foundation for data modelling and validation without requiring full RDF/SPARQL infrastructure.
This work differs from existing approaches in three ways: (1) it targets small organisations where a single practitioner must build and maintain the entire graph, (2) it uses AI-assisted development to accelerate pipeline construction while relying on human domain expertise for knowledge curation, and (3) it implements a bidirectional pipeline where the graph database — not the source files — is the authoritative source of truth.
The knowledge graph uses JSON-LD as its serialisation format, leveraging schema.org vocabulary extended with institution-specific types. Every entity is a node with a stable identifier, a type, and a set of properties. Relationships are encoded as JSON-LD references with optional inline properties.
All nodes carry the Entity label in Neo4j plus type-specific labels derived from @type in JSON-LD. Every node receives four provenance fields at ingest time:
| Field | Purpose |
|---|---|
| `source_system` | Origin of the data (e.g., `seed`, `scraper`, `manual`) |
| `source_record_id` | Original identifier in the source system |
| `observed_at` | Timestamp when the data was last known to be accurate |
| `ingested_at` | Timestamp when the data entered Neo4j |
This provenance model ensures every fact in the graph is traceable to its origin and can be assessed for freshness.
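A minimal sketch of how the four fields might be stamped onto a node at ingest time. The `stamp_provenance` helper is hypothetical; only the four field names and their meanings come from the table above:

```python
from datetime import datetime, timezone

def stamp_provenance(node: dict, source_system: str, source_record_id: str,
                     observed_at: str) -> dict:
    """Attach the four provenance fields before a node enters Neo4j.

    Hypothetical helper: the paper specifies the field names, not this API.
    """
    node.update({
        "source_system": source_system,        # e.g. "seed", "scraper", "manual"
        "source_record_id": source_record_id,  # identifier in the origin system
        "observed_at": observed_at,            # when the fact was last known true
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    })
    return node

node = stamp_provenance({"@id": "unak:Canvas", "@type": "schema:Service"},
                        "seed", "canvas-01", "2026-02-23T00:00:00Z")
```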
Stable identifiers are the backbone of the graph. The scheme uses namespaced prefixes to avoid collisions:
| Category | Pattern | Example |
|---|---|---|
| Entities | `unak:<slug>` | `unak:Canvas`, `unak:FacultyEducation` |
| Staff | `unak:staff:<slug>` | `unak:staff:firstname-lastname` |
| Relationships | `rel:<TYPE>:<from>:<to>:<source>` | `rel:INTEGRATES_WITH:unak:Canvas:unak:Panopto:seed` |
| Observations | `obs:<metric>:<entity>:<timestamp>` | `obs:uptime90d:unak:Canvas:2026-02-23T00:00:00Z` |
Once assigned, an ID never changes. If a role holder changes, the node stays — only the properties update. This stability is what makes cross-file references work and what makes the pipeline safely re-runnable.
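Slug generation is not specified in the paper; a plausible sketch that reproduces the ID patterns in the table above (the Icelandic transliteration map is an assumption) could look like:

```python
import re
import unicodedata

def slugify(name: str) -> str:
    """Lowercase ASCII slug. Characters NFKD cannot decompose
    (d-eth, thorn, ae, o-umlaut) are mapped by hand; this mapping
    is an assumption, not taken from the paper."""
    name = name.lower().translate(
        str.maketrans({"ð": "d", "þ": "th", "æ": "ae", "ö": "o"}))
    name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]+", "-", name).strip("-")

def staff_id(full_name: str) -> str:
    return f"unak:staff:{slugify(full_name)}"

def rel_id(rel_type: str, frm: str, to: str, source: str) -> str:
    return f"rel:{rel_type}:{frm}:{to}:{source}"
```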
The single most important architectural decision was splitting the graph into 16 domain-specific JSON-LD files rather than maintaining a monolithic data file. Each file carries its own full @context block and can be edited, validated, and reasoned about independently.
| # | Domain | Nodes | Contents |
|---|---|---|---|
| 01 | Organisational Structure | 33 | Institutional hierarchy — schools, faculties, centres, offices |
| 02 | IT Stack | 59 | All managed systems — LMS, portals, collaboration tools, AI systems |
| 03 | Identity & Auth | 5 | Identity providers, authentication federation |
| 04 | External Vendors | 64 | Cloud providers, AI vendors, library database vendors |
| 05 | Observations | 13 | Uptime metrics, student enrolment data |
| 06 | Governance | 26 | Leadership, committees, council, student association, legal framework |
| 07 | Academic Programmes | 48 | Graduate and undergraduate programmes across faculties |
| 08 | Policies & Legal | 26 | Institutional policies, AI policy, national legislation |
| 09 | Partnerships & Networks | 39 | Exchange programmes, research networks, partner institutions |
| 10 | Facilities & Services | 17 | Campus buildings, library, student services |
| 11 | Staff Directory | ~291 | All staff with role, contact, and department assignment |
| 12 | Extended Governance | 82 | Additional committees, working groups, quality bodies |
| 13 | Library Databases | 97 | Electronic research databases with vendor and access classification |
| 14 | Internal Handbooks (Bridge) | 5 | Bridge nodes to RAG content in vector storage |
| 15 | Public FAQ (Bridge) | 3 | Bridge nodes to FAQ content in vector storage |
| 16 | Handbook Entities | 20 | Entities extracted from internal handbook content |
This decomposition yields four critical benefits:
Domain expertise stays local. The person who understands IT systems edits file 02. The person who understands governance edits file 06. Neither needs to understand the other 15 files.
Merge conflicts become manageable. Two people editing different domain files never conflict. Even within the same file, JSON-LD's flat graph structure means additions rarely collide.
Validation catches cross-domain errors. When files are merged, referential integrity checking verifies that every @id reference resolves to a node that exists — across all 16 files. A reference from file 07 (Academic Programmes) to a faculty defined in file 01 (Organisational Structure) is validated at merge time.
The monolith is a derived artefact. The merged output is generated by concatenating all 16 files' @graph arrays. It is never edited directly. The source of truth is always the domain files (or, in the current architecture, the Neo4j graph from which they are exported).
Domains are both a modelling and organisational construct. Each domain maps to a source file and reflects a distinct area of institutional knowledge (e.g. IT systems, governance, academic programmes).
In the current implementation, domain classification is derived deterministically during ingestion using ID prefixes, categories, and routing rules. All entities now carry a domain property, enabling explicit cross-domain analysis (e.g. measuring relationships between governance, systems, and organisational units). This allows the graph to be modular at the file level while remaining fully connected and queryable as a unified structure.
The pipeline has evolved from a linear flow (files → Neo4j) to a bidirectional architecture where Neo4j is the single source of truth and domain files are version-controlled exports:
┌──────────────────────────┐
│ Neo4j (Source of Truth) │
└──────┬───────────┬────────┘
│ │
Export ↓ ↑ Ingest
│ │
┌──────┴───────────┴────────┐
│ 16 Domain JSON-LD Files │
│ (Version-Controlled) │
└──────┬───────────┬────────┘
│ │
Merge ↓ ↑ Scrape
│ │
┌──────┴───┐ ┌────┴────────┐
│ Merged │ │ Web Sources │
│ Artefact │ │ (unak.is) │
└──────┬───┘ └─────────────┘
│
Validate ↓
│
┌──────┴──────────────────┐
│ SHACL + Referential │
│ Integrity Gate │
└─────────────────────────┘

The export pipeline (export_neo4j.py) reads all nodes and relationships from Neo4j and routes them deterministically back into the 16 domain files using ID prefix, source_system, and category as routing keys. This means:
The one-command orchestrator supports all pipeline directions:
./update.sh            # merge + validate only
./update.sh --scrape   # scrape web sources + merge + validate
./update.sh --ingest   # merge + validate + ingest into production
./update.sh --export   # export Neo4j → domain files, then merge + validate
./update.sh --quality  # merge + validate + quality inspection
The graph models 30 relationship types, each carrying provenance fields plus optional domain-specific properties:
| Relationship | Semantics | Example |
|---|---|---|
| `CONTAINS` | Hierarchy | School contains Faculty |
| `MANAGES` | Operational responsibility | IT Department manages LMS |
| `HOSTED_BY` | Infrastructure hosting | LMS hosted by SaaS Vendor |
| `INTEGRATES_WITH` | System integration | LMS integrates with Video Platform (protocol: LTI 1.3) |
| `DEPENDS_ON` | System dependency | Portal depends on Identity Provider (risk_impact: 5) |
| `AUTHENTICATES_VIA` | Auth dependency | LMS authenticates via Azure AD (risk_impact: 4) |
| `WORKS_FOR` | Employment | Staff member works for Institution |
| `MEMBER_OF` | Membership | Staff member is member of Committee |
| `PARTNERS_WITH` | Partnership | Institution partners with Partner University |
| `REGULATED_BY` | Legal compliance | Institution regulated by Higher Education Act |
| `GOVERNED_BY` | Governance | Faculty governed by Dean |
| `SUBJECT_TO` | Policy application | Department subject to Quality Policy |
| `IMPLEMENTS` | Policy implementation | Office implements Data Protection Policy |
| `DEVELOPED_BY` | Software development | System developed by Staff Member |
| `OWNS` | System ownership | Department owns Internal System |
| `PART_OF` | Compositional membership | Sub-unit part of Larger Unit |
| ... | ... | _plus 14 additional types_ |
Some relationships carry domain-specific properties. INTEGRATES_WITH edges include the integration protocol (e.g., LTI 1.3). DEPENDS_ON and AUTHENTICATES_VIA edges carry a risk_impact score from 1 to 5. This enables queries like: _"Show all systems with a critical dependency (risk_impact ≥ 4) on the identity provider"_ — answered instantly by graph traversal.
All relationships carry a confidence score (default 1.0 for curated data), enabling future differentiation between verified and inferred knowledge. Systems are also annotated with a criticality field, allowing the graph to support risk-aware queries once values are enriched.
Time-varying metrics are not stored as node properties. They are materialised as separate Observation nodes linked via OBSERVED_FOR:
(obs:Observation {metric: "uptime90d", value: 99.7, observed_at: "2026-02-23"})
-[:OBSERVED_FOR]->
(lms:Entity {name: "Canvas LMS"})

This preserves history. Re-ingesting with a new observed_at timestamp creates a new observation node; the old one persists. Over time, this builds a time series without overwriting anything and without requiring a separate time-series database.
The same pattern works for enrolment data, budget metrics, or any institutional measure that changes over time.
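Assuming the MERGE-based ingestion described later, the Observation pattern might be materialised as below. The `observation_id` helper follows the ID scheme given earlier; the parameter names are illustrative:

```python
# Each new reading becomes a new node, keyed by an ID that embeds metric,
# entity, and timestamp, so re-ingestion never overwrites history.
OBSERVATION_CYPHER = """
MERGE (o:Entity:Observation {id: $obs_id})
SET  o.metric = $metric, o.value = $value, o.observed_at = $observed_at
WITH o
MATCH (e:Entity {id: $entity_id})
MERGE (o)-[:OBSERVED_FOR]->(e)
"""

def observation_id(metric: str, entity_id: str, observed_at: str) -> str:
    # obs:<metric>:<entity>:<timestamp>, per the identifier table
    return f"obs:{metric}:{entity_id}:{observed_at}"

params = {
    "obs_id": observation_id("uptime90d", "unak:Canvas", "2026-02-23T00:00:00Z"),
    "metric": "uptime90d",
    "value": 99.7,
    "entity_id": "unak:Canvas",
    "observed_at": "2026-02-23T00:00:00Z",
}
# With the official neo4j driver: session.run(OBSERVATION_CYPHER, **params)
```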
Two automated scrapers extract data from publicly available web sources:
Staff Directory Scraper (530 lines of Python) — A three-pass approach:
The three-pass approach achieves 94.2% department mapping coverage (274 out of 291 staff members). Pass 2 takes approximately 6 minutes for ~291 pages due to rate-limiting. Flags allow skipping individual passes for faster iteration (--skip-details, --skip-faculty).
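The Pass 2 timing is consistent with a simple rate-limited fetch loop; a hedged sketch (the delay value is an assumption: ~291 pages at ~1.2 s/page is roughly 6 minutes):

```python
import time
import urllib.request

def fetch_politely(urls, delay_s=1.2):
    """Yield (url, body) pairs with a fixed pause between requests.
    Sketch only: the real scraper's rate limit and error handling
    are not specified in the paper."""
    for url in urls:
        with urllib.request.urlopen(url) as resp:
            yield url, resp.read()
        time.sleep(delay_s)  # be polite to the institutional web server
```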
Library Database Scraper (360 lines) — Fetches the electronic research database catalogue. Classifies each database by vendor and access type (subscription, open access, national access). Generates 97 nodes in approximately 10 seconds.
IT Service Audit (435 lines) — Crawls institutional web properties and identifies third-party services from HTML source analysis (external scripts, iframes, meta tags), inline script pattern matching, and page keyword detection. This is a discovery tool — findings require manual review before adding to the graph. The most recent audit added 32 nodes (19 IT systems, 13 vendors).
The merge script (47 lines of Python) reads all 16 domain files, concatenates their @graph arrays while preserving the shared @context block, and writes a single output file. Execution takes less than a second. The merged artefact is used by downstream validation and ingestion steps.
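The merge step is small enough to sketch in full. Assuming the 16 domain files share a `.jsonld` extension and an identical `@context` (both assumptions about file layout), a ~20-line version:

```python
import json
from pathlib import Path

def merge_domains(domain_dir: Path, out_file: Path) -> int:
    """Concatenate the @graph arrays of all domain files into one artefact,
    keeping the shared @context once. Returns the merged node count."""
    merged = {"@context": None, "@graph": []}
    for path in sorted(domain_dir.glob("*.jsonld")):  # 01-… through 16-…
        doc = json.loads(path.read_text(encoding="utf-8"))
        if merged["@context"] is None:
            merged["@context"] = doc["@context"]  # shared context, kept once
        merged["@graph"].extend(doc["@graph"])
    out_file.write_text(json.dumps(merged, ensure_ascii=False, indent=2),
                        encoding="utf-8")
    return len(merged["@graph"])
```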
Two validation passes run in sequence before any data reaches Neo4j:
Referential integrity: Every @id reference in the merged graph is checked against the set of defined node IDs. If file 07 references a faculty defined in file 01 and that faculty has been renamed or removed, this check catches it. This is the most common error when editing the graph manually.
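A minimal version of this check, treating any `{"@id": …}` dict value as a reference (an assumption about how the serialisation encodes references):

```python
def check_referential_integrity(graph: list[dict]) -> list[str]:
    """Return a list of dangling references: any {"@id": ...} value that
    does not resolve to a node defined anywhere in the merged graph."""
    defined = {node["@id"] for node in graph}
    dangling = []
    for node in graph:
        for key, value in node.items():
            refs = value if isinstance(value, list) else [value]
            for ref in refs:
                if isinstance(ref, dict) and "@id" in ref \
                        and ref["@id"] not in defined:
                    dangling.append(f'{node["@id"]} -> {key} -> {ref["@id"]}')
    return dangling
```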
SHACL validation: The merged JSON-LD is converted to an RDF triple store using rdflib, then validated against 16 shape definitions written in Turtle (W3C SHACL standard). Shapes enforce structural contracts:
unak:ServiceShape a sh:NodeShape ;
sh:targetClass schema:Service ;
sh:property [
sh:path schema:name ;
sh:minCount 1 ;
] ;
sh:property [
sh:path unak:category ;
sh:minCount 1 ;
] .Every node type has a corresponding shape. If validation fails, the pipeline halts. No corrupted data reaches Neo4j.
A two-pass ingestion into Neo4j using the official neo4j Python driver:
Pass 1 (nodes): each node is MERGEd on its @id and receives the Entity label plus type-specific labels. Properties are set from the JSON-LD, plus the four provenance fields. Pass 2 (relationships): each edge is MERGEd between the already-created nodes.

The MERGE keyword in Cypher is what makes the entire pipeline idempotent. Running the pipeline ten times produces the same graph state: nodes matched by @id get updated, new nodes get created, and nothing gets duplicated. This property is essential for a pipeline that must be safely re-runnable.
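The two-pass pattern, sketched as Cypher templates in Python. The `rel_query` guard is a hypothetical whitelist check, needed because Cypher cannot parameterise relationship types:

```python
# Pass 1: MERGE each node on its stable @id, then overlay properties.
NODE_CYPHER = """
MERGE (n:Entity {id: $id})
SET n += $props
"""

# Pass 2: MERGE each relationship between already-existing nodes.
# The relationship type is interpolated, not parameterised.
REL_CYPHER = """
MATCH (a:Entity {id: $from_id}), (b:Entity {id: $to_id})
MERGE (a)-[r:%s {id: $rel_id}]->(b)
SET r += $props
"""

def rel_query(rel_type: str) -> str:
    # Guard the interpolation: only identifier-shaped types like
    # DEPENDS_ON or INTEGRATES_WITH are accepted (sketch of a whitelist).
    assert rel_type.isidentifier(), f"unsafe relationship type: {rel_type}"
    return REL_CYPHER % rel_type
```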
The export script (387 lines) reads all nodes and relationships from Neo4j and writes them back into the 16 domain JSON-LD files. Routing is deterministic:
Staff nodes (unak:staff:*) route to file 11 and observations (obs:*) to file 05; all other entities route by their category property to the appropriate domain file.

The export round-trips cleanly: export → ingest → re-export produces identical output. This was verified as part of the pipeline validation.
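A hedged sketch of the routing function. The category-to-file map shown is illustrative rather than the full routing table, and the fallback filename is invented:

```python
def route_to_domain(node: dict) -> str:
    """Deterministically route a Neo4j node back to a domain file.
    The real exporter also consults source_system; this sketch uses
    only ID prefix and category."""
    nid = node.get("id", "")
    if nid.startswith("unak:staff:"):
        return "11-staff.jsonld"          # staff directory
    if nid.startswith("obs:"):
        return "05-observations.jsonld"   # time-varying metrics
    by_category = {                       # illustrative subset only
        "governance": "06-governance.jsonld",
        "it-system": "02-it-stack.jsonld",
        "vendor": "04-vendors.jsonld",
    }
    # Fallback filename is a placeholder for this sketch
    return by_category.get(node.get("category", ""), "99-unrouted.jsonld")
```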
The entire knowledge graph pipeline — approximately 4,800 lines of Python, 200 lines of SHACL, 120 lines of Bash — was constructed using an AI-assisted development methodology, and a practical observation from this project is that such assistance significantly accelerates pipeline construction. The approach is not "AI wrote the code"; it is a structured collaboration between a human domain expert (the orchestrator) and an AI coding agent operating within a purpose-built framework.
The primary development tool was Claude Code, Anthropic's CLI-based coding agent. Claude Code operates directly in the terminal — reading files, writing code, running commands, and iterating on results within the developer's actual working environment. Unlike chat-based AI assistants, it has full context of the codebase and can execute multi-step tasks autonomously.
The critical accelerator was not the AI agent alone, but wrapping it in a structured orchestration framework. In this case, the framework was PAI (Personal AI Infrastructure, an open-source project by Daniel Miessler), but the methodology is reproducible with any equivalent system that provides:
The key insight is that an unstructured conversation with an AI produces scattered results. A structured framework that enforces observe-think-plan-build-verify cycles produces consistent, validated output. Any orchestration layer that provides these four capabilities would serve the same function.
The knowledge graph was not built all at once. It was constructed through a deliberate layered strategy, where each domain was added incrementally, validated, and stabilised before the next layer was introduced:
Layer 1: Organisational Structure (33 nodes)
↓ validate, ingest, verify
Layer 2: IT Stack + Identity (64 nodes)
↓ validate, ingest, verify
Layer 3: Governance + Policies (52 nodes)
↓ validate, ingest, verify cross-domain references
Layer 4: Academic Programmes + Partnerships (87 nodes)
↓ validate, ingest, verify
Layer 5: Facilities + Services (17 nodes)
↓ validate, ingest, verify
Layer 6: Staff Directory (291 nodes, scraped)
↓ validate, ingest, verify department mapping
Layer 7: Library Databases (97 nodes, scraped)
↓ validate, ingest, verify vendor relationships
Layer 8: Extended Governance + Handbook Entities (102 nodes)
↓ validate, ingest, verify
Layer 9: Quality gap closure (iterative rounds to 0/0/0)
↓ export, validate round-trip fidelity
Layer 10: Bidirectional pipeline (export capability)

Each layer followed the same cycle:
This layered approach meant that at every stage, the graph was in a valid, consistent state. No layer was added that broke a previous layer. The validation gate enforced this structurally, not just as a convention.
A critical design principle was the strict separation between deterministic infrastructure and generative AI work:
| Component | Nature | Role of AI |
|---|---|---|
| JSON-LD schema | Deterministic | AI writes initial schema; human validates and evolves it |
| SHACL shapes | Deterministic | AI generates shape definitions; human reviews against domain knowledge |
| Pipeline scripts | Deterministic | AI writes Python; human reviews logic and edge cases |
| Neo4j MERGE queries | Deterministic | AI generates Cypher; idempotency is verified by re-running |
| Validation | Deterministic | Automated — no AI in the loop at validation time |
| Domain knowledge curation | Generative | AI assists with research and extraction; human makes all modelling decisions |
| Scraper construction | Generative → Deterministic | AI builds scraper code; output is deterministic once written |
| Quality gap analysis | Generative | AI identifies patterns; human decides which gaps are real |
The key insight: the pipeline itself is entirely deterministic. Once written, merge.py, validate.py, ingest_neo4j.py, and export_neo4j.py produce identical results regardless of whether an AI or a human runs them. AI was used to _construct_ the deterministic infrastructure, not to _operate_ it. This means the pipeline is auditable, reproducible, and does not depend on AI availability at runtime.
AI-assisted development excelled at:
AI-assisted development struggled with or was deliberately excluded from:
The most valuable aspect of the structured orchestration approach was its concept of Ideal State Criteria (ISC) — concrete, testable conditions defined before work begins and verified with evidence when work ends. This mapped directly onto the knowledge graph's quality metrics:
| ISC Criterion | Quality Metric | Final State |
|---|---|---|
| "Every IT system has an owning department" | Ownership gaps | 0 (down from 109) |
| "Every organisational unit subject to a policy" | Policy gaps | 0 (down from ~20) |
| "Every staff member mapped to a department" | Staff mapping gaps | 94.2% coverage |
| "Zero unresolved cross-file references" | Referential integrity | PASS |
| "All node types have SHACL shapes" | Schema coverage | 16/16 shapes |
Each domain layer was treated as a task with its own ISC set: define criteria, build, validate against criteria, iterate until all pass. The orchestration framework enforced this discipline — no layer was declared complete without evidence for every criterion. Custom skills within the framework handled specific tasks: web research for discovering institutional data, browser automation for verifying scraper output against live pages, and Neo4j-specific skills for Cypher queries and graph validation.
For practitioners wanting to replicate this methodology:
A modular quality inspection system runs automated checks against the graph data:
| Module | Purpose |
|---|---|
| **Inefficiency** | Dead-end nodes, hubs, sparse nodes, disconnected entities |
| **Redundancy** | Fuzzy duplicate detection, cross-file overlaps, redundant relationships |
| **Friction** | Policy gaps, ownership gaps, staff coverage gaps |
| **Cross-validation** | Staff delta, IT coverage, empty departments, stale sources |
| **AI Analysis** | LLM-powered analysis of findings (optional, requires API key) |
The quality inspection modules produce concrete gap lists that are systematically closed in iterative rounds:
Ownership gaps: Every IT system, service, and organisational unit must have at least one managing or owning entity. The IT department was established as the default owner for all IT systems without explicit ownership. Starting from 109 ownership gaps, systematic assignment reduced this to zero.
Subsequent health dashboard auditing identified 27 additional IT-audit-discovered systems without explicit ownership, which are now tracked as remediation items.
Policy gaps: Every organisational unit must be subject to at least one institutional policy. Faculties were linked to relevant legislation; offices were linked to operational policies. Starting from approximately 20 policy gaps, systematic linking reduced this to zero.
Staff mapping gaps: Every staff member should be linked to a department. The three-pass scraper resolved most cases; remaining gaps were manually mapped using faculty staff pages. Starting from 18 unmapped staff, coverage reached 94.2% (remaining cases identified but not yet fully modelled).
Duplicate resolution: Fuzzy string matching identified 43 potential duplicates. Three were confirmed true duplicates (same entity with different IDs, namespace collisions, name variants) and merged. The remaining 40 were confirmed as false positives — a common occurrence with Icelandic patronymic naming patterns where unrelated individuals share similar names.
Remaining duplicate name groups represent legitimate vendor/product overlaps and are disambiguated using display names rather than merged, preserving semantic distinctions while avoiding ambiguity in user-facing contexts.
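Fuzzy duplicate detection of this kind can be sketched with the standard library. The 0.85 threshold is an assumption, and the paper's own counts (40 of 43 hits were false positives) show why every hit needs human review:

```python
from difflib import SequenceMatcher
from itertools import combinations

def candidate_duplicates(names: list[str], threshold: float = 0.85):
    """Flag name pairs above a similarity threshold for human review.
    Returns (name_a, name_b, ratio) tuples; nothing is merged
    automatically, mirroring the human-in-the-loop policy above."""
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs
```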
| Metric | Value |
|---|---|
| Policy gaps | 0 |
| Ownership gaps | 0 |
| Staff mapping gaps | 0 (94.2% coverage) |
| SHACL validation | PASS |
| Referential integrity | PASS |
| Confirmed duplicates | 0 remaining |
| Dead-end nodes | 105 (structurally expected — vendors, programmes with only inbound edges) |
A health dashboard now runs after each pipeline execution, auditing structural and semantic integrity across the graph. This includes checks for orphan nodes, missing provenance, domain coverage, duplicate entities, and ownership gaps.
This shifts validation from a one-time gate to continuous monitoring — the graph is treated as a living system with measurable quality metrics rather than a static artefact.
The system now distinguishes between resolved gaps and identified gaps. For example, 27 IT systems currently lack explicit ownership — these are not hidden but surfaced by the health dashboard as actionable items. A graph that exposes its own incompleteness is more valuable than one that appears artificially complete.
| Metric | Value |
|---|---|
| Nodes | 729 |
| Relationships | 1,602 |
| Orphan nodes | 0 |
| Missing domain | 0 |
| Missing provenance | 0 |
| Cross-domain edges | 2,738 |
| Duplicate name groups | 4 (disambiguated) |
| Systems without owner | 27 (identified for remediation) |
The knowledge graph is integrated into Borg, an institutional AI platform built for the university. Borg is a broader system encompassing AI assistants, knowledge management, and institutional tooling — a full treatment of Borg is beyond the scope of this paper, but its knowledge graph features illustrate how a well-structured graph becomes immediately useful once embedded in a platform.
Borg exposes the knowledge graph through two interactive force-directed graph diagrams:
Both diagrams render the live Neo4j graph in real time — they are not static snapshots but direct queries against the production database.
A library of Cypher queries was developed alongside the graph for testing, validation, and operational analysis. These queries serve as both a verification tool during development and a practical analysis resource for ongoing operations:
-- Blast radius: what breaks if Azure AD goes down?
MATCH (aad:Entity {id: "unak:AzureAD"})<-[r:AUTHENTICATES_VIA|DEPENDS_ON]-(dep:Entity)
WHERE r.risk_impact >= 4
RETURN dep.name, type(r), r.risk_impact ORDER BY r.risk_impact DESC
-- Vendor concentration: which vendors host the most systems?
MATCH (v:Entity)<-[:HOSTED_BY]-(s:Entity)
RETURN v.name, count(s) AS systems ORDER BY systems DESC
-- Orphan check: zero-relationship nodes (should be 0)
MATCH (n:Entity) WHERE NOT (n)--() RETURN count(n) AS orphans
-- Staff coverage: unmapped staff members
MATCH (p:Entity)-[:WORKS_FOR]->(u:Entity {id: "unak:UNAK"})
WHERE NOT (p)-[:MEMBER_OF]->() AND p.id STARTS WITH "unak:staff:"
RETURN p.name, p.jobTitle_is

These queries were used iteratively during construction to verify each layer. The orphan check, for example, runs after every ingestion to enforce the zero-orphan invariant. The blast radius and vendor concentration queries feed directly into IT operations discussions.
A standalone Cypher audit dump (generate_cypher.py) exports the entire graph as MERGE statements, providing a portable, human-readable representation of the full graph state that can be reviewed in code review, diffed between versions, or used to recreate the graph from scratch without the Python pipeline.
The admin interface includes a dedicated chatbot with direct access to the knowledge graph via Neo4j tool-use (detailed in Section 8). This chatbot operates as a working tool for graph maintenance — administrators can ask natural-language questions about the graph ("which IT systems have no owner?", "show me all committees and their members") and receive answers grounded in live graph traversal, not static documentation.
The graph enables operational questions that previously required consulting multiple people:
- AUTHENTICATES_VIA and DEPENDS_ON edges with risk_impact ≥ 4 answer blast-radius questions.
- HOSTED_BY and PROVIDES edges per vendor reveal vendor concentration.
- INTEGRATES_WITH edges with protocol properties map system integrations.

One immediate outcome was identifying critical identity dependencies that were not fully documented. The graph revealed that multiple core systems depended on Azure AD with high risk impact, prompting a discussion on redundancy and incident planning.
The graph links every organisational unit to its governing bodies, applicable policies, and relevant legislation. This supports:
Bridge nodes connect the structured graph to unstructured content stored in a vector database (handbook text, FAQ content). This is the foundation for GraphRAG — using graph structure to improve retrieval-augmented generation. The graph provides _what_ things are and how they relate; the vector store provides the full narrative text.
The knowledge graph does not replace retrieval-augmented generation systems. Instead, it complements them: the graph provides structured relationships and constraints, while vector search provides unstructured narrative context. Together, they enable both precise queries (e.g. dependency chains) and explanatory responses grounded in documentation.
The knowledge graph's primary consumer-facing application is a pair of production chatbots that combine structured graph data with unstructured content retrieval. Building the chatbots surfaced a question that every knowledge graph project must eventually answer: _how do you know the graph is actually useful?_ The answer is systematic evaluation.
The institution runs two chatbots with fundamentally different retrieval strategies, both drawing on the knowledge graph:
GraphRAG Agent (UNAK-spjall) — An agentic chatbot powered by Gemini with tool-use access to Neo4j. Rather than embedding graph data into a prompt, the agent decides at inference time which graph tools to invoke:
| Tool | Function |
|---|---|
| `resolveEntities` | Entity lookup by name (fuzzy matching) |
| `searchDocuments` | Full-text + semantic search with category filtering |
| `getEntityDetails` | Relationship traversal for a specific entity |
| `exploreRelationships` | Multi-hop graph traversal |
| `getOrganizationStructure` | Organisational hierarchy queries |
This agentic approach means the chatbot can compose multi-step queries: resolve an entity by name, traverse its relationships, then search for related documents — all within a single conversational turn. The graph provides structured facts; the vector store (pgvector with 1,536-dimensional Gemini embeddings) provides narrative context from handbooks, FAQs, and policy documents.
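Purely illustrative: the real agent chooses tools at inference time, but a composed turn using the tool names from the table above could look like the following, with every implementation stubbed:

```python
def answer_dependency_question(tools: dict, name: str) -> dict:
    """One composed turn: resolve an entity by name, traverse its
    relationships, then fetch narrative context. The sequencing here
    is hard-coded for illustration; the LLM decides it in production."""
    entity = tools["resolveEntities"](name)             # fuzzy name -> @id
    details = tools["getEntityDetails"](entity["id"])   # relationship traversal
    docs = tools["searchDocuments"](entity["name"])     # vector-store context
    return {"entity": entity, "relationships": details, "documents": docs}

tools = {  # stubbed for the sketch; real tools hit Neo4j and pgvector
    "resolveEntities": lambda n: {"id": "unak:Canvas", "name": "Canvas LMS"},
    "getEntityDetails": lambda i: [("DEPENDS_ON", "unak:AzureAD")],
    "searchDocuments": lambda q: ["handbook: LMS access"],
}
result = answer_dependency_question(tools, "canvas")
```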
RAG Agent (Gervigreindur) — An AI literacy tutor that uses semantic search over chunked handbook content in pgvector. It does not query Neo4j directly but benefits from knowledge graph data that has been exported to the vector store as structured markdown chunks. This represents a simpler but effective pattern: the graph as a content source for vector retrieval.
To measure whether the knowledge graph actually improves chatbot responses, a systematic evaluation framework was built around a golden dataset of 410 questions:
| Dimension | Distribution |
|---|---|
| **Question type** | 45% factual, 25% procedural, 15% synthesis, 10% edge case, 5% out-of-scope |
| **Language** | 50% English, 40% Icelandic, 10% bilingual |
| **Coverage** | 15 user stories spanning all 16 knowledge domains |
The dataset includes golden questions tailored to each of the two chatbots, so both can be scored against the same rubric.

Responses are scored by an LLM judge (Claude) across five dimensions:
| Dimension | Weight | What It Measures |
|---|---|---|
| Correctness | 30% | Factual accuracy against known ground truth |
| Completeness | 25% | Whether the response addresses all aspects of the question |
| Faithfulness | 20% | Whether claims are grounded in retrieved context (no hallucination) |
| Language | 15% | Appropriate language use (Icelandic/English matching, terminology) |
| Referral | 10% | Appropriate escalation to human support when the question exceeds scope |
Scoring thresholds:
| Score | Interpretation |
|---|---|
| ≥ 0.85 | Target — production-ready |
| ≥ 0.75 | Pass — acceptable for deployment |
| < 0.72 | Block — deployment gate fails |
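A minimal sketch of the composite score and the deployment gate, assuming per-dimension judge scores normalised to [0, 1] (elsewhere the baseline results are reported on a 0-3 scale); the band between 0.72 and 0.75, which the thresholds leave undefined, is routed to manual review in this sketch:

```python
# Weighted composite of the five judge dimensions, plus the deployment gate.
# Assumes scores normalised to [0, 1]; the undefined 0.72-0.75 band is
# flagged for manual review here, which is an assumption, not stated policy.

WEIGHTS = {
    "correctness": 0.30,
    "completeness": 0.25,
    "faithfulness": 0.20,
    "language": 0.15,
    "referral": 0.10,
}

def composite(scores: dict[str, float]) -> float:
    """Weighted average of per-dimension judge scores."""
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)

def gate(score: float) -> str:
    """Map a composite score to the deployment decision."""
    if score >= 0.85:
        return "target"
    if score >= 0.75:
        return "pass"
    if score < 0.72:
        return "block"
    return "review"  # 0.72-0.75: undefined in the thresholds above

s = composite({"correctness": 0.9, "completeness": 0.8, "faithfulness": 1.0,
               "language": 0.7, "referral": 1.0})
print(round(s, 3), gate(s))
```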
Initial evaluation (March 2026) produced baseline scores:
| Chatbot | Composite Score | Key Finding |
|---|---|---|
| UNAK-spjall | 2.09/3.00 (70.4%) | Critical UTF-8 encoding bug depressed all Icelandic-language responses; graph traversal otherwise effective |
| Gervigreindur | 2.62/3.00 (90.0%) | Strong performance; minor mismatch between pedagogy-focused responses and judge weight distribution |
The evaluation immediately surfaced actionable issues, most notably the UTF-8 encoding bug that depressed Icelandic-language responses.
While LLM-as-judge provides scalable evaluation across large datasets, it is complemented by manual inspection of failure cases and targeted user feedback during development.
The most valuable outcome of systematic evaluation was establishing a closed feedback loop between chatbot performance and graph quality:
```
Golden Questions → Chatbot Response → LLM Judge Score
        ↑                                    │
        │                                    ↓
Graph enrichment ← Gap analysis ← Low-scoring domains
```

When the judge scores a response low on completeness for a governance question, root cause analysis traces back to either (a) missing graph nodes, (b) missing relationships, or (c) missing vector content. Each maps to a specific fix: add nodes to the domain file, add edges, or enrich the handbook content. After the fix, the golden question is re-evaluated to verify improvement.
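A hedged sketch of the gap-analysis routing step, assuming the judge output carries hypothetical diagnostic flags (`entity_missing`, `relationship_missing`) that the production system may derive differently:

```python
# Illustrative triage: route a low-scoring judged response to one of the
# three fix categories. The diagnostic flags are assumptions about what a
# judge output might contain, not the production format.

def triage(judged: dict) -> str:
    """Map a low-scoring response to a graph or content fix."""
    if judged.get("entity_missing"):
        return "add nodes to the domain file"
    if judged.get("relationship_missing"):
        return "add edges"
    return "enrich handbook content in the vector store"

print(triage({"relationship_missing": True}))
```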
This means the evaluation dataset is not just a quality gate — it is a requirements specification for the knowledge graph itself. Questions the chatbot cannot answer well become ISC criteria for graph enrichment.
JSON-LD provides linked data semantics (stable URIs, type systems, vocabulary reuse from schema.org) with the developer experience of plain JSON. Domain experts can read and edit JSON-LD in any text editor without learning RDF serialisation formats or SPARQL.
The `@context` block maps human-readable property names to schema.org URIs. The `@graph` array holds all nodes. Each node has an `@id` and `@type`. This is the entire model.
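A minimal domain file in that shape might look like the following; the `unak:` prefix, URIs, and entities are illustrative, not the production vocabulary:

```json
{
  "@context": {
    "@vocab": "https://schema.org/",
    "unak": "https://example.org/unak/"
  },
  "@graph": [
    {
      "@id": "unak:it-dept",
      "@type": "Organization",
      "name": "IT Department"
    },
    {
      "@id": "unak:canvas",
      "@type": "SoftwareApplication",
      "name": "Canvas LMS",
      "provider": { "@id": "unak:it-dept" }
    }
  ]
}
```

Cross-file references work the same way: any file can point at `unak:it-dept` by `@id`, and the referential integrity check verifies that the target exists somewhere in the merged graph.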
Custom Python validators would have been faster to write initially, but SHACL shapes are declarative, standards-compliant, separate from pipeline code, and extensible by anyone who can write Turtle. A new shape can be added without touching Python.
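For illustration, a shape in that style (the class and property choices are assumptions, not the production shapes) requiring every organisation node to carry exactly one name:

```turtle
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <https://schema.org/> .
@prefix unak:   <https://example.org/unak/> .

# Illustrative shape: every schema:Organization must have exactly one
# string-valued name. Adding a new rule means adding a new shape like this,
# with no change to the pipeline code.
unak:OrganizationShape
    a sh:NodeShape ;
    sh:targetClass schema:Organization ;
    sh:property [
        sh:path schema:name ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
    ] .
```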
Idempotent ingestion via MERGE means the pipeline is always safe to re-run. This eliminates an entire class of operational anxiety. Any domain file edit — from a single property change to a full domain rewrite — can be ingested without concern for duplication.
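The MERGE pattern can be sketched in Cypher as follows; the labels, properties, and parameter names are illustrative rather than the production ingestion code:

```cypher
// Illustrative idempotent upsert: MERGE on the stable @id so re-runs never
// create duplicates. ON CREATE SET records first-ingestion time only once.
MERGE (n:Organization {id: $id})
ON CREATE SET n.ingested_at = datetime()
SET n.name = $name,
    n.source_system = $source_system;

// Relationships follow the same pattern: match both endpoints, MERGE the edge.
MATCH (a:Person {id: $personId}), (b:Organization {id: $orgId})
MERGE (a)-[:MEMBER_OF]->(b);
```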
The initial architecture treated domain files as the source of truth with Neo4j as a downstream consumer. This was reversed: Neo4j is now authoritative, and domain files are derived exports. This change was motivated by the sync problems that arise as soon as anyone edits the database directly.
Every node and relationship carries `source_system`, `source_record_id`, `observed_at`, and `ingested_at`. This provenance metadata is essential for institutional infrastructure: the question "where did this data come from and when was it last verified?" must always be answerable.
Time-varying data stored as separate Observation nodes preserves history across pipeline runs. This is simpler than maintaining a separate time-series database and supports the same analytical queries.
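A sketch of the Observation pattern in Cypher (the labels, properties, and relationship type here are assumptions, not the production schema):

```cypher
// Append-only history: each pipeline run creates a new Observation node
// instead of overwriting the value on the entity itself.
MATCH (s:System {id: $systemId})
CREATE (s)-[:HAS_OBSERVATION]->(:Observation {
  metric: $metric,
  value: $value,
  observed_at: datetime($observedAt)
});

// The latest value is then an ordinary query over the history.
MATCH (s:System {id: $systemId})-[:HAS_OBSERVATION]->(o:Observation {metric: $metric})
RETURN o.value
ORDER BY o.observed_at DESC
LIMIT 1;
```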
These lessons are distilled from building and operating the graph over approximately eight weeks:
1. Start with structure, not scale. Thirty-three organisational nodes with clean relationships are more valuable than 10,000 poorly connected records. The graph's value comes from the connections, not the node count.
2. The domain-file decomposition pattern works. Splitting a graph into domain-specific source files with cross-file referential integrity checking is the single best decision for maintainability. When something breaks, you know exactly which file to inspect.
3. Validate before you ingest. SHACL validation as a gate before Neo4j ingestion catches errors that would silently corrupt the graph. Every domain file addition goes through the same gate. No exceptions.
4. Scrape what's public, curate what's not. Staff directories and library databases are public and scrapable. Governance structures, policy details, and system dependencies require human curation. Recognise which is which and do not attempt to automate the curation.
6. Idempotency is non-negotiable. If the pipeline is not safe to re-run, operators will be afraid to run it. MERGE-based ingestion means the full pipeline can execute on any change, at any time, with confidence.
6. AI accelerates the code, not the knowledge. A structured AI agent operating within an orchestration framework (in this case, Claude Code with PAI*) can produce the pipeline scripts, scrapers, validation logic, and quality inspection modules at remarkable speed. But the knowledge — which entities to model, what relationships matter, what level of granularity serves downstream use cases — requires understanding the institution. The division of labour is clear: AI writes the deterministic infrastructure, the human makes every modelling decision. Neither can do the other's job effectively.
7. Small institutions have an advantage. At a university of ~300 staff, the entire institution can be modelled in approximately 1,000 nodes. Every node can be personally validated. Every relationship can be checked against lived experience. This is a feature of small scale, not a limitation.
8. Make the graph bidirectional early. A unidirectional pipeline (files → database) creates a sync problem as soon as anyone edits the database directly. Building export capability from the start keeps files and database in agreement and provides version-controlled disaster recovery for free.
9. Make gaps visible, then close them systematically. Ownership, policy, and staffing gaps are not cosmetic — they represent blind spots in the institutional model. Track them as metrics and drive them to zero through iterative rounds.
10. Neo4j is exceptionally well-suited to this problem. Organisational architectures are inherently graph-shaped. The property graph model captures hierarchies, dependencies, memberships, and integrations naturally. Cypher queries for multi-hop traversal (dependency chains, blast radius analysis) are trivial to write and fast to execute. For anyone considering a graph database for organisational mapping, Neo4j is an excellent choice.
Ongoing maintenance currently requires a few hours per week, primarily for reviewing scraper output, resolving validation issues, and incorporating organisational changes.
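As an illustration, a blast-radius traversal might look like this, assuming a `DEPENDS_ON` relationship type (the production schema may differ):

```cypher
// Illustrative blast-radius query: everything that transitively depends on
// a given system within five hops, with the shortest dependency distance.
MATCH p = (s:System)-[:DEPENDS_ON*1..5]->(:System {id: $systemId})
RETURN s.name AS affected, min(length(p)) AS hops
ORDER BY hops;
```

This is the kind of query that is a one-liner in Cypher but would require recursive joins in a relational model.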
This approach is not universally optimal.
High-churn environments require continuous maintenance; without clear ownership, the graph can degrade quickly as organisational reality changes.
Some domains — particularly narrative-heavy documentation — are better served by vector retrieval than structured modelling. The knowledge graph is most valuable where relationships and dependencies are first-class concerns.
Over-modelling is a real risk. Not every relationship needs to be explicitly represented, and excessive granularity can reduce clarity rather than improve it.
Finally, governance and organisational reality do not always map cleanly to formal structures. Ownership, responsibility, and decision-making authority may be ambiguous or politically sensitive.
The key shift is treating the knowledge graph not as a static artefact, but as a continuously audited system with explicit quality metrics and feedback loops.
For practitioners wanting to replicate this approach at their own organisation:
Identify 8–20 knowledge domains that cover the institution. Common starting points include organisational structure, staff, systems, policies, and governance. Choose a consistent namespace prefix (e.g. `yourorg:`) for all entity IDs.
The minimum viable pipeline consists of four scripts:
| Script | Purpose | Approximate Size |
|---|---|---|
| `merge.py` | Concatenate domain files | ~50 lines |
| `validate.py` | Referential integrity + SHACL | ~130 lines |
| `ingest.py` | MERGE into Neo4j | ~300 lines |
| `export.py` | Export Neo4j → domain files | ~400 lines |
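A minimal sketch of what `merge.py` might look like, assuming each domain file is JSON-LD with a top-level `@graph` array (the production script may differ):

```python
# Minimal merge.py sketch: concatenate the @graph arrays of all domain files
# under one shared @context. File layout and naming are assumptions.

import json
from pathlib import Path

def merge_docs(docs: list[dict], context: dict) -> dict:
    """Combine parsed JSON-LD documents into a single merged graph."""
    return {
        "@context": context,
        "@graph": [node for doc in docs for node in doc.get("@graph", [])],
    }

def merge_domains(domain_dir: str, context: dict) -> dict:
    """Read every *.jsonld domain file in sorted order and merge them."""
    docs = [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(domain_dir).glob("*.jsonld"))
    ]
    return merge_docs(docs, context)
```

Sorting the file list keeps the merged output deterministic, which makes diffs of the merged artefact meaningful in version control.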
Begin with organisational structure (one file, ~30 nodes). Add domains iteratively. Run validation after every change. Do not wait until the graph is "complete" to start validating — completeness is a moving target; correctness is not.
Identify publicly available data sources (staff directories, catalogues, service listings). Build scrapers as needed. Each scraper writes to a single domain file.
Define what "complete" means for your graph: for example, zero systems without owners, zero policies without a responsible body, and zero unfilled role assignments. Track these metrics and drive them to zero.
