Model documents with AI (agent prompt)

You do not need to be an ontologist to use CAC Ontology.

If you can read an information resource (press release, report, policy memo, tool export, etc.), you can use the CAC Ontology workflow to translate it into:

a connected graph in Turtle (.ttl)
investigator-focused queries in SPARQL (.rq)
and a set of artifacts that show how the translation was done (provenance + evidence pointers)

This workflow is guided by a structured prompt (the “CAC Ontology Enhancement Agent Prompt”) maintained in the main CAC Ontology repository:

Canonical prompt: https://github.com/Project-VIC-International/CAC-Ontology/blob/main/agent.md
Latest changes: https://github.com/Project-VIC-International/CAC-Ontology/blob/main/CHANGELOG.md

The plain-language idea

CAC Ontology is a way to write the language we already use in investigations as a graph:

Namespaces: topic-specific vocabularies (a “subdomain of language”)
Classes: concepts (things you can talk about)
Properties: attributes and relationships (how concepts are described and connected)
SHACL: business rules (checks that your graph follows the intended “grammar”)

CAC Ontology extends the Linux Foundation Cyber Domain Ontology ecosystem (UCO + CASE), which helps different tools and organizations exchange investigation information in a semantically consistent way:

Cyber Domain Ontology: https://cyberdomainontology.org/

What you get out (the deliverables)

When you run the workflow on a source document, you should end up with a small set of outputs that are easy to share and easy to audit:

Source metadata: where the document came from (URL or file path), collection time, collector identity
Hashes: cryptographic hashes so you can prove which exact source was used
Normalized text: a stable plain-text version used for extraction
A minimal “skeleton” graph: UUID-only nodes for the source + collection/provenance scaffolding
Example knowledge graph (.ttl): the modeled entities, actions, evidence, and relationships
SPARQL analytics (.rq): 10-15 explainable queries that run against the example graph
Validation results: SHACL conformance output (what passed, what failed)

The goal is that someone else can look at your .ttl / .rq and understand:

what was modeled
what source evidence supports it
how confident the extraction was

How provenance stays connected (why this matters)

In CAC Ontology, modeled claims should be explainable. That means the graph should retain enough provenance to answer:

What source said this?
Which step produced this triple?
Who/what performed the extraction?

The canonical prompt describes concrete patterns (for example, using UCO/CASE Action + ProvenanceRecord) so that:

graphs are fully connected (no isolated nodes)
assertions have evidence pointers and confidence
results can be validated with SHACL

Practical ways to start

Start with one public document (single-document mode), produce one example .ttl + .rq.
Use SHACL to validate before sharing.
If the document reveals missing concepts, propose updates via GitHub Issues or PRs.

Model documents with AI (agent prompt)

The plain-language idea

What you get out (the deliverables)

How provenance stays connected (why this matters)

Practical ways to start

Links