A Year of Thinking, a Weekend of Building: How ContextNest Shipped

In January 2025, I published The Four Pillars of Context Engineering — a framework for how organizations should think about the knowledge they feed AI systems. Writing, Selecting, Compressing, Isolating. Four problems that nobody was solving as a unified discipline.
A month later I wrote about the need for a control plane — the argument that governing how agents interact with tools and context matters more than the agents themselves. In April, I wrote about the hidden costs of tool explosion. By September, I was calling context engineering a strategic imperative.
A year of circling the same problem from different angles. Every article pointed at the same gap: enterprises have knowledge. They have AI. They don't have a governed bridge between the two.
At some point, writing about it stopped being enough.
The window
This weekend, I found myself between chapters professionally — a clean break and a fresh start. No legacy codebase. No inherited architecture. No organizational inertia. Just a weekend, an engineer I trust, and a whiteboard full of ideas to codify over coffee and bánh mì (a seriously underrated breakfast).
We decided to see what we could ship in 48 hours.
Not a prototype. Not a proof of concept. Something real — open source, documented, published. The constraint was clarifying. When you have a weekend, you can't build everything. You build the thing that matters.
What the gap actually is
The problem had gotten sharper since I first wrote about it. Here's what I kept seeing across every enterprise AI deployment I touched:
The demo works. The AI answers questions fluently, pulls from the knowledge base, sounds authoritative. Then it hits production, and a compliance team discovers the system is confidently citing a policy that was deprecated months ago. The vector store still has it. The similarity score is high. Nothing in the retrieval pipeline knows the difference between current and superseded.
One incident and trust is gone. Permanently. A clinician or a compliance officer who catches the AI being confidently wrong doesn't give it another chance. They route around it.
This isn't a model problem. Smarter models don't fix it. It's not a retrieval problem — the retrieval worked fine, it just retrieved the wrong version. It's a context governance problem. Before knowledge reaches an agent, someone needs to answer:
- Who authored this, and are they authoritative?
- Has it been reviewed and approved for production use?
- What's its relationship to other knowledge in the system?
- Is it current, or has it been superseded?
No vector database answers these questions. No embedding model preserves these relationships. When you chunk a document and vectorize it, you get similarity scores. You lose authorship, approval status, hierarchy, and every edge that connects one concept to another.
The industry needed a purpose-built layer for governed context. That's what we sat down to build.
The weekend
Saturday morning, we articulated the problem and locked in the architecture. The key decisions:
Relationships, not chunks. Knowledge isn't a flat collection of text. It's a network of relationships — concepts that feed other concepts, policies that supersede other policies, people who authored specific recommendations. The data structure had to preserve these relationships through the entire pipeline, from authoring to agent consumption. Wiki-style links, edge descriptors, front matter with metadata. Simple enough that a domain expert can author in Markdown. Structured enough that an agent can traverse it programmatically.
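To make "relationships, not chunks" concrete, here is a minimal sketch of a relationship-preserving node and a traversal over its edges. The type names and fields are illustrative assumptions, not the actual ContextNest schema:

```typescript
// Hypothetical knowledge node: edges carry a descriptor, so
// "supersedes" and "depends-on" stay distinct instead of collapsing
// into an undifferentiated similarity score.
interface KnowledgeNode {
  id: string;
  title: string;
  author: string;
  status: "draft" | "published" | "superseded";
  edges: { rel: string; target: string }[];
}

// Follow edges with a given descriptor from a starting node.
function related(
  nodes: Map<string, KnowledgeNode>,
  from: string,
  rel: string
): KnowledgeNode[] {
  const start = nodes.get(from);
  if (!start) return [];
  return start.edges
    .filter((e) => e.rel === rel)
    .map((e) => nodes.get(e.target))
    .filter((n): n is KnowledgeNode => n !== undefined);
}
```

An agent can ask "what does this policy supersede?" programmatically — a question a bag of chunks cannot answer.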
Governance baked in, not bolted on. Every piece of content carries its approval status, authorship chain, and access permissions as metadata. Nothing reaches a production agent without a human sign-off. For regulated industries — healthcare, financial services, legal — this is non-negotiable. For everyone else, it's the difference between an AI system people trust and one they abandon after the first bad answer.
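The "baked in, not bolted on" idea reduces to a hard gate at the boundary: nothing without an explicit human sign-off reaches a production agent. A minimal sketch, with field names that are assumptions rather than the real metadata schema:

```typescript
// Hypothetical governed-content metadata. The gate below is the whole
// point: status AND a named approver, or the document never ships.
interface Governed {
  id: string;
  status: "draft" | "in-review" | "published";
  approvedBy?: string; // human sign-off required before production use
}

// Only published, human-approved content may reach a production agent.
function productionContext(docs: Governed[]): Governed[] {
  return docs.filter((d) => d.status === "published" && d.approvedBy !== undefined);
}
```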
Open protocol, commercial platform. The specification and CLI would be open source (Apache 2.0). The engine and MCP server would be AGPL — open for inspection, commercial license for enterprise use. Think git vs. GitHub. The format needs to be open for adoption. The managed platform — PromptOwl — is where the business lives.
MCP-native from day one. Any MCP-compatible client — Claude, Cursor, custom agents — can query the governed context through a standard interface. No vendor lock-in. No custom integrations. The knowledge structure, approval metadata, and authorship chain are all available at query time.
Saturday was heads-down building. Sunday was packaging, documentation, and publishing.
By Sunday night we had:
- Two open source repositories — the specification and the reference implementation
- Three npm packages live on the registry
- A white paper laying out the full technical argument: "The Context Governance Gap: Why Enterprise AI Fails Without a Control Plane"
Forty-eight hours from whiteboard to shipped.
What RAG gets wrong
Most enterprise AI deployments use Retrieval-Augmented Generation. Dump your PDFs into a vector database, chunk them, embed them, retrieve the most similar chunks at query time. It works — until it doesn't.
The problem is that embeddings are lossy compression of knowledge. When you vectorize a document, you preserve semantic similarity. You destroy everything else: who wrote it, when it was approved, what version it is, what it supersedes, what other documents it depends on, and who has permission to see it.
Four failure modes show up repeatedly:
Stale context hallucinations. An outdated policy sits in the vector store with a high similarity score. The agent cites it confidently. The current policy exists too, but nothing in the pipeline knows which one is current.
Accountability vacuum. An AI system produces an output that causes a problem. No one can trace which source documents contributed to that answer, who authored them, or whether they were approved for production use.
Shadow context. Different teams maintain their own document collections with inconsistent governance. The AI draws from all of them. Nobody owns the full picture.
Compliance exposure. A regulator or auditor asks for the provenance chain behind an AI-generated decision. There isn't one. The retrieval pipeline doesn't log what was injected, when, or why.
These aren't edge cases. They're the default outcome when you treat knowledge as a retrieval problem instead of a governance problem.
The six layers
ContextNest is built around a simple pipeline: Authoring → Versioning → Integrity → Querying → Injection → Tracing.
Every piece of content moves through these layers before it reaches an agent:
Authoring. Domain experts write in standard Markdown with YAML front matter. Wiki-style links create explicit relationships between documents. No proprietary format, no special tooling. If you can write a README, you can author governed context.
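To make the authoring layer concrete, a governed document in this style might look like the sketch below. The front-matter keys and link targets are illustrative assumptions, not the actual ContextNest format:

```markdown
---
title: Remote Access Policy
author: jane.doe
status: published
approved_by: compliance-team
version: 3
supersedes: [[remote-access-policy-v2]]
---

All remote access goes through the [[vpn-gateway]] described in
[[network-security-baseline]]. Exceptions require sign-off from the
[[security-steward]] role.
```

The wiki-style links become traversable edges; the front matter becomes queryable metadata. A domain expert who can write a README can write this.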
Versioning. Every edit creates a new version. Full history. Restore any previous state. You always know what changed, when, and who changed it.
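The versioning layer can be sketched as an append-only history where even a restore is itself a new version, so nothing is ever overwritten. Illustrative only, not the ContextNest implementation:

```typescript
// Append-only version history: every edit creates a new immutable
// version; restoring an old state appends a copy rather than rewriting.
interface Version {
  n: number;
  content: string;
  editedBy: string;
  at: string; // ISO timestamp
}

class History {
  private versions: Version[] = [];

  edit(content: string, editedBy: string): Version {
    const v: Version = {
      n: this.versions.length + 1,
      content,
      editedBy,
      at: new Date().toISOString(),
    };
    this.versions.push(v); // earlier versions are never mutated or deleted
    return v;
  }

  current(): Version | undefined {
    return this.versions[this.versions.length - 1];
  }

  restore(n: number, by: string): Version {
    const old = this.versions.find((v) => v.n === n);
    if (!old) throw new Error(`no version ${n}`);
    return this.edit(old.content, by); // restore is itself a new version
  }
}
```

You always know what changed, when, and who changed it — the history is the audit trail.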
Integrity. SHA-256 hash chains prove content hasn't been tampered with. Nest checkpoints create atomic snapshots of the entire knowledge structure — point-in-time reconstruction of everything an agent could have seen on a given date.
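The hash-chain idea itself is simple to sketch: each entry's hash covers its content plus the previous hash, so editing any earlier entry breaks every later link. A minimal sketch using Node's crypto module (the entry shape is an assumption, not the ContextNest checkpoint format):

```typescript
import { createHash } from "node:crypto";

// Each entry's hash commits to its content AND the previous hash,
// chaining every entry to everything before it.
interface ChainEntry {
  content: string;
  prevHash: string;
  hash: string;
}

function link(content: string, prevHash: string): ChainEntry {
  const hash = createHash("sha256").update(prevHash + content).digest("hex");
  return { content, prevHash, hash };
}

// Recompute every link; any tampering anywhere invalidates the chain.
function verify(chain: ChainEntry[]): boolean {
  let prev = "";
  for (const e of chain) {
    if (e.prevHash !== prev) return false;
    const expect = createHash("sha256").update(e.prevHash + e.content).digest("hex");
    if (e.hash !== expect) return false;
    prev = e.hash;
  }
  return true;
}
```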
Querying. Deterministic selectors — set-algebraic queries like #compliance + status:published — return reproducible results. Same query, same context, every time. No similarity-score roulette.
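A deterministic selector like #compliance + status:published can be read as set intersection over metadata, with a stable sort so the same query always yields the same result in the same order. The toy grammar below is an assumption for illustration; the real selector language is richer:

```typescript
// Evaluate "+"-joined terms as an intersection: a document matches
// only if it satisfies every term.
interface Doc {
  id: string;
  tags: string[];
  status: string;
}

function select(docs: Doc[], query: string): Doc[] {
  const terms = query.split("+").map((t) => t.trim());
  const matches = docs.filter((d) =>
    terms.every((t) =>
      t.startsWith("#")
        ? d.tags.includes(t.slice(1))
        : t.startsWith("status:")
        ? d.status === t.slice("status:".length)
        : false
    )
  );
  // Stable ordering: same query, same context, every time.
  return matches.sort((a, b) => a.id.localeCompare(b.id));
}
```

Contrast this with vector retrieval, where a re-embedded corpus or a tweaked similarity threshold silently changes what the agent sees.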
Injection. Governed context reaches agents via the Model Context Protocol. Any MCP-compatible client — Claude, Cursor, custom agents — gets structured knowledge with its full metadata intact. Approval status, authorship, relationships — all of it travels with the content.
Tracing. Every context injection is logged. When an agent produces an output, you can trace exactly which documents contributed, who authored them, and whether they were approved. The provenance chain auditors need.
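The tracing layer reduces to a simple invariant: every injection writes a record before the agent sees the content, so provenance is a lookup, not a forensic investigation. A minimal sketch with hypothetical names:

```typescript
// Hypothetical injection log: which documents were injected for which
// agent output, and when. Provenance becomes a simple query.
interface InjectionRecord {
  outputId: string;
  docIds: string[];
  at: string; // ISO timestamp
}

class TraceLog {
  private records: InjectionRecord[] = [];

  record(outputId: string, docIds: string[]): void {
    this.records.push({ outputId, docIds, at: new Date().toISOString() });
  }

  // Which documents contributed to a given agent output?
  provenance(outputId: string): string[] {
    return this.records
      .filter((r) => r.outputId === outputId)
      .flatMap((r) => r.docIds);
  }
}
```

Join the returned document IDs against the governance metadata and you have the full chain an auditor asks for: what was injected, who authored it, and whether it was approved.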
Why open source
The specification and CLI are Apache 2.0. The engine and MCP server are AGPL-3.0. This was deliberate.
Context governance is infrastructure. It's the kind of problem that needs a shared standard, not a proprietary lock-in. If every vendor builds their own context governance layer, enterprises end up with fragmentation — the same mess we have today, just more expensive.
The managed platform — PromptOwl — provides enterprise features on top: stewardship hierarchies, approval workflows, role-based permissions, real-time collaboration, audit logging, and compliance reporting. The protocol is open. The platform is where teams go when they need to deploy without building from scratch.
What happens now
Three trends make this urgent.
First, autonomous agents are proliferating. Every agent that takes action based on knowledge it retrieves is a liability if that knowledge isn't governed. The more agents you deploy, the more the governance gap compounds.
Second, regulation is arriving. The EU AI Act, SOC 2 requirements for AI systems, industry-specific compliance frameworks — all of them will require provenance chains for AI-generated decisions. Organizations without them will be exposed.
Third, models are commoditizing. The competitive advantage isn't which model you use. It's the quality and governance of the context you feed it. The organization with the best-governed knowledge base will get the best outputs from any model.
The tools are live. The white paper has the full technical argument. The open source packages are on GitHub and npm. And PromptOwl is ready for teams that want to deploy now.
A year of thinking. A weekend of building. The governed bridge between what your organization knows and what your AI agents can reliably act on.



