The Thesis That Changed Everything
Most enterprise software is built the wrong way.
You identify a pain (say, IT teams drowning in alerts) and you build a product for it. It ships. It works. Then another pain surfaces: incident management is chaotic. You build another product. Then network troubleshooting. Then cloud diagnostics. Each product lives in a silo. Each has its own data model, its own agent logic, its own onboarding. With n products, you pay that engineering cost n times.
When I joined FuturePath.AI as Technical PM, I had a decision to make: build six good products or build one platform that generates products.
I chose the platform.
This is the story of that choice — what it required, what it unlocked, and what it taught me about the difference between shipping features and building category infrastructure.
The Problem Space
Enterprise IT operations is a coordination disaster.
An alert fires. A NOC engineer opens three tools to triage it. They file a ticket manually. The ticket has the wrong configuration data because the CMDB hasn't been updated in months. An automation runs on bad data. It escalates to L2. L2 investigates, fixes it, but doesn't log what they did. The next time the same issue fires, the cycle repeats.
This isn't a tooling problem. It's a systemic trust and automation problem:
- Data isn't trustworthy — CMDB data is stale or incomplete; tickets contain errors; the "source of truth" isn't true
- Automation can't be trusted — runbooks are rigid; agents that fail quietly are more dangerous than no agents
- Knowledge doesn't compound — every investigation disappears into tickets nobody reads; every fix gets forgotten
- Governance is binary — either the machine runs autonomously (risky) or humans approve everything (slow)
The opportunity wasn't to build another alerting tool or ticketing plugin. It was to build the infrastructure layer that makes autonomous IT operations trustworthy enough to actually automate.
What I Set Out to Build
Before writing a single PRD, I defined three non-negotiables:
1. The platform must generate products, not just run them. A NOC analyst, a network engineer, a cloud architect, and a change manager all need different entry points, different agent behaviors, different data views. The platform had to be configurable enough that deploying a new persona-specific product required configuration, not code.
2. Governance must be built in, not bolted on. Enterprise IT is regulated, risk-averse, and politically complex. Autonomous agents that can't be audited, explained, or overridden will never get past legal. Governance had to be a first-class platform capability — not a feature added later.
3. The data layer must be fixed before the AI layer. I'd seen what happened when AI ran on dirty data at Wipro — confident outputs, wrong conclusions. The platform's AI could only be as trustworthy as the data underneath it. We had to fix the foundation before building anything visible.
Architecture: Seven Modules, Three Layers
The platform is organized into three logical layers, each serving a distinct role in the autonomous operations stack.
Layer 1 — Governance
SuperAdmin Dashboard is the platform control plane. It's where my team registers and provisions all platform resources: MCP servers (the external tool connectors), knowledge datasets, skills (structured SOPs), agent templates, and guardrail configurations. SuperAdmin is the supply side — it creates what tenants can use.
Org Admin is the tenant configuration layer. Each enterprise customer (an "org") has an Org Admin who selects from the SuperAdmin-provisioned resources and activates them for their organization. They control which agents their team can use, what data sources those agents access, and what approval gates are required for autonomous actions.
This two-tier model was one of the most important architectural decisions we made. It enforces clean multi-tenancy without giving tenants the ability to misconfigure platform-level resources. SuperAdmin controls what exists. Org Admin controls what's used.
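To make the split concrete, here is a minimal Python sketch of the two tiers. Every name in it (`Resource`, `PlatformCatalog`, `OrgConfig`, the resource IDs) is a hypothetical illustration, not the platform's actual API. It only shows the rule described above: a tenant can activate what the control plane has provisioned, and nothing else.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    """A platform-level resource provisioned by SuperAdmin (hypothetical shape)."""
    resource_id: str
    kind: str  # e.g. "mcp_server", "dataset", "skill", "agent_template", "guardrail"


@dataclass
class PlatformCatalog:
    """Supply side: only SuperAdmin writes to this catalog."""
    resources: dict[str, Resource] = field(default_factory=dict)

    def provision(self, resource: Resource) -> None:
        self.resources[resource.resource_id] = resource


@dataclass
class OrgConfig:
    """Tenant side: an Org Admin can only activate what the catalog already holds."""
    org_id: str
    catalog: PlatformCatalog
    active: set[str] = field(default_factory=set)

    def activate(self, resource_id: str) -> None:
        if resource_id not in self.catalog.resources:
            raise KeyError(f"{resource_id} is not provisioned at platform level")
        self.active.add(resource_id)


catalog = PlatformCatalog()
catalog.provision(Resource("servicenow-mcp", "mcp_server"))

acme = OrgConfig(org_id="acme", catalog=catalog)
acme.activate("servicenow-mcp")        # allowed: SuperAdmin provisioned it
try:
    acme.activate("unregistered-mcp")  # rejected: tenants cannot create resources
    rejected = False
except KeyError:
    rejected = True
```

The tenant object holds a reference to the catalog but has no method that writes to it, which is the whole point of the separation.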
Org Analytics gives every org a real-time observability layer across six dashboards: Overview (platform health), Costs (token consumption, LLM spend), Performance (latency, execution time), Engagement (usage by team, by agent), Value (incidents resolved, time saved), and Tracing (per-execution step-by-step replay). This wasn't a nice-to-have — it was the evidence layer that justified AI investment to enterprise procurement teams.
Layer 2 — Core Engine
Knowledge Neuron is the platform's RAG (Retrieval-Augmented Generation) system. When an agent needs to answer a question or follow a procedure, it retrieves relevant context from Knowledge Neuron before generating a response. The system supports four document types (raw text, structured tables, PDFs, web URLs) and four chunking strategies that can be configured per dataset depending on the retrieval use case.
The critical design decision here: Knowledge Neuron isn't just a vector store. It's a versioned, auditable knowledge management system where datasets can be updated, reprocessed, and traced back to their source. When an agent gives a wrong answer, you can trace it to the exact chunk — which makes debugging and quality improvement systematic rather than guesswork.
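A rough sketch of what "traceable to the exact chunk" can mean in practice. The source does not name the four chunking strategies, so the fixed-size window with overlap below is an assumed example, and `Chunk`, `chunk_fixed`, and the dataset and file names are hypothetical. The idea is simply that every chunk carries its dataset, version, source, and offset with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    dataset_id: str
    dataset_version: int
    source: str   # document the text came from
    start: int    # character offset of the chunk within the source
    text: str


def chunk_fixed(text: str, size: int, overlap: int, *, dataset_id: str,
                dataset_version: int, source: str) -> list[Chunk]:
    """Split text into fixed-size windows that overlap by `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    return [
        Chunk(dataset_id, dataset_version, source, start, text[start:start + size])
        for start in range(0, len(text), step)
    ]


doc = "Restart the agent service, then verify the heartbeat."
chunks = chunk_fixed(doc, size=20, overlap=5, dataset_id="noc-runbooks",
                     dataset_version=3, source="runbook-042.txt")
```

Because each chunk records where it came from, a wrong answer can be followed back to a specific version of a specific document rather than to "the knowledge base".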
Capabilities (MCP Servers) are the bridge between AI and the real world. Using the Model Context Protocol, agents can execute tools against external systems: ServiceNow, Jira, Cisco DNAC, SolarWinds, Azure Monitor, Ansible, Salesforce. Each MCP server exposes a set of "tools" — discrete, parameterized actions the agent can choose to call. As of writing: 8 registered servers, 44 discovered tools.
The MCP architecture solved a problem that had plagued earlier AI systems: tool call opacity. With MCP, every tool call is structured, parameterized, logged, and replayable. Governance can attach approval gates to specific tools. Guardrails can validate inputs before execution. Nothing runs blind.
Skills are structured operating procedures — reusable action sequences that agents follow. The key innovation is the Skill Indirection Layer: a skill binds to an MCP "slot" (a logical tool category) rather than a specific MCP server. This means the same skill can run in production (against live ServiceNow) and in staging (against a sandbox) without modification. Skills become environment-portable.
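The indirection is easiest to see in code. This is a sketch under assumptions: `SkillStep`, `Skill`, `SLOT_BINDINGS`, `resolve`, and the server and tool names are all made up for illustration. The skill names only a slot, and the environment decides which server fills it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillStep:
    slot: str   # logical tool category, e.g. "itsm"
    tool: str   # tool name exposed by whichever server fills the slot


@dataclass(frozen=True)
class Skill:
    name: str
    steps: tuple[SkillStep, ...]


# Hypothetical per-environment bindings from slot to a concrete MCP server.
SLOT_BINDINGS = {
    "production": {"itsm": "servicenow-live"},
    "staging": {"itsm": "servicenow-sandbox"},
}


def resolve(skill: Skill, environment: str) -> list[tuple[str, str]]:
    """Return (server, tool) pairs for the skill in the given environment."""
    bindings = SLOT_BINDINGS[environment]
    return [(bindings[step.slot], step.tool) for step in skill.steps]


triage = Skill("triage-ticket", (SkillStep("itsm", "get_incident"),
                                 SkillStep("itsm", "add_work_note")))
prod_plan = resolve(triage, "production")
staging_plan = resolve(triage, "staging")
```

The `triage` skill is defined once. Only the binding table changes between production and staging.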
Triggers are the event layer: the conditions that launch agents. The platform supports 14 trigger types, including webhook (incoming alerts from monitoring systems), API (external system calls), cron (scheduled jobs), and manual (user-initiated). The Trigger system is what makes the platform proactive rather than reactive. An incident launches an agent instead of paging a human.
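As a sketch of the dispatch idea, assuming a trigger is a type, a target agent, and a predicate over the event payload (the `Trigger` class, `agents_to_launch`, the agent IDs, and the payload fields are hypothetical):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Trigger:
    trigger_type: str                 # e.g. "webhook", "api", "cron", "manual"
    agent_id: str                     # agent to launch when the trigger matches
    matches: Callable[[dict], bool]   # predicate over the event payload


def agents_to_launch(triggers: list[Trigger], trigger_type: str,
                     payload: dict) -> list[str]:
    """Return the agents whose triggers match this event, in registration order."""
    return [t.agent_id for t in triggers
            if t.trigger_type == trigger_type and t.matches(payload)]


triggers = [
    Trigger("webhook", "alert-triage-agent",
            lambda p: p.get("severity") in {"critical", "major"}),
    Trigger("webhook", "noise-filter-agent",
            lambda p: p.get("severity") == "info"),
    Trigger("cron", "weekly-report-agent", lambda p: True),
]
launched = agents_to_launch(triggers, "webhook", {"severity": "critical"})
```

A critical alert arriving by webhook launches the triage agent and nothing else. No human has to route it.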
Layer 3 — Agents & Applications
Agent Types — The platform supports five distinct agent archetypes, each with a different operational profile:
- Chat Agents — conversational, stateful, user-facing. Used by NOC analysts to investigate incidents through dialogue.
- Automation Agents — trigger-driven, unattended, high-volume. Used to triage alerts, update tickets, and run remediation scripts autonomously.
- Widget Agents — embedded, context-aware, reactive. Surface AI assistance inside existing ITSM portals without requiring the user to leave their workflow.
- Structured Agents — output-constrained, schema-bound. Generate typed data: incident summaries, change assessments, post-incident reports — not free-form text.
- Subagents — orchestration primitives. Called by other agents to handle specialized sub-tasks. The coordination layer that enables multi-step, multi-agent workflows.
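To illustrate what "schema-bound" means for Structured Agents, here is a small sketch that accepts model output only when it fits a typed shape. The `IncidentSummary` fields and the P1 to P4 severity set are assumptions chosen for the example, not the platform's real schema.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IncidentSummary:
    incident_id: str
    severity: str
    summary: str


ALLOWED_SEVERITIES = {"P1", "P2", "P3", "P4"}  # assumed severity scale


def parse_incident_summary(raw: str) -> IncidentSummary:
    """Parse model output and reject anything that does not fit the schema."""
    data = json.loads(raw)
    expected = {f.name for f in fields(IncidentSummary)}
    if not isinstance(data, dict) or set(data) != expected:
        raise ValueError(f"expected exactly these fields: {sorted(expected)}")
    if not all(isinstance(data[name], str) for name in expected):
        raise ValueError("every field must be a string")
    if data["severity"] not in ALLOWED_SEVERITIES:
        raise ValueError(f"unknown severity: {data['severity']}")
    return IncidentSummary(**data)


ok = parse_incident_summary(
    '{"incident_id": "INC-1", "severity": "P2", "summary": "Disk full on db-3"}'
)
try:
    parse_incident_summary('{"incident_id": "INC-1", "summary": "free text only"}')
    accepted_bad_output = True
except ValueError:
    accepted_bad_output = False
```

Output that is missing a field never reaches a downstream system. It fails at the boundary, where it is cheap to retry.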
Applications are the product layer — the user-facing, work-item-centric operational solutions built on top of agents. An Application bundles multiple agent types, a knowledge base, an MCP toolset, and a trigger configuration into a deployable operational product. Each of the 9 products in my portfolio is an Application.
Executions & Approvals is the runtime intelligence layer. Every agent execution creates an immutable execution record with full step-by-step tracing, tool call logs, retrieval context, and outcome. The Approvals system intercepts specific tool calls (those configured to require human authorization) and routes them to designated approvers via email or in-platform notification. Agents wait at the gate. They don't proceed until authorized.
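One way to picture an immutable execution record is an append-only value: adding a step produces a new record and leaves the old one unchanged. This is a sketch of that idea only. `Step`, `ExecutionRecord`, and `with_step` are hypothetical names, and a real system would also need durable storage.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Step:
    kind: str    # e.g. "retrieval", "tool_call", "llm_output"
    detail: str


@dataclass(frozen=True)
class ExecutionRecord:
    execution_id: str
    steps: tuple[Step, ...] = ()
    outcome: Optional[str] = None

    def with_step(self, step: Step) -> "ExecutionRecord":
        """Return a new record with the step appended; the original is untouched."""
        return replace(self, steps=self.steps + (step,))


r0 = ExecutionRecord("exec-001")
r1 = r0.with_step(Step("tool_call", "servicenow.get_incident"))

try:
    r1.outcome = "resolved"   # direct mutation is blocked on a frozen dataclass
    mutated = True
except FrozenInstanceError:
    mutated = False
```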
Dual Governance: The Architecture That Made Enterprise Trust Possible
The hardest thing to get right in autonomous AI systems isn't the AI — it's the governance.
Enterprise IT teams have been burned by automation before. A runbook runs at 3am, misidentifies a production database as a dev resource, and deletes it. Trust collapses. Autonomous systems get banned. The manual process returns.
We built two parallel, complementary governance systems:
Guardrails — Automatic Content Validation
Guardrails operate at six pipeline points in every agent execution:
- Input validation — Is the trigger payload well-formed? Does it meet content policy?
- Retrieval filter — Are the knowledge chunks retrieved relevant and appropriate?
- Prompt construction — Does the composed prompt contain prohibited content?
- LLM output — Does the raw response meet safety and format requirements?
- Tool call validation — Are the tool parameters within expected ranges and types?
- Final response — Does the output meet org-level content policies before delivery?
Guardrails are configurable per org and per agent. They run automatically — no human in the loop unless a guardrail fires, at which point the execution is paused, flagged, and routed for review.
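The six pipeline points can be sketched as an ordered set of stages, each with its own checks. In this illustration the `Stage` enum mirrors the list above, while `first_violation` and the example checks are assumptions. A real guardrail would be far richer than a substring test.

```python
from enum import Enum
from typing import Callable, Optional


class Stage(Enum):
    INPUT = "input_validation"
    RETRIEVAL = "retrieval_filter"
    PROMPT = "prompt_construction"
    LLM_OUTPUT = "llm_output"
    TOOL_CALL = "tool_call_validation"
    FINAL = "final_response"


Check = Callable[[str], bool]  # returns True when the content passes


def first_violation(checks: dict[Stage, list[Check]],
                    artifacts: dict[Stage, str]) -> Optional[Stage]:
    """Walk the stages in pipeline order; return the first stage whose check fails."""
    for stage in Stage:  # Enum members iterate in definition order
        content = artifacts.get(stage, "")
        if not all(check(content) for check in checks.get(stage, [])):
            return stage
    return None


checks = {
    Stage.INPUT: [lambda s: len(s) > 0],
    Stage.LLM_OUTPUT: [lambda s: "password" not in s.lower()],
}
artifacts = {
    Stage.INPUT: "disk alert on host-7",
    Stage.LLM_OUTPUT: "The root password is hunter2",
}
paused_at = first_violation(checks, artifacts)
```

Here the execution would pause at the LLM output stage, and the returned stage tells the reviewer where in the pipeline the guardrail fired.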
Approvals — Human-in-the-Loop Gates
Approvals are manual gates on specific tools. When an agent decides to call a tool marked as requiring approval — say, "restart production server" or "update CMDB record" — it pauses and sends an approval request.
The approver sees what the agent is trying to do, why it chose this action, the tool parameters, and the full execution context. They can approve, reject, or modify the request. The agent resumes or reruns accordingly.
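A minimal sketch of the gate, assuming three possible decisions and a fixed set of gated tools. `ToolCall`, `Decision`, `gate`, and the tool names are hypothetical, and the approver is modeled as a plain callback rather than an email or notification flow.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ToolCall:
    tool: str
    params: dict


@dataclass(frozen=True)
class Decision:
    action: str                       # "approve", "reject", or "modify"
    new_params: Optional[dict] = None


REQUIRES_APPROVAL = {"restart_production_server", "update_cmdb_record"}


def gate(call: ToolCall, decide: Callable[[ToolCall], Decision]) -> Optional[ToolCall]:
    """Return the call to execute, or None if the approver rejected it."""
    if call.tool not in REQUIRES_APPROVAL:
        return call                   # ungated tools run without a human
    decision = decide(call)
    if decision.action == "approve":
        return call
    if decision.action == "modify":
        if decision.new_params is None:
            raise ValueError("a modify decision must carry new parameters")
        return ToolCall(call.tool, decision.new_params)
    if decision.action == "reject":
        return None
    raise ValueError(f"unknown decision: {decision.action}")


call = ToolCall("restart_production_server", {"host": "db-3", "force": True})
approved = gate(call, lambda c: Decision("approve"))
modified = gate(call, lambda c: Decision("modify", {"host": "db-3", "force": False}))
rejected = gate(call, lambda c: Decision("reject"))
ungated = gate(ToolCall("get_incident", {"id": "INC-1"}), lambda c: Decision("reject"))
```

The last line shows the scoping: a read-only tool never reaches the approver at all, so humans only see the calls that were configured to need them.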
This is the key insight that most AI governance frameworks miss: automation and human oversight aren't mutually exclusive. The platform handles 97% of an incident autonomously. The 3% that needs human judgment gets a precisely scoped approval request — not a Slack notification that something happened.
The Nine Products
Everything above is infrastructure. What it enables is nine distinct operational products, each owned entirely by me.
Data Quality Layer
Before any AI-powered IT operation can work reliably, the data underneath it has to be trustworthy. I designed and architected three data quality products before building any user-facing CoResolve product.
CMDB Intelligence Engine — Enterprise CMDBs (Configuration Management Databases) are notoriously stale. CIs (configuration items) go undiscovered. Relationships go unmapped. The CMDB IE is a 10-engine system that automatically discovers infrastructure relationships from live traffic (NetFlow, SNMP polling, log correlation), reconciles against existing CMDB records, and surfaces discrepancies for remediation. It runs before alerts even fire — because if your CMDB is wrong, every alert diagnosis that references it is wrong.
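At its simplest, the reconciliation step is a comparison of two sets of relationships: what the CMDB records and what discovery observed. The sketch below shows only that comparison, not the 10-engine system. `reconcile` and the CI names are hypothetical.

```python
Relationship = tuple[str, str]  # (source CI, target CI)


def reconcile(recorded: set[Relationship],
              discovered: set[Relationship]) -> dict[str, set[Relationship]]:
    """Compare CMDB relationships against what was observed on the network."""
    return {
        "confirmed": recorded & discovered,
        "missing_from_cmdb": discovered - recorded,   # seen live, never recorded
        "not_observed": recorded - discovered,        # recorded, but possibly stale
    }


recorded = {("web-1", "db-1"), ("web-2", "db-1")}
discovered = {("web-1", "db-1"), ("web-1", "cache-1")}
report = reconcile(recorded, discovered)
```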
Reality Correlation Engine — ITSM tickets are full of errors. Wrong affected CIs. Misclassified severity. Inaccurate environment labels. The RCE is an 11-engine pipeline that validates ticket data against infrastructure reality in near-real-time. It flags anomalies, suggests corrections, and builds a confidence score for each ticket field. The RCE is the trust layer between ITSM data and the agents that act on it.
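A toy version of per-field confidence scoring, to show the shape of the output rather than the real pipeline. The three-level score (1.0 for a match, 0.0 for a contradiction, 0.5 when nothing was observed) is a placeholder assumption, and `score_ticket` and the field names are hypothetical.

```python
def score_ticket(ticket: dict[str, str], observed: dict[str, str]) -> dict[str, dict]:
    """Score each ticket field against observed infrastructure facts."""
    report = {}
    for field_name, claimed in ticket.items():
        if field_name not in observed:
            # Nothing to compare against: neither confirmed nor contradicted.
            report[field_name] = {"confidence": 0.5, "suggestion": None}
        elif observed[field_name] == claimed:
            report[field_name] = {"confidence": 1.0, "suggestion": None}
        else:
            report[field_name] = {"confidence": 0.0, "suggestion": observed[field_name]}
    return report


ticket = {"affected_ci": "db-3", "environment": "dev", "severity": "P3"}
observed = {"affected_ci": "db-3", "environment": "production"}
ticket_report = score_ticket(ticket, observed)
```

An agent reading `ticket_report` can see that the environment label is contradicted by reality and act on the suggested correction instead of the ticket's claim.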
Solutions Assist — A pre-sales and deal intelligence product. 12 AI agents, 14 triggers, Salesforce CRM sync. It equips sales engineers with AI-generated solution briefs, competitive comparisons, and technical discovery documents — generated from Knowledge Neuron datasets populated with product documentation and battle cards. It's the only data quality product with an external-facing persona (prospects and sales teams, not ops engineers).
CoResolve Suite — Six IT Operations Products
The CoResolve suite is six domain-specific applications, each targeting a distinct IT operations tower. Each is built on the same platform — same agent types, same MCP infrastructure, same governance layer. What differs is the knowledge (tower-specific runbooks and documentation), the tools (domain-specific MCP servers), and the trigger configuration.
NOC Command Center CoResolve — Single pane of glass for Level 1 NOC operations. Chat agents field analyst queries. Automation agents triage incoming alerts, correlate across domains (network + compute + cloud simultaneously), and prioritize incident queues. The Command Center reduced mean-time-to-triage by giving L1 analysts AI-assisted context for every alert — without requiring them to context-switch between five monitoring tools.
NOC Virtual Analyst (NVA / CCVA) — The autonomous investigation engine. When an incident is triaged, NVA picks it up. It queries CMDB, pulls topology data, runs diagnostic tools via MCP (ping tests, log pulls, trace routes), correlates findings, and either resolves autonomously or produces a fully documented investigation report for L2 handoff. 8,900+ automation executions to date. The metric that matters: L2 engineers spend their time on decisions, not data collection.
Network CoResolve — Multi-vendor network troubleshooting. Integrates with Cisco DNAC, Palo Alto firewalls, SolarWinds, and SD-WAN controllers via MCP. When network issues surface (high error rates, BGP flapping, QoS violations), agents pull device telemetry, identify root cause, and execute remediation steps — or route to human approval for configuration changes. The multi-vendor complexity is abstracted by the Skill Indirection Layer — the same troubleshooting skill works whether the device is Cisco or Palo Alto.
Compute CoResolve — Hybrid server and cloud compute diagnostics. Covers on-prem physical servers, VMware clusters, and cloud VMs in a single operational surface. Integrates Ansible for fleet-level remediation. When a compute issue is detected — disk saturation, memory pressure, failed instances — agents diagnose, execute standard remediation playbooks, and log outcomes. The Ansible integration was critical: it gave agents the ability to execute remediation across hundreds of nodes without human scripting.
Hybrid Cloud CoResolve — Cross-environment Azure + on-prem troubleshooting. The hardest product to build because hybrid cloud incidents don't respect environment boundaries — a misconfigured VPN gateway affects both cloud and on-prem workloads. Agents integrate Azure Monitor, on-prem SNMP, and the CMDB IE data layer to correlate across environments and identify whether the root cause lives in cloud configuration, network routing, or on-prem infrastructure.
MIM + CFCR CoResolve — Major Incident Management and Change Failure Cause Review. MIM activates during P1/P2 incidents: agents coordinate war-room timelines, draft stakeholder communications, track mitigation steps, and produce real-time status summaries. CFCR activates post-incident: agents pull change records correlated with the incident window, analyze code diffs and deployment logs, and generate a structured Root Cause Analysis. The structured agent type was essential here — RCAs aren't free-form text; they're typed documents with specific fields that feed into compliance processes.
Additional Platform Capabilities
Beyond the core architecture, several capabilities were built to complete the platform's enterprise readiness:
Document Generation (DOCX / PPT / Skills)
Structured agents don't just return text — they generate typed artifacts. The platform supports automated generation of:
- DOCX reports — Post-incident reviews, change assessments, weekly operations summaries, all in org-branded Word format
- PPT briefings — Executive-level incident timelines and SLA reports generated as PowerPoint decks for stakeholder distribution
- Skill generation from conversations — When a chat agent successfully resolves an issue through a multi-step investigation, the platform can extract the resolution path and auto-generate a reusable Skill. Manual investigations become automation blueprints.
Dashboard Previews for Different Personas
One of the platform's most operationally valuable features: persona-specific dashboard previews. The same platform surfaces different views depending on who's looking:
- NOC L1 Analyst — Alert queue, triage status, active agent executions
- NOC L2 Engineer — Investigation depth, tool call logs, pending approvals
- IT Operations Manager — Engagement metrics, automation coverage, SLA adherence
- Executive / CISO — Value dashboards, cost per incident, automation ROI
This wasn't just a UX decision. It was a procurement decision. Enterprise IT software struggles with adoption when it tries to serve everyone through the same interface, because what is signal to one persona is noise to another.
Extended Guardrail Configurations
Beyond the six standard pipeline points, the platform supports org-specific guardrail extensions:
- PII detection — Flags and redacts personal data before it's logged or sent to LLMs
- Sensitive system protection — Blocks tool calls targeting systems outside the agent's designated scope
- Rate limiting — Prevents agent cascades from exhausting API quotas or triggering security policies
- Confidence thresholds — Requires human approval when the agent's confidence score falls below a configurable floor
These guardrails were built in direct response to enterprise security reviews. Every objection became a configuration.
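Two of these extensions are small enough to sketch directly. The email pattern below is a deliberately simple stand-in for PII detection (real detection covers far more than email addresses), and `redact_emails`, `needs_human_approval`, and the 0.8 floor are hypothetical.

```python
import re

# Simplified email pattern, used here only as an example of one PII category.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str) -> str:
    """Replace email addresses before the text is logged or sent to an LLM."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def needs_human_approval(confidence: float, floor: float) -> bool:
    """Route to a human when the agent's confidence falls below the org's floor."""
    return confidence < floor


redacted = redact_emails("Contact jane.doe@example.com now")
escalate = needs_human_approval(confidence=0.62, floor=0.8)
```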
The Self-Improving Feedback Loop
This is the capability I'm most proud of — and the one that's hardest to explain to someone who hasn't seen it work.
Traditional automation is static. A runbook runs. It either works or it doesn't. If it doesn't, a human fixes it manually. The fix lives in someone's head, or maybe in a Confluence page nobody reads.
The TechOps platform breaks this cycle through a Self-Improving Feedback Loop:
1. A novel incident occurs: something automation agents haven't seen before.
2. A chat agent (or a human analyst) investigates it, working through the issue step by step.
3. The investigation succeeds. The incident is resolved.
4. The platform extracts the resolution path: the sequence of tool calls, knowledge retrievals, and decisions that led to resolution.
5. This path is converted into a runtime Skill and saved to the platform's skill library.
6. Future automation agents have access to this new skill. The next time a similar incident fires, automation handles it.
Manual investigations generate automation. Human expertise becomes machine capability. The platform gets smarter with every incident it handles.
This is what "Service as a Software" actually means in practice — the service improves continuously, without requiring manual updates to runbooks or retraining of models.
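The extraction step of the loop can be sketched as a filter over an execution trace. This is an illustration under assumptions: the trace format, `extract_skill`, and the tool names are hypothetical, and keeping only the successful tool calls is one plausible rule, not necessarily the one the platform uses.

```python
from typing import Optional


def extract_skill(name: str, steps: list[dict], resolved: bool) -> Optional[dict]:
    """Turn a resolved investigation into a replayable skill definition."""
    if not resolved:
        return None  # only successful investigations become skills
    tool_steps = [
        {"tool": s["tool"], "params": s["params"]}
        for s in steps
        if s["kind"] == "tool_call" and s["ok"]
    ]
    if not tool_steps:
        return None  # nothing replayable was recorded
    return {"name": name, "steps": tool_steps}


trace = [
    {"kind": "retrieval", "query": "bgp flapping"},
    {"kind": "tool_call", "tool": "get_bgp_neighbors",
     "params": {"device": "edge-1"}, "ok": True},
    {"kind": "tool_call", "tool": "ping",
     "params": {"target": "10.0.0.1"}, "ok": False},
    {"kind": "tool_call", "tool": "clear_bgp_session",
     "params": {"device": "edge-1"}, "ok": True},
]
skill = extract_skill("bgp-flap-recovery", trace, resolved=True)
```

The failed ping is dropped and the two calls that worked are kept in their original order, which is what turns one analyst's investigation into something an automation agent can replay.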
Scale & Validation
As of March 2026, the platform is in active deployment with enterprise customers:
| Metric | Value |
|---|---|
| Automation executions | 8,900+ |
| Agents deployed | 100+ across 5 types |
| MCP servers registered | 8 (44 tools discovered) |
| Knowledge datasets | 42 |
| Analytics dashboards | 6 (per org) |
| Products on platform | 9 |
| Data quality engines | 21 (10 CMDB IE + 11 RCE) |
| AI agents in Solutions Assist | 12 |
| Trigger types supported | 14 |
The metric that matters most isn't in this table. It's the one you can't easily measure: how many incidents were handled autonomously without anyone noticing. That's the goal. The best automation is the one that makes the problem invisible.
What This Taught Me
1. The platform beats the product every time.
Six separately built products for six IT towers would have taken 3x the engineering, created 6x the maintenance burden, and delivered inconsistent experiences that eroded trust. One platform, configured six ways, delivered consistent quality across all six — and every new product gets faster to build because the infrastructure already exists.
2. Fix the foundation first.
I built the CMDB Intelligence Engine and Reality Correlation Engine before building a single user-facing CoResolve product. This was the right call. Every CoResolve product draws on clean CMDB data and validated ticket data. Without the data quality layer, the AI confidence would be hollow — statistically impressive outputs on inputs that can't be trusted.
3. Governance isn't a constraint — it's the product.
The enterprises that adopt autonomous IT operations don't trust AI blindly — they trust systems they can explain, audit, and override. Dual governance (guardrails + approvals) was the feature that unlocked procurement conversations, not slowed them down. Every "we need to be able to turn this off" objection became a configuration we already had.
4. The Self-Improving Loop is the moat.
Any competitor can build an AI agent. They can even build a multi-tenant platform. What they can't replicate immediately is 8,900+ executions of institutional learning baked into runtime skills. The longer the platform runs, the smarter it gets and the wider the gap becomes.
5. Product management at the platform layer requires a different mental model.
When you're building a product, your user is a person. When you're building a platform, your user is a builder — and eventually, the products they build become the user-facing surface. I spent as much time designing for Org Admins (who configure products) and SuperAdmins (who provision platform resources) as I did for end users. Platform thinking means serving the supply chain, not just the end customer.
What's Next
The platform's architecture is complete. The products are shipping. The feedback loop is compounding.
What's next is category creation — helping enterprises understand that "autonomous IT operations" isn't a feature of their existing ITSM vendor. It's a new procurement category with a new buying motion. The technical work is building the platform. The product work is making the category.
That's where I'm focused.
This case study describes the platform I own and lead at FuturePath.AI as Technical PM & Product Leader. Architecture decisions, product scope, and platform metrics reflect my direct involvement in design, prioritization, and delivery.