Semio: Semantic Types

The type system that makes cross-system workflows deterministic

Semio is DataGrout’s semantic interface layer. It gives tools a typed contract — declaring what kind of data they consume and produce — so the planning engine can verify workflow compatibility before anything runs, and route data between systems without LLM guesswork.

The full formal treatment is in the lab paper “Semio: A Semantic Interface Layer for Tool-Oriented AI Systems”.


The Problem Semio Solves

A customer in Salesforce looks different from a customer in QuickBooks. Different field names, different IDs, different schemas. Bridging them traditionally means:

  • Writing hand-coded glue for every pair of systems (O(N²) integrations for N systems)
  • Or asking an LLM to figure it out at runtime (probabilistic, fragile, token-expensive)

Semio takes a third path: tools declare their semantic types, and the planner reasons about compatibility symbolically before execution. The LLM describes intent; Semio handles schema matching.


Semantic Types

Every Semio type follows the pattern:

<family>.<entity>@<version>

Examples:

Type                  Meaning
crm.lead@1            A CRM lead record
billing.invoice@1     A billing invoice
billing.customer@1    A billing customer
core.email@1          An email address (primitive)
crm.lead.list@1       A list of CRM leads

These types exist independently of any vendor. A Salesforce lead and a HubSpot contact are both crm.lead@1.
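The naming pattern is regular enough to parse mechanically. The sketch below is a minimal illustration, assuming the grammar is lowercase dot-separated segments followed by `@` and an integer version, and treating everything after the first segment as the entity (so the `.list` suffix in `crm.lead.list@1` stays part of the entity). `SemioType` and the regular expression are hypothetical, not a published Semio API.

```python
import re
from dataclasses import dataclass

# Hypothetical grammar: <family>.<entity>[.<more>]@<version>, lowercase segments, integer version.
_TYPE_RE = re.compile(
    r"^(?P<family>[a-z][a-z0-9_]*)"
    r"\.(?P<entity>[a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*)*)"
    r"@(?P<version>\d+)$"
)


@dataclass(frozen=True)
class SemioType:
    family: str
    entity: str
    version: int

    @classmethod
    def parse(cls, text: str) -> "SemioType":
        match = _TYPE_RE.match(text)
        if match is None:
            raise ValueError(f"not a Semio type: {text!r}")
        return cls(match["family"], match["entity"], int(match["version"]))

    def __str__(self) -> str:
        # Render back to the canonical <family>.<entity>@<version> form.
        return f"{self.family}.{self.entity}@{self.version}"


lead_list = SemioType.parse("crm.lead.list@1")
print(lead_list.family, lead_list.entity, lead_list.version)
```

Keeping the version as a separate integer field makes it straightforward to compare `@1` and `@2` of the same entity later, rather than treating them as unrelated strings.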


Tool Contracts

Tools declare their inputs and outputs using Semio types. When you look at a tool in the Playground or via discovery.discover, you see its semantic contract:

tool: salesforce@1/get_lead@1
outputs:
  - type: crm.lead@1
    keys: [id, email]
---
tool: quickbooks@1/create_invoice@1
inputs:
  - name: customer
    type: billing.customer@1
    required: true
outputs:
  - type: billing.invoice@1

The planning engine uses these contracts to verify that workflow steps connect — that what one tool outputs is compatible with what the next tool expects.
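A minimal sketch of that compatibility check, assuming "compatible" means exact equality of type strings and ignoring adapters and the `keys` field. `Input`, `ToolContract`, and `unmet_inputs` are illustrative names, not part of DataGrout's API; the two contracts mirror the YAML above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Input:
    name: str
    type: str
    required: bool = True


@dataclass(frozen=True)
class ToolContract:
    tool: str
    inputs: tuple[Input, ...] = ()
    outputs: tuple[str, ...] = ()


def unmet_inputs(producer: ToolContract, consumer: ToolContract) -> list[str]:
    """Names of the consumer's required inputs that no producer output satisfies.

    Assumption: compatibility is exact type-string equality; adapters are not considered.
    """
    produced = set(producer.outputs)
    return [i.name for i in consumer.inputs if i.required and i.type not in produced]


get_lead = ToolContract("salesforce@1/get_lead@1", outputs=("crm.lead@1",))
create_invoice = ToolContract(
    "quickbooks@1/create_invoice@1",
    inputs=(Input("customer", "billing.customer@1"),),
    outputs=("billing.invoice@1",),
)

print(unmet_inputs(get_lead, create_invoice))
```

Here `get_lead` cannot feed `create_invoice` directly: it produces `crm.lead@1` while `customer` requires `billing.customer@1`. That mismatch is exactly the gap the adapters in the next section are meant to close.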


Adapters: Type Bridges

When two tools use related but different types, Semio adapters bridge the gap. An adapter declares that one type can be transformed into another, using a shared identity key (like email).

crm.lead@1 ──[adapter]──▶ billing.customer@1
                anchor: email

When the planner finds a workflow that requires billing.customer@1 but you only have crm.lead@1, it inserts the adapter automatically. No LLM reasoning is needed at execution time.
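The adapter idea can be sketched as a typed transform that is required to carry its anchor field across unchanged. This is a minimal illustration, not Semio's adapter format: the field names `display_name` and `company` in the mapping are hypothetical, and `find_adapter` is a simple stand-in for the planner's lookup.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Record = dict[str, object]


@dataclass(frozen=True)
class Adapter:
    source: str  # type the adapter consumes
    target: str  # type the adapter produces
    anchor: str  # identity field that must survive the transformation
    transform: Callable[[Record], Record]

    def apply(self, record: Record) -> Record:
        if record.get(self.anchor) is None:
            raise ValueError(f"record lacks anchor field {self.anchor!r}")
        out = dict(self.transform(record))
        out[self.anchor] = record[self.anchor]  # carry the anchor value across unchanged
        return out


def find_adapter(have: str, need: str, adapters: list[Adapter]) -> Optional[Adapter]:
    """Return an adapter from `have` to `need`, or None if none is registered."""
    return next((a for a in adapters if a.source == have and a.target == need), None)


# Hypothetical field mapping; the real crm.lead@1 and billing.customer@1 schemas are not shown here.
lead_to_customer = Adapter(
    source="crm.lead@1",
    target="billing.customer@1",
    anchor="email",
    transform=lambda lead: {"display_name": lead.get("name"), "company": lead.get("company")},
)

adapter = find_adapter("crm.lead@1", "billing.customer@1", [lead_to_customer])
customer = adapter.apply({"id": "00Q1", "name": "Jane", "email": "jane@acme.com", "company": "Acme"})
print(customer)
```

Note that the system-specific `id` does not cross the bridge; only the anchor and the mapped fields do.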


How This Affects Planning

When you call discovery.plan or use flow.into, the planner works with Semio types:

  1. Input types: What data do you have to start with?
  2. Goal type: What type does the final step need to produce?
  3. Path search: Find a chain of tools and adapters that bridges the gap
  4. Verification: Check type safety at every step before execution

This is what allows Cognitive Trust Certificates to assert “type safe” as a compile-time proof — the planner checked every type transformation before any tool ran.

Example: the goal is to create an invoice, starting from nothing but an email address:

core.email@1
  → [salesforce@1/get_lead@1]
  → crm.lead@1
  → [adapter: crm.lead@1 → billing.customer@1]
  → billing.customer@1
  → [quickbooks@1/create_invoice@1]
  → billing.invoice@1

The entire path is verified before execution begins.
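Steps 3 and 4 of the planning process can be illustrated with a breadth-first search over types, where every tool and adapter is an edge from its input type to its output type. This is a simplified stand-in, not DataGrout's planner: it assumes each step takes exactly one input type, and the edge list is hand-written from the example chain rather than derived from real contracts.

```python
from collections import deque
from typing import Optional

# Each edge is (step label, input type, output type); tools and adapters are treated alike.
Edge = tuple[str, str, str]

EDGES: list[Edge] = [
    ("salesforce@1/get_lead@1", "core.email@1", "crm.lead@1"),
    ("adapter: crm.lead@1 -> billing.customer@1", "crm.lead@1", "billing.customer@1"),
    ("quickbooks@1/create_invoice@1", "billing.customer@1", "billing.invoice@1"),
]


def plan(start: str, goal: str, edges: list[Edge]) -> Optional[list[str]]:
    """Breadth-first search over types; returns the shortest list of step labels, or None."""
    previous: dict[str, tuple[str, str]] = {}  # type -> (prior type, step that produced it)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            steps: list[str] = []
            while current != start:  # walk back from the goal to rebuild the chain
                prior, label = previous[current]
                steps.append(label)
                current = prior
            return steps[::-1]
        for label, src, dst in edges:
            if src == current and dst not in seen:
                seen.add(dst)
                previous[dst] = (current, label)
                queue.append(dst)
    return None


for step in plan("core.email@1", "billing.invoice@1", EDGES) or []:
    print(step)
```

Because every edge is type-checked as it is added to the search, a returned path is well-typed end to end before any tool runs; a `None` result means no chain of known tools and adapters reaches the goal type.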


Type Tiers

Fields within a Semio type are tagged with tiers that guide planning and PII handling. A field can belong to several tiers (email is Core, PII, and Index):

Tier     Meaning                                                Example fields
Core     Required for basic operations                          id, name, email
Useful   Enhance workflows but aren’t strictly required         company, status
PII      Personally identifiable; triggers Dynamic Redaction    email, phone
Index    Optimized for search and lookup                        email, company

The planner uses tiers to request only the fields a workflow actually needs, and to flag when PII fields require policy clearance.
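A minimal sketch of tier-driven field selection, using the example fields from the table above for a lead-like type. The `Tier` enum, `LEAD_TIERS`, and `select_fields` are hypothetical; the sketch only shows narrowing a request to the needed fields and flagging which of them carry the PII tier.

```python
from enum import Enum


class Tier(Enum):
    CORE = "core"
    USEFUL = "useful"
    PII = "pii"
    INDEX = "index"


# Tier tags taken from the example fields in the table; a field may carry several tiers.
LEAD_TIERS: dict[str, set[Tier]] = {
    "id": {Tier.CORE},
    "name": {Tier.CORE},
    "email": {Tier.CORE, Tier.PII, Tier.INDEX},
    "phone": {Tier.PII},
    "company": {Tier.USEFUL, Tier.INDEX},
    "status": {Tier.USEFUL},
}


def select_fields(needed: set[str], tiers: dict[str, set[Tier]]) -> tuple[list[str], list[str]]:
    """Return (fields to request, fields needing PII clearance), both sorted.

    Only fields the workflow needs are requested; fields the type does not declare are ignored.
    """
    request = sorted(needed & set(tiers))
    pii = [f for f in request if Tier.PII in tiers[f]]
    return request, pii


fields, needs_clearance = select_fields({"email", "status"}, LEAD_TIERS)
print(fields, needs_clearance)
```

A workflow that needs `email` and `status` requests just those two fields, and `email` is flagged for policy clearance because it carries the PII tier.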


Identity Anchors (Keys)

Cross-system entity resolution uses shared keys, not system-specific IDs. When a Salesforce lead and a QuickBooks customer represent the same person, they’re matched via a shared key like email — not their respective internal IDs.

Salesforce lead: { id: "00Q...", email: "jane@acme.com" }
QuickBooks customer: { id: "cust_99", email: "jane@acme.com" }
        └── matched via email anchor ──┘

This is why Semio adapters specify an anchor key: it’s the identity field that survives the type transformation.
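Anchor-based resolution amounts to a join on the shared key rather than on internal IDs. The sketch below reproduces the Jane example; `match_by_anchor` and `normalize_key` are hypothetical helpers, and the case-insensitive, whitespace-trimmed comparison is an assumption rather than documented Semio behavior.

```python
Record = dict[str, object]


def normalize_key(value: object) -> str:
    # Assumption: anchor values compare case-insensitively after trimming whitespace.
    return str(value).strip().lower()


def match_by_anchor(
    left: list[Record], right: list[Record], anchor: str = "email"
) -> list[tuple[Record, Record]]:
    """Pair records from two systems that share an anchor value; internal ids are never compared."""
    by_key: dict[str, Record] = {}
    for rec in right:
        if rec.get(anchor):
            by_key[normalize_key(rec[anchor])] = rec  # simplification: last duplicate wins
    pairs: list[tuple[Record, Record]] = []
    for rec in left:
        value = rec.get(anchor)
        if value and normalize_key(value) in by_key:
            pairs.append((rec, by_key[normalize_key(value)]))
    return pairs


salesforce_leads = [{"id": "00Q...", "email": "jane@acme.com"}]
quickbooks_customers = [{"id": "cust_99", "email": "jane@acme.com"}]
print(match_by_anchor(salesforce_leads, quickbooks_customers))
```

The two records pair up even though their `id` values have nothing in common, which is the point of resolving identity through the anchor.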


Automatic Enrichment

When a workflow step needs a field that the previous step didn’t return, the planner searches for enrichment tools automatically. If you have a lead with only {id, email} but the next step needs status, the planner finds a tool that can look up status by id and inserts it before proceeding.

This happens transparently — you describe the goal, the planner figures out what data needs to be filled in.
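The lead-and-status example can be sketched as a search for a lookup tool whose inputs are already available and whose outputs cover the missing fields. This is an assumption-laden simplification: `LookupTool`, `plan_enrichment`, and the tool name `hypothetical/lookup_status@1` are illustrative only, and the sketch considers a single lookup rather than chains of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LookupTool:
    name: str
    needs: frozenset[str]     # fields the tool requires as input
    provides: frozenset[str]  # fields the tool can return


def plan_enrichment(have: set[str], want: set[str], tools: list[LookupTool]) -> list[LookupTool]:
    """Return the lookup steps to insert: [] if nothing is missing, else one suitable tool.

    Raises LookupError when no single tool can fill the gap (chained lookups are out of scope).
    """
    missing = want - have
    if not missing:
        return []
    for tool in tools:
        # Usable only if it runs on fields we already hold and supplies everything missing.
        if tool.needs <= have and missing <= tool.provides:
            return [tool]
    raise LookupError(f"no enrichment tool provides {sorted(missing)}")


# Hypothetical tool that looks up a lead's status by id.
lookup_status = LookupTool("hypothetical/lookup_status@1", frozenset({"id"}), frozenset({"status"}))

steps = plan_enrichment({"id", "email"}, {"email", "status"}, [lookup_status])
print([s.name for s in steps])
```

With only `{id, email}` in hand and `status` required, the sketch selects the status lookup because its single input, `id`, is already available.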


Related