Cortex API
Cortex is a standalone AI inference engine. It processes messages through an 8-service pipeline that handles
agent configuration, customer memory, RAG knowledge retrieval, token optimization, semantic caching,
intelligent model routing, prompt compilation with tool-call execution, and post-response actions — all in a single API call.
Base URL: /v1 (all endpoints versioned)
Format: JSON request & response
Pipeline: 8 services, sequential processing
Multi-model routing (GPT-4o, Claude Sonnet, Gemini Flash)
RAG knowledge retrieval with pgvector
Customer memory (cross-session facts)
Semantic caching for instant responses
Tool-call loop with webhook execution
Token optimization & context compression
Per-step latency telemetry
Prometheus + OpenTelemetry observability
🔑 Authentication
All /v1 endpoints require an API key passed via the Authorization header.
API keys are prefixed with ctx_ and can be created through the developer portal.
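For example, a minimal authenticated request against a documented endpoint (hypothetical key value shown):
curl
curl https://cortex.doo.ooo/v1/cache/stats \
  -H "Authorization: Bearer ctx_your_api_key"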
POST
/portal/api/login
Get a JWT session token
Authenticate with your email and password to receive a JWT session token. Use this token to manage API keys via the portal endpoints.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| email | string | Required | Your account email address |
| password | string | Required | Your account password |
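A minimal login request, assuming the portal endpoints are served from the same host as the /v1 API:
Example Request
curl
curl -X POST https://cortex.doo.ooo/portal/api/login \
  -H "Content-Type: application/json" \
  -d '{"email": "you@company.com", "password": "your_password"}'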
Response
200 Success — JWT token returned
401 Invalid credentials
Response 200
{
"token": "eyJhbGciOiJIUzI1NiIs...",
"user": {
"id": "uuid",
"email": "you@company.com",
"company_id": "uuid"
}
}
POST
/portal/api/keys
Create a new API key
Create a new API key for your company. The raw key is shown only once — store it securely. Requires a JWT session token from /portal/api/login.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Required | Human-readable label for the key (1-100 chars) |
| scopes | string[] | Optional | Permission scopes (default: all) |
| expires_in_days | integer | Optional | Key expiry in days (1-365, null = never) |
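A sketch of creating a key, assuming the JWT session token is passed as a bearer token in the Authorization header:
Example Request
curl
# Replace the token with the JWT returned by /portal/api/login
curl -X POST https://cortex.doo.ooo/portal/api/keys \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." \
  -H "Content-Type: application/json" \
  -d '{"name": "Production Key", "scopes": ["*"], "expires_in_days": 365}'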
Response
Response 200
{
"key": "ctx_abc123...xyz",
"key_prefix": "ctx_abc1",
"name": "Production Key",
"scopes": ["*"],
"expires_at": "2026-06-12T00:00:00Z",
"created_at": "2026-05-12T10:00:00Z"
}
⚡ Pipeline Architecture
Every message flows through an 8-service pipeline. Each service is independent, testable, and adds
specific capabilities to the request context. If the semantic cache hits, services 06-08 are skipped entirely for instant zero-cost responses.
01 Config → 02 Memory → 03 RAG → 04 Optimize → 05 Cache → 06 Router → 07 Compile + Tools → 08 Actions
| Service | Name | Description |
|---|---|---|
| 01 | Config Loader | Loads agent configuration from database (personality, dialect, prompts, actions, knowledge sources). Cached in Redis for 5 minutes. |
| 02 | Memory Recall | Retrieves customer-specific facts and knowledge from the memory system. Cross-session memory enables personalization. |
| 03 | Context Engine | RAG retrieval — searches pgvector for relevant knowledge chunks using the customer's query embedding. |
| 04 | Token Optimizer | Compresses context to fit within the model's token budget. Priority-based — core identity is never compressed. |
| 05 | Cache Check | Semantic similarity search in Redis. If a sufficiently similar query was answered before, returns the cached response (zero LLM cost). |
| 06 | Model Router | Classifies query complexity and routes to the optimal model tier (speed/nano/sonnet/complex/search). |
| 07 | Agent Compiler | Renders the Jinja2 prompt template, sends to LLM via LiteLLM. If the LLM returns tool_calls, executes them and re-calls the LLM (up to 5 rounds). |
| 08 | Post Actions | Executes any remaining post-response actions after the LLM has produced its final text response. |
Chat
The core endpoint. Send a message and get an AI response processed through the full 8-service pipeline.
POST
/v1/chat
Process a message through the AI pipeline
Send a customer message and receive an AI-generated response. The pipeline loads the agent's configuration,
recalls customer memory, retrieves relevant knowledge via RAG, optimizes the token budget, checks the semantic cache,
routes to the optimal model, compiles the prompt, and executes any tool calls — all in one request.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| company_id | string | Required | UUID — your company/tenant identifier |
| agent_id | string | Required | UUID — which AI agent to use for processing |
| conversation_id | string | Required | UUID — conversation thread identifier |
| customer_id | string | Required | Customer identifier for memory recall |
| message | string | Required | The customer's message text (1-16,000 chars) |
| channel | string | Optional | Channel type: whatsapp, web, instagram, email, etc. Default: whatsapp |
| metadata | object | Optional | Extra context (language, location, etc.). Max 20 keys. |
Example Request
curl
curl -X POST https://cortex.doo.ooo/v1/chat \
-H "Authorization: Bearer ctx_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"company_id": "550e8400-e29b-41d4-a716-446655440000",
"agent_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
"conversation_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
"customer_id": "cust_12345",
"message": "What are your business hours?",
"channel": "whatsapp"
}'
Response
200 Success
401 Unauthorized
422 Validation error
500 Pipeline error
503 LLM gateway unreachable
Response 200
{
"reply": "Our business hours are Sunday to Thursday, 9 AM to 6 PM.",
"model_used": "nano",
"cached": false,
"actions_executed": [],
"latency_ms": 342,
"tokens": {
"input": 1250,
"output": 28,
"cost_usd": 0.00019
},
"context_tokens": {
"before_compression": 3200,
"after_compression": 1250,
"savings_pct": 60.9
},
"step_timings": {
"config_load": 12,
"memory_recall": 45,
"context_engine": 38,
"token_optimizer": 5,
"cache_check": 8,
"model_router": 2,
"agent_compiler": 228,
"tool_call_rounds": 0,
"tool_execution": 0
}
}
Response with Tool Calls Executed
When the agent has configured actions/tools, the LLM may invoke them during processing. Tool calls are executed as HTTP webhooks and the results are fed back to the LLM automatically (up to 5 rounds).
Response 200 — with actions
{
"reply": "Your order #ORD-1234 is currently being prepared and will ship tomorrow.",
"model_used": "sonnet",
"cached": false,
"actions_executed": ["get_order_status"],
"latency_ms": 1580,
"tokens": {
"input": 2100,
"output": 45,
"cost_usd": 0.0032
},
"step_timings": {
"agent_compiler": 1420,
"tool_call_rounds": 1,
"tool_execution": 380
}
}
Agents
Inspect an agent's full configuration as loaded by the pipeline.
GET
/v1/agents/{agent_id}/config
Get agent configuration
Returns the full agent configuration as loaded from the database, including personality, dialect, system prompt,
FAQs, scenarios, knowledge sources, action definitions, and all settings. Result is cached in Redis for 5 minutes (same cache the pipeline uses).
Path Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| agent_id | string | Required | UUID of the agent |
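A sample request, reusing the agent ID from the chat example above:
Example Request
curl
curl https://cortex.doo.ooo/v1/agents/6ba7b810-9dad-11d1-80b4-00c04fd430c8/config \
  -H "Authorization: Bearer ctx_your_api_key"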
Response
200 Full agent config
404 Agent not found
Response 200
{
"id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
"company_id": "550e8400-e29b-41d4-a716-446655440000",
"name": "Support Agent",
"personality": {
"id": "uuid",
"name": "Friendly Professional"
},
"dialect": {
"name": "Egyptian Arabic",
"primary_language": "ar"
},
"action_definitions": [
{
"id": "uuid",
"name": "get_order_status",
"description": "Look up an order by ID",
"source_type": "custom_function",
"method": "GET",
"parameters_schema": { "..." }
}
],
"knowledge_source_ids": ["uuid1", "uuid2"],
"workflow_faqs": ["..."]
}
Knowledge Ingestion
Upload documents and URLs to build the agent's knowledge base. The ingestion pipeline converts content to markdown,
splits into semantic chunks, generates vector embeddings, and stores them for RAG retrieval.
POST
/v1/knowledge/ingest/file
Ingest a file (PDF, DOCX, PPTX, etc.)
Converts a file to Markdown, splits into semantic chunks, generates vector embeddings, and stores in the knowledge base.
Supports PDF, DOCX, PPTX, XLSX, and other document formats via MarkItDown.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| source_id | string | Required | UUID — knowledge source record identifier |
| file_path | string | Required | Path to the uploaded file on disk |
| user_id | string | Required | UUID — who uploaded the file |
| embedding_model | string | Optional | Embedding model: bge-m3 (default), openai, or cohere |
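A sketch of a file ingestion call; the source_id, user_id, and file path are hypothetical placeholders:
Example Request
curl
curl -X POST https://cortex.doo.ooo/v1/knowledge/ingest/file \
  -H "Authorization: Bearer ctx_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "source_id": "550e8400-e29b-41d4-a716-446655440000",
    "file_path": "/uploads/catalog.pdf",
    "user_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
    "embedding_model": "bge-m3"
  }'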
Response
200 Ingestion result
500 Conversion or embedding error
Response 200
{
"source_id": "uuid",
"status": "completed",
"chunks_created": 42,
"chunks_embedded": 42,
"error": null
}
POST
/v1/knowledge/ingest/url
Ingest a web page URL
Crawls a web page, extracts text content, splits into chunks, embeds, and stores for RAG retrieval.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| source_id | string | Required | UUID — knowledge source record identifier |
| url | string | Required | Web page URL to crawl and ingest |
| user_id | string | Required | UUID — who triggered the ingestion |
| embedding_model | string | Optional | Embedding model: bge-m3 (default), openai, or cohere |
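A sketch of a URL ingestion call (placeholder IDs and URL):
Example Request
curl
curl -X POST https://cortex.doo.ooo/v1/knowledge/ingest/url \
  -H "Authorization: Bearer ctx_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "source_id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://example.com/faq",
    "user_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8"
  }'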
Response
Response 200
{
"source_id": "uuid",
"status": "completed",
"chunks_created": 18,
"chunks_embedded": 18,
"error": null
}
POST
/v1/knowledge/ingest/documents
Ingest structured knowledge documents
Embeds an agent's structured knowledge documents into the RAG pipeline for semantic search retrieval.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| agent_id | string | Required | UUID — the agent whose documents to ingest |
| company_id | string | Required | UUID — company for scoping |
| user_id | string | Required | UUID — who triggered the ingestion |
| embedding_model | string | Optional | Embedding model: bge-m3 (default), openai, or cohere |
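A sketch of ingesting an agent's structured documents (placeholder IDs):
Example Request
curl
curl -X POST https://cortex.doo.ooo/v1/knowledge/ingest/documents \
  -H "Authorization: Bearer ctx_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
    "company_id": "550e8400-e29b-41d4-a716-446655440000",
    "user_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7"
  }'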
Response
Response 200
{
"agent_id": "uuid",
"status": "completed",
"documents_processed": 8,
"chunks_created": 24,
"chunks_embedded": 24
}
POST
/v1/knowledge/reembed
Re-embed existing chunks with a different model
Re-generates embeddings for existing knowledge chunks using a different embedding model. Useful when switching between embedding providers.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| source_id | string | Required | UUID — knowledge source to re-embed |
| embedding_model | string | Optional | New embedding model: bge-m3 (default), openai, or cohere |
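For example, switching a source from the default bge-m3 embeddings to OpenAI embeddings (placeholder source ID):
Example Request
curl
curl -X POST https://cortex.doo.ooo/v1/knowledge/reembed \
  -H "Authorization: Bearer ctx_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "source_id": "550e8400-e29b-41d4-a716-446655440000",
    "embedding_model": "openai"
  }'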
Response
Response 200
{
"source_id": "uuid",
"status": "completed",
"chunks_embedded": 42,
"error": null
}
Memory
Manage customer memory facts. Cortex automatically extracts and stores facts during conversations.
These endpoints let you query, save, and delete facts directly.
POST
/v1/memory/{customer_id}
Query customer memory
Retrieve all stored facts and knowledge about a specific customer. Returns memory facts, knowledge snippets, and knowledge base entries.
Path Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| customer_id | string | Required | Customer identifier |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| company_id | string | Required | UUID — company/tenant scope |
| query | string | Optional | Filter query for memory facts |
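A sample query, reusing the customer ID from the chat example (placeholder company ID and filter):
Example Request
curl
curl -X POST https://cortex.doo.ooo/v1/memory/cust_12345 \
  -H "Authorization: Bearer ctx_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "company_id": "550e8400-e29b-41d4-a716-446655440000",
    "query": "language preferences"
  }'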
Response
Response 200
{
"customer_id": "cust_12345",
"company_id": "uuid",
"fact_count": 5,
"knowledge_doc_count": 2,
"kb_entry_count": 3,
"facts": [
{
"id": "uuid",
"fact": "Prefers Arabic language",
"category": "preference",
"confidence": 0.95
}
],
"knowledge_snippets": ["..."],
"kb_entries": ["..."]
}
POST
/v1/memory/{customer_id}/save
Save a memory fact
Manually save a fact about a customer. Facts are used by Service 02 (Memory Recall) to personalize responses.
Path Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| customer_id | string | Required | Customer identifier |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| company_id | string | Required | UUID — company/tenant scope |
| fact | string | Required | The fact text (e.g., "Prefers Arabic language") |
| category | string | Optional | Category: preference, history, sentiment, context, general. Default: general |
| confidence | float | Optional | Confidence score 0.0-1.0. Default: 1.0 |
| source | string | Optional | Fact origin. Default: manual |
| agent_id | string | Optional | Scope fact to a specific agent |
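A sketch of manually saving a preference fact (placeholder company ID):
Example Request
curl
curl -X POST https://cortex.doo.ooo/v1/memory/cust_12345/save \
  -H "Authorization: Bearer ctx_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "company_id": "550e8400-e29b-41d4-a716-446655440000",
    "fact": "Prefers Arabic language",
    "category": "preference",
    "confidence": 0.95
  }'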
Response
Response 200
{
"customer_id": "cust_12345",
"fact_id": "uuid",
"saved": true
}
DELETE
/v1/memory/fact/{fact_id}
Delete a memory fact
Remove a specific memory fact by its ID.
Path Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| fact_id | string | Required | UUID of the fact to delete |
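For example (placeholder fact UUID):
Example Request
curl
curl -X DELETE https://cortex.doo.ooo/v1/memory/fact/7c9e6679-7425-40de-944b-e07fc1f90ae7 \
  -H "Authorization: Bearer ctx_your_api_key"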
Response
Response 200
{
"fact_id": "uuid",
"deleted": true
}
Semantic Cache
Pre-warm and manage the semantic cache. Cached responses are returned instantly without hitting the LLM,
giving zero-cost sub-millisecond responses for frequently asked questions.
POST
/v1/cache/warm
Pre-warm cache with FAQ pairs
Store question-answer pairs in the semantic cache. When a customer asks a semantically similar question, the cached answer is returned instantly.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| company_id | string | Required | UUID — company/tenant scope |
| agent_id | string | Required | UUID — agent to cache for |
| pairs | object[] | Required | Array of {"question": "...", "answer": "..."} pairs |
Example Request
JSON
{
"company_id": "uuid",
"agent_id": "uuid",
"pairs": [
{
"question": "What are your business hours?",
"answer": "We are open Sunday to Thursday, 9 AM to 6 PM."
},
{
"question": "Where are you located?",
"answer": "Our office is at 123 Business St, Cairo."
}
]
}
Response
Response 200
{
"status": "completed",
"pairs_stored": 2,
"pairs_errored": 0,
"pairs_total": 2
}
DELETE
/v1/cache/flush
Flush cached entries
Clear cached responses. Optionally scoped to a specific company and/or agent.
Query Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| company_id | string | Optional | Scope flush to a specific company |
| agent_id | string | Optional | Scope flush to a specific agent |
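A sketch of flushing a single agent's cached entries (placeholder IDs); omit both query parameters to flush without scoping:
Example Request
curl
curl -X DELETE "https://cortex.doo.ooo/v1/cache/flush?company_id=550e8400-e29b-41d4-a716-446655440000&agent_id=6ba7b810-9dad-11d1-80b4-00c04fd430c8" \
  -H "Authorization: Bearer ctx_your_api_key"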
Response
Response 200
{
"status": "flushed",
"entries_deleted": 15
}
GET
/v1/cache/stats
Get cache index stats
Returns statistics about the semantic cache index — document count, record count, and indexing status.
Response
Response 200
{
"status": "active",
"index_name": "cortex_cache",
"num_docs": 156,
"num_records": 156,
"indexing": "complete"
}
🔧 Actions & Tool Calls
Cortex agents can execute external actions (HTTP webhooks) during message processing. When the LLM determines
it needs data from an external system, it triggers tool calls that Cortex executes automatically.
LLM Call → tool_calls? → Execute Webhooks → Feed Results to LLM → Text Response
This loop repeats up to 5 rounds. If the maximum is reached, the LLM is forced to produce a text response.
Action Definition Schema
Each action configured on an agent includes these fields:
| Field | Type | Description |
|---|---|---|
| id | string | UUID — unique action identifier |
| name | string | Function name the LLM uses to call this tool |
| description | string | Human-readable description for the LLM |
| source_type | string | custom_function (simple webhook) or action (OpenAPI-derived) |
| webhook_url | string | Full URL for the HTTP request |
| method | string | HTTP method: GET, POST, PUT, PATCH, DELETE |
| headers | object | Static headers to include in the request |
| auth_type | string | Auth type: api_key, bearer, oauth2, basic, none |
| parameters_schema | object | JSON Schema for function parameters (sent to LLM) |
| path_template | string | URL path template with placeholders (e.g., /api/v1/orders/{order_id}) |
| query_schema | object | JSON Schema for query parameters (action-based tools) |
| rate_limit | object | Rate limit config: {"max_calls": 10, "window_seconds": 60} |
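Putting these fields together, a hypothetical custom_function definition (all values illustrative, mirroring the tool-call flow example below):
JSON
{
  "id": "uuid",
  "name": "get_order_status",
  "description": "Look up an order by ID",
  "source_type": "custom_function",
  "webhook_url": "https://api.store.com/orders",
  "method": "GET",
  "headers": {},
  "auth_type": "api_key",
  "parameters_schema": {
    "type": "object",
    "properties": {
      "order_id": { "type": "string", "description": "Order identifier, e.g. ORD-1234" }
    },
    "required": ["order_id"]
  },
  "rate_limit": { "max_calls": 10, "window_seconds": 60 }
}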
Action Sources
custom_function — Simple webhook: name, URL, method, and a parameters schema. Best for internal APIs.
action — OpenAPI-spec-derived: method, path template, body/query/path schemas, with auth resolved from connections.
Execution Security
Only actions linked to the agent are executable
Auth credentials loaded from database, never from LLM output
Webhook URLs validated (no private IPs in production)
15-second timeout per webhook call
Response size capped at 50KB
Automatic retry (up to 2 retries) for transient failures
Example: Tool Call Flow
Sequence
// 1. Customer asks about their order
Customer → "Where is my order #ORD-1234?"
// 2. Cortex sends tools to LLM with the message
LLM receives: message + tools: [get_order_status, track_shipment]
// 3. LLM decides to call get_order_status
LLM returns: tool_calls: [{
name: "get_order_status",
arguments: { order_id: "ORD-1234" }
}]
// 4. Cortex executes the webhook
HTTP GET → https://api.store.com/orders/ORD-1234
Response → { status: "shipped", eta: "2026-05-14" }
// 5. Result fed back to LLM
LLM receives: tool result + original context
// 6. LLM generates final response
LLM → "Your order #ORD-1234 has been shipped and will arrive by May 14."
Health & Observability
Monitor Cortex health, dependencies, and performance metrics.
GET
/health
Basic health check
Returns 200 if the Cortex process is alive. Used by container orchestration health checks. Does not test dependencies.
Response
Response 200
{
"status": "ok",
"version": "0.1.0"
}
GET
/health/detailed
Detailed dependency health check
Tests every downstream dependency: PostgreSQL (SELECT 1), Redis (PING), and LiteLLM (/health). Returns individual status for each.
Response
200 All checks passed
200 Degraded — some checks failed
Response 200
{
"status": "ok",
"version": "0.1.0",
"checks": {
"postgres": "ok",
"redis": "ok",
"litellm": "ok"
}
}
GET
/v1/metrics
Prometheus-format metrics
Exposes all Cortex metrics in Prometheus text format for Grafana scraping. Includes counters, histograms, and labels for model tier, agent ID, etc.
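Per the authentication rules above, /v1 endpoints require an API key, so a scrape request looks like this (hypothetical key shown):
Example Request
curl
curl https://cortex.doo.ooo/v1/metrics \
  -H "Authorization: Bearer ctx_your_api_key"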
Response
text/plain
# HELP cortex_config_cache_hits_total Number of agent config cache hits
# TYPE cortex_config_cache_hits_total counter
cortex_config_cache_hits_total{agent_id="abc123"} 42
# HELP cortex_pipeline_duration_ms Pipeline processing time
# TYPE cortex_pipeline_duration_ms histogram
cortex_pipeline_duration_ms_sum{model="sonnet"} 15420
cortex_pipeline_duration_ms_count{model="sonnet"} 38
GET
/v1/metrics/dashboard
JSON metrics for dashboards
Structured JSON metrics for dashboard integration. Returns config loading performance, model routing stats, and agent compilation metrics.
Response
Response 200
{
"config_loading": {
"cache_hit_rate": 94.2,
"total_loads": 520,
"cache_hits": 490,
"avg_duration_ms": 8
},
"model_routing": {
"total_routed": 500,
"by_tier": {
"nano": 280,
"sonnet": 150,
"complex": 70
}
},
"agent_compilation": {
"avg_llm_latency_ms": 340,
"tool_call_rate": 18.5,
"avg_tool_rounds": 1.2
}
}
🔑 API Keys
Create and manage API keys for accessing Cortex endpoints. Keys are prefixed with ctx_ and scoped to your company; create them via POST /portal/api/keys (see Authentication above).
Error Codes
Standard HTTP status codes used across all endpoints.
| Code | Status | Description |
|---|---|---|
| 200 | Success | Request processed successfully |
| 401 | Unauthorized | Missing, invalid, or expired API key |
| 404 | Not Found | Resource not found (agent, fact, etc.) |
| 422 | Validation Error | Invalid request body (Pydantic validation failed) |
| 429 | Rate Limited | Too many requests — retry after the specified interval |
| 500 | Internal Error | Unexpected pipeline error |
| 503 | Service Unavailable | LLM gateway unreachable |
| 504 | Gateway Timeout | LLM call timed out |