Cortex API
Cortex is a standalone AI inference engine. It processes messages through an 8-service pipeline that handles
agent configuration, customer memory, RAG knowledge retrieval, token optimization, semantic caching,
intelligent model routing, prompt compilation with tool-call execution, and post-response actions — all in a single API call.
Base URL: /v1 (all endpoints versioned)
Format: JSON request & response
Pipeline: 8 services, sequential processing
Multi-model routing (GPT-4o, Claude Sonnet, Gemini Flash)
RAG knowledge retrieval with pgvector
Customer memory (cross-session facts)
Semantic caching for instant responses
Tool-call loop with webhook execution
Token optimization & context compression
Per-step latency telemetry
Prometheus + OpenTelemetry observability
🔑 Authentication
All /v1 endpoints require an API key passed via the Authorization header.
API keys are prefixed with ctx_ and can be created through the developer portal.
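For example, a minimal authenticated request against a documented endpoint (hypothetical key value shown):
curl
curl https://cortex.doo.ooo/v1/cache/stats \
  -H "Authorization: Bearer ctx_your_api_key"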
POST
/portal/api/login
Get a JWT session token
Authenticate with your email and password to receive a JWT session token. Use this token to manage API keys via the portal endpoints.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| email | string | Required | Your account email address |
| password | string | Required | Your account password |
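A minimal login request, assuming the portal endpoints are served from the same host as the /v1 API:
Example Request
curl
curl -X POST https://cortex.doo.ooo/portal/api/login \
  -H "Content-Type: application/json" \
  -d '{"email": "you@company.com", "password": "your_password"}'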
Response
200 Success — JWT token returned
401 Invalid credentials
Response 200
{
"token": "eyJhbGciOiJIUzI1NiIs...",
"user": {
"id": "uuid",
"email": "you@company.com",
"company_id": "uuid"
}
}
POST
/portal/api/keys
Create a new API key
Create a new API key for your company. The raw key is shown only once — store it securely. Requires a JWT session token from /portal/api/login.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Required | Human-readable label for the key (1-100 chars) |
| scopes | string[] | Optional | Permission scopes (default: all) |
| expires_in_days | integer | Optional | Key expiry in days (1-365, null = never) |
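A sketch of creating a key, assuming the JWT session token is passed as a bearer token in the Authorization header:
Example Request
curl
# Replace the token with the JWT returned by /portal/api/login
curl -X POST https://cortex.doo.ooo/portal/api/keys \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." \
  -H "Content-Type: application/json" \
  -d '{"name": "Production Key", "scopes": ["*"], "expires_in_days": 365}'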
Response
Response 200
{
"key": "ctx_abc123...xyz",
"key_prefix": "ctx_abc1",
"name": "Production Key",
"scopes": ["*"],
"expires_at": "2026-06-12T00:00:00Z",
"created_at": "2026-05-12T10:00:00Z"
}
⚡ Pipeline Architecture
Every message flows through an 8-service pipeline. Each service is independent, testable, and adds
specific capabilities to the request context. If the semantic cache hits, services 06-08 are skipped entirely for instant zero-cost responses.
01 Config → 02 Memory → 03 RAG → 04 Optimize → 05 Cache → 06 Router → 07 Compile + Tools → 08 Actions
| Service | Name | Description |
|---|---|---|
| 01 | Config Loader | Loads agent configuration from database (personality, dialect, prompts, actions, knowledge sources). Cached in Redis for 5 minutes. |
| 02 | Memory Recall | Retrieves customer-specific facts and knowledge from the memory system. Cross-session memory enables personalization. |
| 03 | Context Engine | RAG retrieval — searches pgvector for relevant knowledge chunks using the customer's query embedding. |
| 04 | Token Optimizer | Compresses context to fit within the model's token budget. Priority-based — core identity is never compressed. |
| 05 | Cache Check | Semantic similarity search in Redis. If a sufficiently similar query was answered before, returns the cached response (zero LLM cost). |
| 06 | Model Router | Classifies query complexity and routes to the optimal model tier (speed/nano/sonnet/complex/search). |
| 07 | Agent Compiler | Renders the Jinja2 prompt template, sends to LLM via LiteLLM. If the LLM returns tool_calls, executes them and re-calls the LLM (up to 5 rounds). |
| 08 | Post Actions | Executes any remaining post-response actions after the LLM has produced its final text response. |
Chat
The core endpoint. Send a message and get an AI response processed through the full 8-service pipeline.
POST
/v1/chat
Process a message through the AI pipeline
Send a customer message and receive an AI-generated response. The pipeline loads the agent's configuration,
recalls customer memory, retrieves relevant knowledge via RAG, optimizes the token budget, checks the semantic cache,
routes to the optimal model, compiles the prompt, and executes any tool calls — all in one request.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| company_id | string | Required | UUID — your company/tenant identifier |
| agent_id | string | Required | UUID — which AI agent to use for processing |
| conversation_id | string | Required | UUID — conversation thread identifier |
| customer_id | string | Required | Customer identifier for memory recall |
| message | string | Required | The customer's message text (1-16,000 chars) |
| channel | string | Optional | Channel type: whatsapp, web, instagram, email, etc. Default: whatsapp |
| metadata | object | Optional | Extra context (language, location, etc.). Max 20 keys. |
Example Request
curl
curl -X POST https://cortex.doo.ooo/v1/chat \
-H "Authorization: Bearer ctx_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"company_id": "550e8400-e29b-41d4-a716-446655440000",
"agent_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
"conversation_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
"customer_id": "cust_12345",
"message": "What are your business hours?",
"channel": "whatsapp"
}'
Response
200 Success
401 Unauthorized
422 Validation error
500 Pipeline error
503 LLM gateway unreachable
Response 200
{
"reply": "Our business hours are Sunday to Thursday, 9 AM to 6 PM.",
"model_used": "nano",
"cached": false,
"actions_executed": [],
"latency_ms": 342,
"tokens": {
"input": 1250,
"output": 28,
"cost_usd": 0.00019
},
"context_tokens": {
"before_compression": 3200,
"after_compression": 1250,
"savings_pct": 60.9
},
"step_timings": {
"config_load": 12,
"memory_recall": 45,
"context_engine": 38,
"token_optimizer": 5,
"cache_check": 8,
"model_router": 2,
"agent_compiler": 228,
"tool_call_rounds": 0,
"tool_execution": 0
}
}
Response with Tool Calls Executed
When the agent has configured actions/tools, the LLM may invoke them during processing. Tool calls are executed as HTTP webhooks and the results are fed back to the LLM automatically (up to 5 rounds).
Response 200 — with actions
{
"reply": "Your order #ORD-1234 is currently being prepared and will ship tomorrow.",
"model_used": "sonnet",
"cached": false,
"actions_executed": ["get_order_status"],
"latency_ms": 1580,
"tokens": {
"input": 2100,
"output": 45,
"cost_usd": 0.0032
},
"step_timings": {
"agent_compiler": 1420,
"tool_call_rounds": 1,
"tool_execution": 380
}
}
Agents
Inspect an agent's full configuration as loaded by the pipeline.
GET
/v1/agents/{agent_id}/config
Get agent configuration
Returns the full agent configuration as loaded from the database, including personality, dialect, system prompt,
FAQs, scenarios, knowledge sources, action definitions, and all settings. Result is cached in Redis for 5 minutes (same cache the pipeline uses).
Path Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| agent_id | string | Required | UUID of the agent |
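A sample request, reusing the agent ID from the chat example above:
Example Request
curl
curl https://cortex.doo.ooo/v1/agents/6ba7b810-9dad-11d1-80b4-00c04fd430c8/config \
  -H "Authorization: Bearer ctx_your_api_key"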
Response
200 Full agent config
404 Agent not found
Response 200
{
"id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
"company_id": "550e8400-e29b-41d4-a716-446655440000",
"name": "Support Agent",
"personality": {
"id": "uuid",
"name": "Friendly Professional"
},
"dialect": {
"name": "Egyptian Arabic",
"primary_language": "ar"
},
"action_definitions": [
{
"id": "uuid",
"name": "get_order_status",
"description": "Look up an order by ID",
"source_type": "custom_function",
"method": "GET",
"parameters_schema": { "..." }
}
],
"knowledge_source_ids": ["uuid1", "uuid2"],
"workflow_faqs": ["..."]
}
Knowledge Ingestion
Upload documents and URLs to build the agent's knowledge base. The ingestion pipeline converts content to markdown,
splits into semantic chunks, generates vector embeddings, and stores them for RAG retrieval.
POST
/v1/knowledge/ingest/file
Ingest a file (PDF, DOCX, PPTX, etc.)
Converts a file to Markdown, splits into semantic chunks, generates vector embeddings, and stores in the knowledge base.
Supports PDF, DOCX, PPTX, XLSX, and other document formats via MarkItDown.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| source_id | string | Required | UUID — knowledge source record identifier |
| file_path | string | Required | Path to the uploaded file on disk |
| user_id | string | Required | UUID — who uploaded the file |
| embedding_model | string | Optional | Embedding model: bge-m3 (default), openai, or cohere |
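A sketch of a file ingestion call; the source_id, user_id, and file path are hypothetical placeholders:
Example Request
curl
curl -X POST https://cortex.doo.ooo/v1/knowledge/ingest/file \
  -H "Authorization: Bearer ctx_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "source_id": "550e8400-e29b-41d4-a716-446655440000",
    "file_path": "/uploads/catalog.pdf",
    "user_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
    "embedding_model": "bge-m3"
  }'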
Response
200 Ingestion result
500 Conversion or embedding error
Response 200
{
"source_id": "uuid",
"status": "completed",
"chunks_created": 42,
"chunks_embedded": 42,
"error": null
}
POST
/v1/knowledge/ingest/url
Ingest a web page URL
Crawls a web page, extracts text content, splits into chunks, embeds, and stores for RAG retrieval.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| source_id | string | Required | UUID — knowledge source record identifier |
| url | string | Required | Web page URL to crawl and ingest |
| user_id | string | Required | UUID — who triggered the ingestion |
| embedding_model | string | Optional | Embedding model: bge-m3 (default), openai, or cohere |
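A sketch of a URL ingestion call (placeholder IDs and URL):
Example Request
curl
curl -X POST https://cortex.doo.ooo/v1/knowledge/ingest/url \
  -H "Authorization: Bearer ctx_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "source_id": "550e8400-e29b-41d4-a716-446655440000",
    "url": "https://example.com/faq",
    "user_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8"
  }'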
Response
Response 200
{
"source_id": "uuid",
"status": "completed",
"chunks_created": 18,
"chunks_embedded": 18,
"error": null
}
POST
/v1/knowledge/ingest/documents
Ingest structured knowledge documents
Embeds an agent's structured knowledge documents into the RAG pipeline for semantic search retrieval.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| agent_id | string | Required | UUID — the agent whose documents to ingest |
| company_id | string | Required | UUID — company for scoping |
| user_id | string | Required | UUID — who triggered the ingestion |
| embedding_model | string | Optional | Embedding model: bge-m3 (default), openai, or cohere |
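A sketch of ingesting an agent's structured documents (placeholder IDs):
Example Request
curl
curl -X POST https://cortex.doo.ooo/v1/knowledge/ingest/documents \
  -H "Authorization: Bearer ctx_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
    "company_id": "550e8400-e29b-41d4-a716-446655440000",
    "user_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7"
  }'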
Response
Response 200
{
"agent_id": "uuid",
"status": "completed",
"documents_processed": 8,
"chunks_created": 24,
"chunks_embedded": 24
}
POST
/v1/knowledge/reembed
Re-embed existing chunks with a different model
Re-generates embeddings for existing knowledge chunks using a different embedding model. Useful when switching between embedding providers.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| source_id | string | Required | UUID — knowledge source to re-embed |
| embedding_model | string | Optional | New embedding model: bge-m3 (default), openai, or cohere |
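For example, switching a source from the default bge-m3 embeddings to OpenAI embeddings (placeholder source ID):
Example Request
curl
curl -X POST https://cortex.doo.ooo/v1/knowledge/reembed \
  -H "Authorization: Bearer ctx_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "source_id": "550e8400-e29b-41d4-a716-446655440000",
    "embedding_model": "openai"
  }'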
Response
Response 200
{
"source_id": "uuid",
"status": "completed",
"chunks_embedded": 42,
"error": null
}
Memory
Manage customer memory facts. Cortex automatically extracts and stores facts during conversations.
These endpoints let you query, save, and delete facts directly.
POST
/v1/memory/{customer_id}
Query customer memory
Retrieve all stored facts and knowledge about a specific customer. Returns memory facts, knowledge snippets, and knowledge base entries.
Path Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| customer_id | string | Required | Customer identifier |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| company_id | string | Required | UUID — company/tenant scope |
| query | string | Optional | Filter query for memory facts |
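A sample query, reusing the customer ID from the chat example (placeholder company ID and filter):
Example Request
curl
curl -X POST https://cortex.doo.ooo/v1/memory/cust_12345 \
  -H "Authorization: Bearer ctx_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "company_id": "550e8400-e29b-41d4-a716-446655440000",
    "query": "language preferences"
  }'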
Response
Response 200
{
"customer_id": "cust_12345",
"company_id": "uuid",
"fact_count": 5,
"knowledge_doc_count": 2,
"kb_entry_count": 3,
"facts": [
{
"id": "uuid",
"fact": "Prefers Arabic language",
"category": "preference",
"confidence": 0.95
}
],
"knowledge_snippets": ["..."],
"kb_entries": ["..."]
}
POST
/v1/memory/{customer_id}/save
Save a memory fact
Manually save a fact about a customer. Facts are used by Service 02 (Memory Recall) to personalize responses.
Path Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| customer_id | string | Required | Customer identifier |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| company_id | string | Required | UUID — company/tenant scope |
| fact | string | Required | The fact text (e.g., "Prefers Arabic language") |
| category | string | Optional | Category: preference, history, sentiment, context, general. Default: general |
| confidence | float | Optional | Confidence score 0.0-1.0. Default: 1.0 |
| source | string | Optional | Fact origin. Default: manual |
| agent_id | string | Optional | Scope fact to a specific agent |
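A sketch of manually saving a preference fact (placeholder company ID):
Example Request
curl
curl -X POST https://cortex.doo.ooo/v1/memory/cust_12345/save \
  -H "Authorization: Bearer ctx_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "company_id": "550e8400-e29b-41d4-a716-446655440000",
    "fact": "Prefers Arabic language",
    "category": "preference",
    "confidence": 0.95
  }'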
Response
Response 200
{
"customer_id": "cust_12345",
"fact_id": "uuid",
"saved": true
}
DELETE
/v1/memory/fact/{fact_id}
Delete a memory fact
Remove a specific memory fact by its ID.
Path Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| fact_id | string | Required | UUID of the fact to delete |
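For example (placeholder fact UUID):
Example Request
curl
curl -X DELETE https://cortex.doo.ooo/v1/memory/fact/7c9e6679-7425-40de-944b-e07fc1f90ae7 \
  -H "Authorization: Bearer ctx_your_api_key"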
Response
Response 200
{
"fact_id": "uuid",
"deleted": true
}
Semantic Cache
Pre-warm and manage the semantic cache. Cached responses are returned instantly without hitting the LLM,
giving zero-cost sub-millisecond responses for frequently asked questions.
POST
/v1/cache/warm
Pre-warm cache with FAQ pairs
Store question-answer pairs in the semantic cache. When a customer asks a semantically similar question, the cached answer is returned instantly.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| company_id | string | Required | UUID — company/tenant scope |
| agent_id | string | Required | UUID — agent to cache for |
| pairs | object[] | Required | Array of {"question": "...", "answer": "..."} pairs |
Example Request
JSON
{
"company_id": "uuid",
"agent_id": "uuid",
"pairs": [
{
"question": "What are your business hours?",
"answer": "We are open Sunday to Thursday, 9 AM to 6 PM."
},
{
"question": "Where are you located?",
"answer": "Our office is at 123 Business St, Cairo."
}
]
}
Response
Response 200
{
"status": "completed",
"pairs_stored": 2,
"pairs_errored": 0,
"pairs_total": 2
}
DELETE
/v1/cache/flush
Flush cached entries
Clear cached responses. Optionally scoped to a specific company and/or agent.
Query Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| company_id | string | Optional | Scope flush to a specific company |
| agent_id | string | Optional | Scope flush to a specific agent |
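A sketch of flushing a single agent's cached entries (placeholder IDs); omit both query parameters to flush without scoping:
Example Request
curl
curl -X DELETE "https://cortex.doo.ooo/v1/cache/flush?company_id=550e8400-e29b-41d4-a716-446655440000&agent_id=6ba7b810-9dad-11d1-80b4-00c04fd430c8" \
  -H "Authorization: Bearer ctx_your_api_key"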
Response
Response 200
{
"status": "flushed",
"entries_deleted": 15
}
GET
/v1/cache/stats
Get cache index stats
Returns statistics about the semantic cache index — document count, record count, and indexing status.
Response
Response 200
{
"status": "active",
"index_name": "cortex_cache",
"num_docs": 156,
"num_records": 156,
"indexing": "complete"
}
🔧 Actions & Tool Calls
Cortex agents can execute external actions (HTTP webhooks) during message processing. When the LLM determines
it needs data from an external system, it triggers tool calls that Cortex executes automatically.
LLM Call → tool_calls? → Execute Webhooks → Feed Results to LLM → Text Response
This loop repeats up to 5 rounds. If the maximum is reached, the LLM is forced to produce a text response.
Action Definition Schema
Each action configured on an agent includes these fields:
| Field | Type | Description |
|---|---|---|
| id | string | UUID — unique action identifier |
| name | string | Function name the LLM uses to call this tool |
| description | string | Human-readable description for the LLM |
| source_type | string | custom_function (simple webhook) or action (OpenAPI-derived) |
| webhook_url | string | Full URL for the HTTP request |
| method | string | HTTP method: GET, POST, PUT, PATCH, DELETE |
| headers | object | Static headers to include in the request |
| auth_type | string | Auth type: api_key, bearer, oauth2, basic, none |
| parameters_schema | object | JSON Schema for function parameters (sent to LLM) |
| path_template | string | URL path template with placeholders (e.g., /api/v1/orders/{order_id}) |
| query_schema | object | JSON Schema for query parameters (action-based tools) |
| rate_limit | object | Rate limit config: {"max_calls": 10, "window_seconds": 60} |
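Putting these fields together, a hypothetical custom_function definition (all values illustrative, mirroring the tool-call flow example below):
JSON
{
  "id": "uuid",
  "name": "get_order_status",
  "description": "Look up an order by ID",
  "source_type": "custom_function",
  "webhook_url": "https://api.store.com/orders",
  "method": "GET",
  "headers": {},
  "auth_type": "api_key",
  "parameters_schema": {
    "type": "object",
    "properties": {
      "order_id": { "type": "string", "description": "Order identifier, e.g. ORD-1234" }
    },
    "required": ["order_id"]
  },
  "rate_limit": { "max_calls": 10, "window_seconds": 60 }
}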
Action Sources
custom_function — Simple webhook: name, URL, method, and a parameters schema. Best for internal APIs.
action — OpenAPI-spec-derived: method, path template, body/query/path schemas, with auth resolved from connections.
Execution Security
Only actions linked to the agent are executable
Auth credentials loaded from database, never from LLM output
Webhook URLs validated (no private IPs in production)
15-second timeout per webhook call
Response size capped at 50KB
Automatic retry (up to 2 retries) for transient failures
Example: Tool Call Flow
Sequence
// 1. Customer asks about their order
Customer → "Where is my order #ORD-1234?"
// 2. Cortex sends tools to LLM with the message
LLM receives: message + tools: [get_order_status, track_shipment]
// 3. LLM decides to call get_order_status
LLM returns: tool_calls: [{
name: "get_order_status",
arguments: { order_id: "ORD-1234" }
}]
// 4. Cortex executes the webhook
HTTP GET → https://api.store.com/orders/ORD-1234
Response → { status: "shipped", eta: "2026-05-14" }
// 5. Result fed back to LLM
LLM receives: tool result + original context
// 6. LLM generates final response
LLM → "Your order #ORD-1234 has been shipped and will arrive by May 14."
Health & Observability
Monitor Cortex health, dependencies, and performance metrics.
GET
/health
Basic health check
Returns 200 if the Cortex process is alive. Used by container orchestration health checks. Does not test dependencies.
Response
Response 200
{
"status": "ok",
"version": "0.1.0"
}
GET
/health/detailed
Detailed dependency health check
Tests every downstream dependency: PostgreSQL (SELECT 1), Redis (PING), and LiteLLM (/health). Returns individual status for each.
Response
200 All checks passed
200 Degraded — some checks failed
Response 200
{
"status": "ok",
"version": "0.1.0",
"checks": {
"postgres": "ok",
"redis": "ok",
"litellm": "ok"
}
}
GET
/v1/metrics
Prometheus-format metrics
Exposes all Cortex metrics in Prometheus text format for Grafana scraping. Includes counters, histograms, and labels for model tier, agent ID, etc.
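Per the authentication rules above, /v1 endpoints require an API key, so a scrape request looks like this (hypothetical key shown):
Example Request
curl
curl https://cortex.doo.ooo/v1/metrics \
  -H "Authorization: Bearer ctx_your_api_key"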
Response
text/plain
# HELP cortex_config_cache_hits_total Number of agent config cache hits
# TYPE cortex_config_cache_hits_total counter
cortex_config_cache_hits_total{agent_id="abc123"} 42
# HELP cortex_pipeline_duration_ms Pipeline processing time
# TYPE cortex_pipeline_duration_ms histogram
cortex_pipeline_duration_ms_sum{model="sonnet"} 15420
cortex_pipeline_duration_ms_count{model="sonnet"} 38
GET
/v1/metrics/dashboard
JSON metrics for dashboards
Structured JSON metrics for dashboard integration. Returns config loading performance, model routing stats, and agent compilation metrics.
Response
Response 200
{
"config_loading": {
"cache_hit_rate": 94.2,
"total_loads": 520,
"cache_hits": 490,
"avg_duration_ms": 8
},
"model_routing": {
"total_routed": 500,
"by_tier": {
"nano": 280,
"sonnet": 150,
"complex": 70
}
},
"agent_compilation": {
"avg_llm_latency_ms": 340,
"tool_call_rate": 18.5,
"avg_tool_rounds": 1.2
}
}
🔑 API Keys
Create and manage API keys for accessing Cortex endpoints. Keys are prefixed with ctx_ and scoped to your company; create them via POST /portal/api/keys (see Authentication above).
Error Codes
Standard HTTP status codes used across all endpoints.
| Code | Status | Description |
|---|---|---|
| 200 | Success | Request processed successfully |
| 401 | Unauthorized | Missing, invalid, or expired API key |
| 404 | Not Found | Resource not found (agent, fact, etc.) |
| 422 | Validation Error | Invalid request body (Pydantic validation failed) |
| 429 | Rate Limited | Too many requests — retry after the specified interval |
| 500 | Internal Error | Unexpected pipeline error |
| 503 | Service Unavailable | LLM gateway unreachable |
| 504 | Gateway Timeout | LLM call timed out |