Cortex API Documentation

https://cortex.doo.ooo
Cortex API
Cortex is a standalone AI inference engine. It processes messages through an 8-service pipeline that handles agent configuration, customer memory, RAG knowledge retrieval, token optimization, semantic caching, intelligent model routing, prompt compilation with tool-call execution, and post-response actions — all in a single API call.
Base URL
/v1
All endpoints versioned
Auth
Bearer Token
Format
JSON
Request & Response
Pipeline
8 Services
Sequential processing
Multi-model routing (GPT-4o, Claude Sonnet, Gemini Flash)
RAG knowledge retrieval with pgvector
Customer memory (cross-session facts)
Semantic caching for instant responses
Tool-call loop with webhook execution
Token optimization & context compression
Per-step latency telemetry
Prometheus + OpenTelemetry observability
🔑 Authentication
All /v1 endpoints require an API key passed via the Authorization header. API keys are prefixed with ctx_ and can be created through the developer portal.
Authorization Header
Include your API key as a Bearer token in every request to /v1/* endpoints. Requests made with keys missing the ctx_ prefix, or with expired keys, receive a 401 Unauthorized response.
HTTP Header
Authorization: Bearer ctx_your_api_key_here
POST /portal/api/login Get a JWT session token
Authenticate with your email and password to receive a JWT session token. Use this token to manage API keys via the portal endpoints.
Request Body
Field | Type | Required | Description
email | string | Required | Your account email address
password | string | Required | Your account password
Response
200 Success — JWT token returned
401 Invalid credentials
Response
{
  "token": "eyJhbGciOiJIUzI1NiIs...",
  "user": {
    "id": "uuid",
    "email": "you@company.com",
    "company_id": "uuid"
  }
}
POST /portal/api/keys Create a new API key
Create a new API key for your company. The raw key is shown only once — store it securely. Requires JWT session token from /portal/api/login.
Request Body
Field | Type | Required | Description
name | string | Required | Human-readable label for the key (1-100 chars)
scopes | string[] | Optional | Permission scopes (default: all)
expires_in_days | integer | Optional | Key expiry in days (1-365, null = never)
Response
{
  "key": "ctx_abc123...xyz",
  "key_prefix": "ctx_abc1",
  "name": "Production Key",
  "scopes": ["*"],
  "expires_at": "2026-06-12T00:00:00Z",
  "created_at": "2026-05-12T10:00:00Z"
}
⚡ Pipeline Architecture
Every message flows through an 8-service pipeline. Each service is independent, testable, and adds specific capabilities to the request context. If the semantic cache hits, services 06-08 are skipped entirely for instant zero-cost responses.
01 Config
02 Memory
03 RAG
04 Optimize
05 Cache
06 Router
07 Compile + Tools
08 Actions
Service | Name | Description
01 | Config Loader | Loads agent configuration from database (personality, dialect, prompts, actions, knowledge sources). Cached in Redis for 5 minutes.
02 | Memory Recall | Retrieves customer-specific facts and knowledge from the memory system. Cross-session memory enables personalization.
03 | Context Engine | RAG retrieval — searches pgvector for relevant knowledge chunks using the customer's query embedding.
04 | Token Optimizer | Compresses context to fit within the model's token budget. Priority-based — core identity is never compressed.
05 | Cache Check | Semantic similarity search in Redis. If a sufficiently similar query was answered before, returns the cached response (zero LLM cost).
06 | Model Router | Classifies query complexity and routes to the optimal model tier (speed/nano/sonnet/complex/search).
07 | Agent Compiler | Renders the Jinja2 prompt template, sends it to the LLM via LiteLLM. If the LLM returns tool_calls, executes them and re-calls the LLM (up to 5 rounds).
08 | Post Actions | Executes any remaining post-response actions after the LLM has produced its final text response.
Chat
The core endpoint. Send a message and get an AI response processed through the full 8-service pipeline.
POST /v1/chat Process a message through the AI pipeline
Send a customer message and receive an AI-generated response. The pipeline loads the agent's configuration, recalls customer memory, retrieves relevant knowledge via RAG, optimizes the token budget, checks the semantic cache, routes to the optimal model, compiles the prompt, and executes any tool calls — all in one request.
Request Body
Field | Type | Required | Description
company_id | string | Required | UUID — your company/tenant identifier
agent_id | string | Required | UUID — which AI agent to use for processing
conversation_id | string | Required | UUID — conversation thread identifier
customer_id | string | Required | Customer identifier for memory recall
message | string | Required | The customer's message text (1-16,000 chars)
channel | string | Optional | Channel type: whatsapp, web, instagram, email, etc. Default: whatsapp
metadata | object | Optional | Extra context (language, location, etc.). Max 20 keys.
Example Request
curl
curl -X POST https://cortex.doo.ooo/v1/chat \
  -H "Authorization: Bearer ctx_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "company_id": "550e8400-e29b-41d4-a716-446655440000",
    "agent_id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
    "conversation_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "customer_id": "cust_12345",
    "message": "What are your business hours?",
    "channel": "whatsapp"
  }'
Response
200 Success
401 Unauthorized
422 Validation error
500 Pipeline error
503 LLM gateway unreachable
Response 200
{
  "reply": "Our business hours are Sunday to Thursday, 9 AM to 6 PM.",
  "model_used": "nano",
  "cached": false,
  "actions_executed": [],
  "latency_ms": 342,
  "tokens": { "input": 1250, "output": 28, "cost_usd": 0.00019 },
  "context_tokens": { "before_compression": 3200, "after_compression": 1250, "savings_pct": 60.9 },
  "step_timings": {
    "config_load": 12,
    "memory_recall": 45,
    "context_engine": 38,
    "token_optimizer": 5,
    "cache_check": 8,
    "model_router": 2,
    "agent_compiler": 228,
    "tool_call_rounds": 0,
    "tool_execution": 0
  }
}
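The same request can be assembled from Python. The helper below is an illustrative sketch, not an official SDK; only the endpoint path, headers, and field names come from this page, and the required-field check mirrors the table above.

```python
import json
import urllib.request

BASE_URL = "https://cortex.doo.ooo/v1"

def build_chat_request(api_key: str, **fields) -> urllib.request.Request:
    """Assemble an authenticated POST /v1/chat request object."""
    required = {"company_id", "agent_id", "conversation_id",
                "customer_id", "message"}
    missing = required - fields.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return urllib.request.Request(
        f"{BASE_URL}/chat",
        data=json.dumps(fields).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "ctx_your_api_key",
    company_id="550e8400-e29b-41d4-a716-446655440000",
    agent_id="6ba7b810-9dad-11d1-80b4-00c04fd430c8",
    conversation_id="7c9e6679-7425-40de-944b-e07fc1f90ae7",
    customer_id="cust_12345",
    message="What are your business hours?",
)
# urllib.request.urlopen(req) would send it; the JSON body of the
# response should match the shape shown above.
```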
Response with Tool Calls Executed
When the agent has configured actions/tools, the LLM may invoke them during processing. Tool calls are executed as HTTP webhooks and the results are fed back to the LLM automatically (up to 5 rounds).
Response 200 — with actions
{
  "reply": "Your order #ORD-1234 is currently being prepared and will ship tomorrow.",
  "model_used": "sonnet",
  "cached": false,
  "actions_executed": ["get_order_status"],
  "latency_ms": 1580,
  "tokens": { "input": 2100, "output": 45, "cost_usd": 0.0032 },
  "step_timings": { "agent_compiler": 1420, "tool_call_rounds": 1, "tool_execution": 380 }
}
Agents
Inspect an agent's full configuration as loaded by the pipeline.
GET /v1/agents/{agent_id}/config Get agent configuration
Returns the full agent configuration as loaded from the database, including personality, dialect, system prompt, FAQs, scenarios, knowledge sources, action definitions, and all settings. Result is cached in Redis for 5 minutes (same cache the pipeline uses).
Path Parameters
Field | Type | Required | Description
agent_id | string | Required | UUID of the agent
Response
200 Full agent config
404 Agent not found
Response 200
{
  "id": "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
  "company_id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "Support Agent",
  "personality": { "id": "uuid", "name": "Friendly Professional" },
  "dialect": { "name": "Egyptian Arabic", "primary_language": "ar" },
  "action_definitions": [
    {
      "id": "uuid",
      "name": "get_order_status",
      "description": "Look up an order by ID",
      "source_type": "custom_function",
      "method": "GET",
      "parameters_schema": { "..." }
    }
  ],
  "knowledge_source_ids": ["uuid1", "uuid2"],
  "workflow_faqs": ["..."]
}
Knowledge Ingestion
Upload documents and URLs to build the agent's knowledge base. The ingestion pipeline converts content to markdown, splits into semantic chunks, generates vector embeddings, and stores them for RAG retrieval.
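The chunking step can be pictured with a naive splitter. This is a character-window sketch only; Cortex's actual semantic chunker, its chunk sizes, and its overlap are not documented here and certainly differ.

```python
def split_into_chunks(markdown: str, chunk_size: int = 200,
                      overlap: int = 40) -> list[str]:
    """Naive sliding-window splitter: fixed-size chunks with overlap, so
    text straddling a boundary appears in both neighboring chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [markdown[i:i + chunk_size]
            for i in range(0, len(markdown), step)]

chunks = split_into_chunks("x" * 500, chunk_size=200, overlap=40)
# Each chunk would then be embedded (bge-m3 by default) and stored in
# pgvector for RAG retrieval.
```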
POST /v1/knowledge/ingest/file Ingest a file (PDF, DOCX, PPTX, etc.)
Converts a file to Markdown, splits into semantic chunks, generates vector embeddings, and stores in the knowledge base. Supports PDF, DOCX, PPTX, XLSX, and other document formats via MarkItDown.
Request Body
Field | Type | Required | Description
source_id | string | Required | UUID — knowledge source record identifier
file_path | string | Required | Path to the uploaded file on disk
user_id | string | Required | UUID — who uploaded the file
embedding_model | string | Optional | Embedding model: bge-m3 (default), openai, or cohere
Response
200 Ingestion result
500 Conversion or embedding error
Response 200
{ "source_id": "uuid", "status": "completed", "chunks_created": 42, "chunks_embedded": 42, "error": null }
POST /v1/knowledge/ingest/url Ingest a web page URL
Crawls a web page, extracts text content, splits into chunks, embeds, and stores for RAG retrieval.
Request Body
Field | Type | Required | Description
source_id | string | Required | UUID — knowledge source record identifier
url | string | Required | Web page URL to crawl and ingest
user_id | string | Required | UUID — who triggered the ingestion
embedding_model | string | Optional | Embedding model: bge-m3 (default), openai, or cohere
Response
Response 200
{ "source_id": "uuid", "status": "completed", "chunks_created": 18, "chunks_embedded": 18, "error": null }
POST /v1/knowledge/ingest/documents Ingest structured knowledge documents
Embeds an agent's structured knowledge documents into the RAG pipeline for semantic search retrieval.
Request Body
Field | Type | Required | Description
agent_id | string | Required | UUID — the agent whose documents to ingest
company_id | string | Required | UUID — company for scoping
user_id | string | Required | UUID — who triggered the ingestion
embedding_model | string | Optional | Embedding model: bge-m3 (default), openai, or cohere
Response
Response 200
{ "agent_id": "uuid", "status": "completed", "documents_processed": 8, "chunks_created": 24, "chunks_embedded": 24 }
POST /v1/knowledge/reembed Re-embed existing chunks with a different model
Re-generates embeddings for existing knowledge chunks using a different embedding model. Useful when switching between embedding providers.
Request Body
Field | Type | Required | Description
source_id | string | Required | UUID — knowledge source to re-embed
embedding_model | string | Optional | New embedding model: bge-m3 (default), openai, or cohere
Response
Response 200
{ "source_id": "uuid", "status": "completed", "chunks_embedded": 42, "error": null }
Memory
Manage customer memory facts. Cortex automatically extracts and stores facts during conversations. These endpoints let you query, save, and delete facts directly.
POST /v1/memory/{customer_id} Query customer memory
Retrieve all stored facts and knowledge about a specific customer. Returns memory facts, knowledge snippets, and knowledge base entries.
Path Parameters
Field | Type | Required | Description
customer_id | string | Required | Customer identifier
Request Body
Field | Type | Required | Description
company_id | string | Required | UUID — company/tenant scope
query | string | Optional | Filter query for memory facts
Response
Response 200
{
  "customer_id": "cust_12345",
  "company_id": "uuid",
  "fact_count": 5,
  "knowledge_doc_count": 2,
  "kb_entry_count": 3,
  "facts": [
    { "id": "uuid", "fact": "Prefers Arabic language", "category": "preference", "confidence": 0.95 }
  ],
  "knowledge_snippets": ["..."],
  "kb_entries": ["..."]
}
POST /v1/memory/{customer_id}/save Save a memory fact
Manually save a fact about a customer. Facts are used by Service 02 (Memory Recall) to personalize responses.
Path Parameters
Field | Type | Required | Description
customer_id | string | Required | Customer identifier
Request Body
Field | Type | Required | Description
company_id | string | Required | UUID — company/tenant scope
fact | string | Required | The fact text (e.g., "Prefers Arabic language")
category | string | Optional | Category: preference, history, sentiment, context, general. Default: general
confidence | float | Optional | Confidence score 0.0-1.0. Default: 1.0
source | string | Optional | Fact origin. Default: manual
agent_id | string | Optional | Scope fact to a specific agent
Response
Response 200
{ "customer_id": "cust_12345", "fact_id": "uuid", "saved": true }
DELETE /v1/memory/fact/{fact_id} Delete a memory fact
Remove a specific memory fact by its ID.
Path Parameters
Field | Type | Required | Description
fact_id | string | Required | UUID of the fact to delete
Response
Response 200
{ "fact_id": "uuid", "deleted": true }
Semantic Cache
Pre-warm and manage the semantic cache. Cached responses are returned instantly without hitting the LLM, giving zero-cost sub-millisecond responses for frequently asked questions.
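Conceptually, the cache compares the query's embedding against stored question embeddings by similarity and returns the paired answer when a threshold is cleared. The toy below uses made-up 3-dimensional vectors and a made-up threshold of 0.9; the real implementation lives in Redis and its embedding model and threshold are not documented here.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (toy version:
    assumes neither vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cache_lookup(query_vec, cache, threshold=0.9):
    """Return the cached answer of the most similar stored question,
    or None if nothing clears the similarity threshold."""
    best_answer, best_score = None, threshold
    for stored_vec, answer in cache:
        score = cosine(query_vec, stored_vec)
        if score >= best_score:
            best_answer, best_score = answer, score
    return best_answer

# One pre-warmed pair, with a made-up embedding for its question.
cache = [([1.0, 0.0, 0.1], "We are open Sunday to Thursday, 9 AM to 6 PM.")]
```

A paraphrased question would embed close to the stored vector and hit the cache; an unrelated question embeds far away and falls through to the full pipeline.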
POST /v1/cache/warm Pre-warm cache with FAQ pairs
Store question-answer pairs in the semantic cache. When a customer asks a semantically similar question, the cached answer is returned instantly.
Request Body
Field | Type | Required | Description
company_id | string | Required | UUID — company/tenant scope
agent_id | string | Required | UUID — agent to cache for
pairs | object[] | Required | Array of {"question": "...", "answer": "..."} pairs
Example Request
JSON
{
  "company_id": "uuid",
  "agent_id": "uuid",
  "pairs": [
    { "question": "What are your business hours?", "answer": "We are open Sunday to Thursday, 9 AM to 6 PM." },
    { "question": "Where are you located?", "answer": "Our office is at 123 Business St, Cairo." }
  ]
}
Response
Response 200
{ "status": "completed", "pairs_stored": 2, "pairs_errored": 0, "pairs_total": 2 }
DELETE /v1/cache/flush Flush cached entries
Clear cached responses. Optionally scoped to a specific company and/or agent.
Query Parameters
Field | Type | Required | Description
company_id | string | Optional | Scope flush to a specific company
agent_id | string | Optional | Scope flush to a specific agent
Response
Response 200
{ "status": "flushed", "entries_deleted": 15 }
GET /v1/cache/stats Get cache index stats
Returns statistics about the semantic cache index — document count, record count, and indexing status.
Response
Response 200
{ "status": "active", "index_name": "cortex_cache", "num_docs": 156, "num_records": 156, "indexing": "complete" }
🔧 Actions & Tool Calls
Cortex agents can execute external actions (HTTP webhooks) during message processing. When the LLM determines it needs data from an external system, it triggers tool calls that Cortex executes automatically.
How Tool Calls Work
Actions are configured per-agent as ActionDefinitions. During Service 07 (Agent Compiler), these are converted to OpenAI-compatible tool definitions and sent to the LLM. When the LLM returns tool_calls, Cortex executes them as HTTP requests and feeds the results back to the LLM — up to 5 rounds until the LLM produces a final text response.
LLM Call
tool_calls?
Execute Webhooks
Feed Results to LLM
Text Response
The loop repeats for up to 5 rounds. If the maximum is reached, the LLM is forced to produce a text response.
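The loop can be sketched as follows. The function signatures and message shapes here are assumptions for illustration; Cortex's real implementation calls the LLM through LiteLLM and executes tool calls as HTTP webhooks.

```python
MAX_ROUNDS = 5

def run_tool_loop(call_llm, execute_webhook, messages):
    """Call the LLM; if it returns tool_calls, execute each one and feed
    the results back, repeating until a text response appears or
    MAX_ROUNDS is hit (then force a text answer with tools disabled)."""
    for round_no in range(MAX_ROUNDS):
        response = call_llm(messages)
        tool_calls = response.get("tool_calls")
        if not tool_calls:
            return response["content"], round_no
        for call in tool_calls:
            # In Cortex this is an HTTP webhook request; here it is a stub.
            result = execute_webhook(call["name"], call["arguments"])
            messages.append({"role": "tool", "name": call["name"],
                             "content": result})
    # Max rounds reached: one final call with tools disabled.
    return call_llm(messages, allow_tools=False)["content"], MAX_ROUNDS
```

With a stub LLM that requests get_order_status once and then answers, the loop runs exactly one tool round, matching the tool_call_rounds field in the chat response.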
Action Definition Schema
Each action configured on an agent includes these fields:
Field | Type | Description
id | string | UUID — unique action identifier
name | string | Function name the LLM uses to call this tool
description | string | Human-readable description for the LLM
source_type | string | custom_function (simple webhook) or action (OpenAPI-derived)
webhook_url | string | Full URL for the HTTP request
method | string | HTTP method: GET, POST, PUT, PATCH, DELETE
headers | object | Static headers to include in the request
auth_type | string | Auth type: api_key, bearer, oauth2, basic, none
parameters_schema | object | JSON Schema for function parameters (sent to LLM)
path_template | string | URL path template with placeholders (e.g., /api/v1/orders/{order_id})
query_schema | object | JSON Schema for query parameters (action-based tools)
rate_limit | object | Rate limit config: {"max_calls": 10, "window_seconds": 60}
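During Service 07, each ActionDefinition is converted to an OpenAI-compatible tool definition before the LLM call. A minimal sketch of that mapping (the exact field handling inside Cortex may differ; only name, description, and parameters_schema are taken from the schema above):

```python
def to_openai_tool(action: dict) -> dict:
    """Map an ActionDefinition to the OpenAI function-calling tool shape."""
    return {
        "type": "function",
        "function": {
            "name": action["name"],
            "description": action["description"],
            # Fall back to an empty object schema if none is configured.
            "parameters": action.get("parameters_schema")
                          or {"type": "object", "properties": {}},
        },
    }

action = {
    "name": "get_order_status",
    "description": "Look up an order by ID",
    "parameters_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}
tool = to_openai_tool(action)
```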
Action Sources
custom_function — Simple webhook: name, URL, method, and a parameters schema. Best for internal APIs.
action — OpenAPI-spec-derived: method, path template, body/query/path schemas, with auth resolved from connections.
Execution Security
Only actions linked to the agent are executable
Auth credentials loaded from database, never from LLM output
Webhook URLs validated (no private IPs in production)
15-second timeout per webhook call
Response size capped at 50KB
Automatic retry (up to 2 retries) for transient failures
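The private-IP rule in the list above can be sketched with the standard library. This is an illustration, not Cortex's actual validator, and as the comment notes, a production check must also resolve hostnames to IPs or it can be bypassed via DNS.

```python
import ipaddress
from urllib.parse import urlsplit

def is_safe_webhook_url(url: str) -> bool:
    """Reject non-HTTP(S) schemes and hosts that are literal private,
    loopback, or link-local IP addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addr = ipaddress.ip_address(parts.hostname)
    except ValueError:
        # Hostname, not a literal IP. A real validator would resolve it
        # and re-check; we allow it in this sketch.
        return True
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)
```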
Example: Tool Call Flow
Sequence
// 1. Customer asks about their order
Customer → "Where is my order #ORD-1234?"

// 2. Cortex sends tools to LLM with the message
LLM receives: message + tools: [get_order_status, track_shipment]

// 3. LLM decides to call get_order_status
LLM returns: tool_calls: [{ name: "get_order_status", arguments: { order_id: "ORD-1234" } }]

// 4. Cortex executes the webhook
HTTP GET → https://api.store.com/orders/ORD-1234
Response → { status: "shipped", eta: "2026-05-14" }

// 5. Result fed back to LLM
LLM receives: tool result + original context

// 6. LLM generates final response
LLM → "Your order #ORD-1234 has been shipped and will arrive by May 14."
Health & Observability
Monitor Cortex health, dependencies, and performance metrics.
GET /health Basic health check
Returns 200 if the Cortex process is alive. Used by container orchestration health checks. Does not test dependencies.
Response
Response 200
{ "status": "ok", "version": "0.1.0" }
GET /health/detailed Detailed dependency health check
Tests every downstream dependency: PostgreSQL (SELECT 1), Redis (PING), and LiteLLM (/health). Returns individual status for each.
Response
200 All checks passed
200 Degraded — some checks failed
Response 200
{ "status": "ok", "version": "0.1.0", "checks": { "postgres": "ok", "redis": "ok", "litellm": "ok" } }
GET /v1/metrics Prometheus-format metrics
Exposes all Cortex metrics in Prometheus text format for Grafana scraping. Includes counters, histograms, and labels for model tier, agent ID, etc.
Response
text/plain
# HELP cortex_config_cache_hits_total Number of agent config cache hits
# TYPE cortex_config_cache_hits_total counter
cortex_config_cache_hits_total{agent_id="abc123"} 42

# HELP cortex_pipeline_duration_ms Pipeline processing time
# TYPE cortex_pipeline_duration_ms histogram
cortex_pipeline_duration_ms_sum{model="sonnet"} 15420
cortex_pipeline_duration_ms_count{model="sonnet"} 38
GET /v1/metrics/dashboard JSON metrics for dashboards
Structured JSON metrics for dashboard integration. Returns config loading performance, model routing stats, and agent compilation metrics.
Response
Response 200
{
  "config_loading": { "cache_hit_rate": 94.2, "total_loads": 520, "cache_hits": 490, "avg_duration_ms": 8 },
  "model_routing": { "total_routed": 500, "by_tier": { "nano": 280, "sonnet": 150, "complex": 70 } },
  "agent_compilation": { "avg_llm_latency_ms": 340, "tool_call_rate": 18.5, "avg_tool_rounds": 1.2 }
}
🔑 API Keys
Create and manage API keys for accessing Cortex endpoints. Keys are prefixed with ctx_ and scoped to your company.
Error Codes
Standard HTTP status codes used across all endpoints.
Code | Status | Description
200 | Success | Request processed successfully
401 | Unauthorized | Missing, invalid, or expired API key
404 | Not Found | Resource not found (agent, fact, etc.)
422 | Validation Error | Invalid request body (Pydantic validation failed)
429 | Rate Limited | Too many requests — retry after the specified interval
500 | Internal Error | Unexpected pipeline error
503 | Service Unavailable | LLM gateway unreachable
504 | Gateway Timeout | LLM call timed out