Chat completions

Call POST /v1/chat/completions with curl — non-streaming and Server-Sent Events streaming.

Path: POST /v1/chat/completions. Body matches OpenAI’s chat schema: model, messages, optional temperature, max_tokens, stream, etc.

Non-streaming

curl -sS "https://inf.vocifer.com/v1/chat/completions" \
  -H "Authorization: Bearer $VOCIFER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.3-70b-instruct",
    "messages": [
      {"role": "user", "content": "Reply in one sentence: what is KV cache?"}
    ],
    "temperature": 0.2,
    "max_tokens": 256
  }'
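
The response body follows OpenAI's chat completion shape, so the assistant reply sits at .choices[0].message.content. A minimal sketch for extracting just that text, assuming jq is installed:

curl -sS "https://inf.vocifer.com/v1/chat/completions" \
  -H "Authorization: Bearer $VOCIFER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/llama-3.3-70b-instruct",
       "messages": [{"role": "user", "content": "Reply in one sentence: what is KV cache?"}]}' \
  | jq -r '.choices[0].message.content'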

Streaming (SSE)

Set "stream": true and send an Accept: text/event-stream header. Chunk framing follows the same SSE grammar OpenAI-compatible clients expect; curl's -N flag keeps the output unbuffered, and a parsing sketch follows the command below.

curl -sS -N "https://inf.vocifer.com/v1/chat/completions" \
  -H "Authorization: Bearer $VOCIFER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Stream a haiku about GPUs."}],
    "stream": true
  }'
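
Each event arrives on a data: line carrying a JSON chunk whose incremental text is at .choices[0].delta.content, and the stream ends with data: [DONE], per the OpenAI streaming convention this endpoint mirrors. A minimal parsing sketch, assuming a bash-compatible shell and jq:

curl -sS -N "https://inf.vocifer.com/v1/chat/completions" \
  -H "Authorization: Bearer $VOCIFER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Stream a haiku about GPUs."}],
    "stream": true
  }' \
| while IFS= read -r line; do
    case "$line" in
      "data: [DONE]"*) break ;;                                  # end-of-stream sentinel
      data:*) printf '%s' "${line#data: }" \
                | jq -j '.choices[0].delta.content // empty' ;;  # print token deltas as they arrive
    esac
  done
echo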
