Chat completions
Call POST /v1/chat/completions with curl, in both non-streaming and Server-Sent Events (SSE) streaming modes.
Path: POST /v1/chat/completions. Body matches OpenAI’s chat schema: model, messages, optional temperature, max_tokens, stream, etc.
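As a sketch of what any OpenAI-compatible client assembles before sending the request, the helper below builds the headers and JSON body used in the curl examples on this page (the endpoint URL, model name, and `VOCIFER_API_KEY` variable are taken from those examples; the helper name is illustrative):

```python
import json
import os

# Endpoint and model as used in the curl examples on this page.
API_URL = "https://inf.vocifer.com/v1/chat/completions"
MODEL = "meta-llama/llama-3.3-70b-instruct"

def build_request(prompt: str, *, stream: bool = False) -> tuple[dict, bytes]:
    """Build (headers, body) for POST /v1/chat/completions.

    The body follows OpenAI's chat schema: model, messages, plus
    optional sampling parameters and the stream flag.
    """
    headers = {
        "Authorization": f"Bearer {os.environ.get('VOCIFER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
        "max_tokens": 256,
    }
    if stream:
        body["stream"] = True  # request SSE chunks instead of one JSON object
    return headers, json.dumps(body).encode()
```

The returned headers and body can be passed to any HTTP client; only the Authorization header and JSON body shape matter to the API.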
Non-streaming
```shell
curl -sS "https://inf.vocifer.com/v1/chat/completions" \
  -H "Authorization: Bearer $VOCIFER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.3-70b-instruct",
    "messages": [
      {"role": "user", "content": "Reply in one sentence: what is KV cache?"}
    ],
    "temperature": 0.2,
    "max_tokens": 256
  }'
```

Streaming (SSE)
Set "stream": true in the body and send an Accept: text/event-stream header. Chunk framing follows the same SSE grammar OpenAI-compatible clients expect.
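For orientation, an OpenAI-compatible stream is a sequence of `data:` events, each carrying a `chat.completion.chunk` JSON object whose `delta` holds the next slice of text, terminated by a literal `data: [DONE]`. An abbreviated sketch (field values are illustrative, and real chunks carry additional fields such as `id` and `model`):

```
data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hell"},"finish_reason":null}]}

data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"o"},"finish_reason":"stop"}]}

data: [DONE]
```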
```shell
curl -sS -N "https://inf.vocifer.com/v1/chat/completions" \
  -H "Authorization: Bearer $VOCIFER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "meta-llama/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Stream a haiku about GPUs."}],
    "stream": true
  }'
```

The `-N` flag disables curl's output buffering so chunks print as they arrive.
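To consume the stream programmatically rather than watching raw SSE lines, a client accumulates the `delta.content` of each chunk until `[DONE]`. A minimal sketch, assuming the OpenAI-style chunk shape described above (the function name is illustrative):

```python
import json

def collect_stream(sse_lines):
    """Accumulate assistant text from OpenAI-style SSE 'data:' lines.

    Each event is 'data: <chunk JSON>'; the stream ends with 'data: [DONE]'.
    Blank keep-alive lines and SSE comments (': ...') are ignored.
    """
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blanks and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)
```

In practice `sse_lines` would be the line iterator of a streaming HTTP response; the same loop works unchanged.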