Hosted inference rollout is invite-first. Abuse-resistant keys, egress controls, and model allowlists ship with enterprise workspaces.

SDKs & clients

Use the Python OpenAI SDK, TypeScript fetch, Go's net/http, or any other OpenAI-compatible client against Vocifer AI.

Point any OpenAI-compatible client at https://inf.vocifer.com/v1 (or your assigned host) and authenticate with your Vocifer API key. Set base_url (or the OPENAI_BASE_URL environment variable) to that origin, including the /v1 path.

Python (OpenAI SDK)

import os
from openai import OpenAI
 
client = OpenAI(
    base_url="https://inf.vocifer.com/v1",
    api_key=os.environ["VOCIFER_API_KEY"],
    default_headers={
        # Optional: scopes requests to an organization when VOCIFER_ORG_ID is set.
        "X-Vocifer-Organization-Id": os.environ.get("VOCIFER_ORG_ID", ""),
    },
)
 
resp = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Hello from Python."}],
)
print(resp.choices[0].message.content)

Streaming: pass stream=True, then iterate the returned stream and read chunk.choices[0].delta.content from each chunk.

TypeScript (fetch)

const res = await fetch("https://inf.vocifer.com/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.VOCIFER_API_KEY!}`,
    "Content-Type": "application/json",
    // Send the org header only when an organization ID is configured.
    ...(process.env.VOCIFER_ORG_ID
      ? { "X-Vocifer-Organization-Id": process.env.VOCIFER_ORG_ID }
      : {}),
  },
  body: JSON.stringify({
    model: "meta-llama/llama-3.3-70b-instruct",
    messages: [{ role: "user", content: "Hi from Node." }],
  }),
});
if (!res.ok) throw new Error(`Vocifer API error ${res.status}: ${await res.text()}`);
const data = await res.json();
console.log(data.choices?.[0]?.message?.content);

Go (net/http)

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	body := map[string]any{
		"model":    "meta-llama/llama-3.3-70b-instruct",
		"messages": []map[string]string{{"role": "user", "content": "Hi from Go."}},
	}
	b, err := json.Marshal(body)
	if err != nil {
		panic(err)
	}
	req, err := http.NewRequest("POST", "https://inf.vocifer.com/v1/chat/completions", bytes.NewReader(b))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("VOCIFER_API_KEY"))
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Decode only the fields we need from the response.
	var out struct {
		Choices []struct {
			Message struct {
				Content string `json:"content"`
			} `json:"message"`
		} `json:"choices"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	if len(out.Choices) > 0 {
		fmt.Println(out.Choices[0].Message.Content)
	}
}

Other stacks

LangChain, LiteLLM, the Vercel AI SDK, httpx, and similar tools work as long as they let you set a custom base URL and Bearer authentication for Chat Completions.
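For tools not listed here, the wire contract is plain HTTP. A minimal sketch using only the Python standard library, assuming the host and model from the examples above (build_request is a hypothetical helper, not part of any SDK):

```python
import json
import os
import urllib.request

BASE_URL = "https://inf.vocifer.com/v1"


def build_request(prompt: str, model: str = "meta-llama/llama-3.3-70b-instruct") -> urllib.request.Request:
    """Build the POST /chat/completions request any OpenAI-compatible tool sends."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": "Bearer " + os.environ.get("VOCIFER_API_KEY", ""),
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    with urllib.request.urlopen(build_request("Hi from the stdlib.")) as resp:
        data = json.load(resp)
    print(data["choices"][0]["message"]["content"])
```

Any client that can produce this request shape — Bearer token, JSON body with model and messages — will work against the endpoint.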
