Hosted inference rollout is invite-first. Abuse-resistant keys, egress controls, and model allowlists ship with enterprise workspaces.

Models live today · Confidential path in development

Confidential model compute

The audit-ready AI layer your CISO, security team, and CFO keep asking for—not another “trust us” GPU rental.

Today we host frontier and open-weight models on standard, high-throughput infrastructure—OpenAI-compatible APIs, catalog pricing, and per-organizationId usage—without requiring confidential VMs. In development, we are building the full stack: AMD SEV-SNP and Intel TDX, measured boot and dm-verity host roots, Kata-class isolation, open attestation flows with Sigstore Rekor transparency logs, and NVIDIA GPU attestation—so every cold start can ship receipts your risk team can verify before a token leaves the enclave.

See attestation stack · Documentation
TEE-backed VMs · Attestation logs API · GPU device quotes · Per-org usage
GET /v1/attestation/evidence — illustrative
GET https://inf.vocifer.com/v1/attestation/evidence
200 OK
{
  "tee": "SEV-SNP | TDX",
  "guest_policy_hash": "sha256:…",
  "host_dm_verity": "sha512:…",
  "gpu": { "device": "H100", "quote": "…" },
  "rekor": { "log_index": "…", "uuid": "…" },
  "issued_at": "2026-05-09T12:00:00Z"
}

Illustrative attestation envelope—exact schema ships with preview access; production evidence chains to your verifier policies.

Our moat

A full-stack attestation chain—not a marketing checkbox

Most inference clouds stop at “we use encryption.” The confidential track is designed so each instance proves what it is before it serves: CPU TEE quote, firmware and host integrity, guest OS baseline, Kata-style isolation boundary, NVIDIA GPU attestation, and attestation artifacts anchored in a Sigstore Rekor transparency log for tamper-evident audit. If measurements diverge from your allowlist—new binary, unexpected driver, tampered init—the control plane recycles the node instead of silently continuing. Again: standard model hosting is available today without this stack; this section describes the roadmap moat. The HTTPS APIs for fetching evidence and log entries will evolve during the preview (names and schema subject to design-partner feedback).
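
A minimal verifier sketch against the illustrative envelope above; the endpoint, field names, and allowlist format are placeholders until the preview schema ships:

import requests

# Measurements your risk team approved; shapes mirror the
# illustrative envelope, not a final schema.
ALLOWLIST = {
    "guest_policy_hash": {"sha256:…"},
    "host_dm_verity": {"sha512:…"},
}

def evidence_is_trusted(base_url: str, api_key: str) -> bool:
    resp = requests.get(
        f"{base_url}/v1/attestation/evidence",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    resp.raise_for_status()
    evidence = resp.json()
    # Any divergence from the allowlist means the node gets
    # recycled, not silently served.
    for field, allowed in ALLOWLIST.items():
        if evidence.get(field) not in allowed:
            return False
    return evidence.get("tee") in {"SEV-SNP", "TDX"}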

  1. Layer 1

    Hardware TEE — AMD SEV-SNP · Intel TDX

    Confidential VMs anchor trust in the CPU silicon: encrypted guest memory, remote attestation quotes, and reduced exposure to the virtualization stack in the vendor-defined threat model.

  2. Layer 2

    UEFI & measured boot

    Firmware and early boot are part of the measurement chain so the machine attests a known anchor before your policy even reaches userland.

  3. Layer 3

    Hypervisor, host OS & image integrity

    Hypervisor and host images ship with locked-down roots of trust—think dm-verity and related read-only, hash-chained roots plus signed update channels—so binaries and critical config cannot quietly diverge from what you approved.

  4. Layer 4

    Guest OS verification

    Kernel and userspace baselines are pinned; unexpected modules, init changes, or compromised drivers fail verification and trigger a controlled reprovision instead of silent service.

  5. Layer 5

    Kata Containers & workload boundary

    Inference runtimes sit behind a Kata-style lightweight-VM boundary—stronger isolation than namespaces alone—so each customer slice keeps a hardware-backed fence around model weights and KV state.

  6. Layer 6

    Transparency log — Sigstore Rekor

    Attestation artifacts and release events are designed to land in an append-only, publicly verifiable transparency log—Sigstore Rekor—so your security and finance stakeholders can trace what was proven, when, and that the log was not rewritten after the fact (see the lookup sketch after this list).

  7. Layer 7

    GPU attestation

    NVIDIA confidential-compute GPUs expose device quotes that pair with the CPU TEE evidence; NVML / attestation SDK flows validate the accelerator’s configuration before vLLM-class workers accept traffic.
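
For Layer 6, Rekor’s public REST API makes log entries fetchable by index. A minimal lookup sketch (the log_index would come from the attestation envelope; full verification also checks the inclusion proof and signed entry timestamp):

import requests

REKOR = "https://rekor.sigstore.dev"

def fetch_rekor_entry(log_index: int) -> dict:
    # GET /api/v1/log/entries?logIndex=N returns a map of
    # entry UUID -> {body, integratedTime, logIndex, ...}.
    resp = requests.get(
        f"{REKOR}/api/v1/log/entries",
        params={"logIndex": log_index},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

for uuid, record in fetch_rekor_entry(123456).items():
    print(uuid, record["integratedTime"])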

Design partners & regulated teams

Request early access to the confidential attestation path. Production model hosting without this stack is available separately—use Request API access in the main CTAs. We prioritize design partners based on waitlist notes for security review and joint architecture work.

ATTESTED

Integrity you can query

Evidence flows from silicon to container: hardware TEE quotes, firmware measurements, dm-verity-backed host roots, guest policy, and accelerator attestation converge into signed reports you can poll over HTTPS.

FAST

Sub-second orchestration paths

Optimized scheduler + hardware-aware kernels keep prefill bounded and decode streaming smooth even when aggregate load spikes.

SIMPLE

Familiar HTTPS surface area

One auth scheme, idiomatic headers, deterministic error surfaces, and OpenAI-shaped payloads developers already know.

RELIABLE

Production incident muscle

Runbooks exercised weekly, granular health checks, graceful degradation tiers, and clear status semantics for routers.

LOW-COST

Token economics you can spreadsheet

List prices are enumerable from the catalog with no surprises—ideal when your finance stack reconciles usage against published SKUs.

Model library

Hosted SKUs spanning chat, embeddings, rerankers, and voice

Showcase SKUs preview the economics you can expose publicly. Availability follows your allowlist—we keep reserved pools for latency-sensitive fleets and carve noisy research traffic into separate concurrency lanes.

Open-weight · verified checkpoint

meta-llama/llama-3.3-70b-instruct

General chat & agents

Instruction-tuned Llama family

128k context-class

Input $0.10 · Output $0.32 per 1M tokens

Open-weight · verified checkpoint

qwen/qwen3.5-122b-a10b

Code & long context

Qwen3.5 MoE flagship

Limits in live catalog

Input $0.26 · Output $2.08 per 1M tokens

Open-weight · verified checkpoint

mistralai/mistral-large

High reasoning budgets

Mistral frontier instruct

Limits in live catalog

Input $2.00 · Output $6.00 per 1M tokens (cache read $0.20)

Open-weight · verified checkpoint

deepseek/deepseek-v3.2

MoE · tool-friendly

DeepSeek V3.2

Limits in live catalog

Input $0.252 · Output $0.378 per 1M tokens (cache read $0.0252)

Open-weight · verified checkpoint

google/gemma-3-27b-it

Vision + agentic workloads

Gemma 3 multimodal instruct

Limits in live catalog

Input $0.08 · Output $0.16 per 1M tokens

Open-weight · verified checkpoint

Custom fine-tune

Private adapters & SLAs

Your weights, our stack

Provisioned GPUs

Private price list

Card prices are illustrative list rates (per-token USD × 10⁶) for each model id. Your live endpoint should still publish the same fields via GET /v1/models so automation never drifts from marketing copy.
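
A sketch of that reconciliation from the client side. The exact field names ("pricing" and its keys) are assumptions; the copy only promises string USD-per-token values via GET /v1/models:

import os
import requests

resp = requests.get(
    "https://inf.vocifer.com/v1/models",
    headers={"Authorization": f"Bearer {os.environ['VOCIFER_API_KEY']}"},
    timeout=10,
)
resp.raise_for_status()

# OpenAI-shaped list response: {"object": "list", "data": [...]}.
for model in resp.json()["data"]:
    # Convert per-token USD strings into the per-1M-token card rates.
    for kind, per_token in model.get("pricing", {}).items():
        print(model["id"], kind, float(per_token) * 1_000_000)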

Platform

Inference tailored to routers, SaaS dashboards, and internal agents alike

Borrowing the playbook from specialist clouds like DeepInfra and Inceptron, we unify hardware procurement, KV-cache-aware placement, telemetry, and customer-isolated metering so your product teams iterate on prompts—not rack logistics.

Continuous batching & autoscaling

Scheduler coalesces compatible requests, scales GPU worker pools on SLI metrics, and pins hot models into flash-friendly memory tiers.

Usage graph & export

Per-organizationId usage streams into your finance stack (Snowflake, BigQuery, Metronome) with line-item attribution for chargeback.

Latency SLO monitors

Track TTFT, inter-token gaps, and saturation per fleet. Alert on tail shifts before customer-facing SLAs are breached.

Security hardening

TLS everywhere, optional mTLS, tenant-scoped API keys—and on the confidential track, TEE-backed nodes with attested boot and GPU evidence before traffic lands.

How you call Vocifer

Same paths you already automate: models catalog + chat completions

Production inference is served from a dedicated host such as inf.vocifer.com under the /v1 prefix. List SKUs, then POST chat completions with your API key—optionally including X-Vocifer-Organization-Id so usage stays tied to organizationId.

  • `GET https://inf.vocifer.com/v1/models` — catalog ids, context bounds, string pricing fields per modality.
  • `POST https://inf.vocifer.com/v1/chat/completions` — OpenAI-shaped messages, streaming SSE optional.
  • Bearer auth on every request; regional hostnames may differ per contract.
  • Drop-in with OpenAI SDKs by setting `base_url` to `https://inf.vocifer.com/v1`.
Documentation (curl, Python, Node, Go)

List models — curl

curl -sS "https://inf.vocifer.com/v1/models" \
  -H "Authorization: Bearer $VOCIFER_API_KEY"

Chat completion — curl

curl -sS "https://inf.vocifer.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $VOCIFER_API_KEY" \
  -d '{
    "model": "meta-llama/llama-3.3-70b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "Summarize this incident timeline for execs."
      }
    ],
    "stream": true
  }'

Swap the hostname for the one issued to your workspace. Streaming follows the same SSE framing OpenAI-compatible clients expect (Vercel AI SDK, LangChain, LiteLLM, etc.).
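
The same call through the OpenAI Python SDK; only base_url changes. A sketch using the standard openai client (v1+):

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://inf.vocifer.com/v1",
    api_key=os.environ["VOCIFER_API_KEY"],
)

stream = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Summarize this incident timeline for execs."}],
    stream=True,
)

# Streamed chunks carry content deltas, same framing as OpenAI.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)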

Pricing

Transparent token pricing per 1M tokens

We publish list-style token rates: input, output, and cache read (when listed) per 1M tokens, expressed as per-token USD × 10⁶ for easy spreadsheet math. Figures below are representative SKUs; your contract and live GET /v1/models response are authoritative.

MiniMax M2.7

minimax/minimax-m2.7

Input $0.299 · Output $1.20 per 1M tokens

  • Catalog id: minimax/minimax-m2.7 (no input_cache_read in list)
  • High-capability general reasoning
  • OpenAI-compatible chat API

MiniMax M2.5

minimax/minimax-m2.5

Input $0.15 · Output $1.15 per 1M tokens

  • Cache read: $0.03 per 1M tokens
  • Balanced quality/latency profile
  • Usage export by organizationId

DeepSeek V3.2

deepseek/deepseek-v3.2

Input $0.252 · Output $0.378 per 1M tokens

  • Cache read: $0.0252 per 1M tokens
  • Catalog id: deepseek/deepseek-v3.2
  • Stable throughput under batch load

Qwen 3.5 122B A10B

qwen/qwen3.5-122b-a10b

Input $0.26 · Output $2.08 per 1M tokens

  • Catalog id: qwen/qwen3.5-122b-a10b (no cache-read price in list)
  • Strong multilingual + tool usage
  • Good fit for agent pipelines

Canonical pricing is always served via GET /v1/models with USD string values per token unit so your routers and finance exports stay aligned. Refresh the catalog as SKUs and list prices evolve.
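
The “× 10⁶” convention keeps the math linear per token class. A worked example using the DeepSeek V3.2 rates above:

# USD per 1M tokens, from the DeepSeek V3.2 card above.
INPUT_RATE, OUTPUT_RATE, CACHE_READ_RATE = 0.252, 0.378, 0.0252

input_tokens = 3_000_000     # fresh prompt tokens
cached_tokens = 1_000_000    # prompt tokens served from cache
output_tokens = 500_000

cost = (
    input_tokens / 1e6 * INPUT_RATE
    + cached_tokens / 1e6 * CACHE_READ_RATE
    + output_tokens / 1e6 * OUTPUT_RATE
)
print(f"${cost:.4f}")  # $0.9702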

Engineering FAQ

Answers for platform teams routing production traffic and wiring OpenAI-compatible clients to Vocifer-hosted inference.

Confidential inference, optional standard tier

Metered APIs on top, hardware attestation underneath—apply for the confidential preview or ship on the same OpenAI-compatible surface today.

Talk to solutions