I needed a fast OpenAI client for a realtime voice agent project. The official Python SDK is great, but I needed Rust — for WebSocket audio streaming, edge deployment to Cloudflare Workers, and sub-second latency in agentic loops with dozens of tool calls.
So I ported it. 259 commits, 5 days, 100+ API methods. The first day — 120 commits — was mostly Claude Code translating types from Python to Rust while I set up pre-commit hooks, WASM checks, and benchmarks. The rest was architecture decisions, performance tuning, and Node/Python bindings.
The result: openai-oxide — a Rust client that matches the official Python SDK's API surface while being faster and deployable to WASM.
Why Not Just Use What Exists?
The official Python and Node SDKs are solid — they reuse HTTP/2 connections, have WebSocket support for the Realtime API, and cover all endpoints. But they don't compile to WASM, and their WebSocket mode is only for the Realtime API (audio/multimodal), not for regular text-based Responses API calls.
In the Rust ecosystem, you pick async-openai for types or genai for multi-provider support — but no single crate gives you persistent WebSocket sessions for the Responses API, structured outputs with auto-generated schemas, stream helpers, and WASM deployment in one package.
For an agentic loop where the model calls read_file, search_code, edit_file, and run_tests in sequence, you want all of this in one place. That's what we built.
Persistent WebSockets
The biggest win: keep one wss:// connection open for the entire agent cycle.
```rust
let mut session = client.ws_session().await?;

// 50 tool calls — zero TLS overhead after the first
for _ in 0..50 {
    let response = session.send(request).await?;
    // execute tool, feed result back
}

session.close().await?;
```
Benchmark: 10 sequential tool calls complete 40% faster than HTTP REST on the same model.
Structured Outputs Without Boilerplate
Every Rust OpenAI client supports response_format: json_schema. But you have to build the schema by hand:
```rust
// Other clients: manual schema construction
let schema = json!({
    "type": "object",
    "properties": {
        "answer": { "type": "string" },
        "confidence": { "type": "number" }
    },
    "required": ["answer", "confidence"],
    "additionalProperties": false
});
```
With openai-oxide, derive the schema from your types:
```rust
#[derive(Deserialize, JsonSchema)]
struct Answer {
    answer: String,
    confidence: f64,
}

let result = client.chat().completions()
    .parse::<Answer>(request).await?;

println!("{}", result.parsed.unwrap().answer);
```
One derive, both directions — the same #[derive(JsonSchema)] generates response schemas and tool parameter definitions. No manual JSON, no drift between types and schemas.
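To make the tool side concrete, here is a minimal sketch assuming the JsonSchema derive is schemars-compatible. The SearchCodeArgs type, the search_code_tool helper, and the JSON wrapper around the schema are illustrative, not the crate's documented tool API:

```rust
use schemars::{schema_for, JsonSchema};
use serde::Deserialize;
use serde_json::json;

// Parameters the model must supply when it calls the tool.
#[derive(Deserialize, JsonSchema)]
struct SearchCodeArgs {
    query: String,
    max_results: u32,
}

// Build a function-tool definition from the type instead of hand-written JSON.
fn search_code_tool() -> serde_json::Value {
    json!({
        "type": "function",
        "name": "search_code",
        "parameters": schema_for!(SearchCodeArgs),
    })
}
```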
Zero-Copy SSE Streaming
Time-to-first-token matters for UX. Our SSE parser avoids intermediate allocations and sets anti-buffering headers that prevent reverse proxies from holding back chunks:
```http
Accept: text/event-stream
Cache-Control: no-cache
```
Without these, Cloudflare and nginx buffer streaming responses, adding 50-200ms to TTFT. With them: 530ms TTFT on gpt-5.4.
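To see what that request looks like at the HTTP level, here is a rough sketch using reqwest (with the json and stream features) and futures-util. The endpoint, body, and line-splitting are simplified for illustration; a real parser also has to handle events that span chunk boundaries, and this is not the crate's internal implementation:

```rust
use futures_util::StreamExt;
use reqwest::Client;
use serde_json::json;

// Sketch: issue a streaming Responses API call with the anti-buffering
// headers set, then consume the SSE body chunk by chunk as it arrives.
async fn stream_sse(client: &Client, api_key: &str) -> Result<(), reqwest::Error> {
    let resp = client
        .post("https://api.openai.com/v1/responses")
        .bearer_auth(api_key)
        .header("Accept", "text/event-stream")
        .header("Cache-Control", "no-cache")
        .json(&json!({ "model": "gpt-5.4", "input": "hi", "stream": true }))
        .send()
        .await?;

    let mut body = resp.bytes_stream();
    while let Some(chunk) = body.next().await {
        let chunk = chunk?;
        // Parse `data:` lines directly from the received buffer rather than
        // accumulating the whole response into an intermediate String.
        for line in chunk.split(|&b| b == b'\n') {
            if let Some(payload) = line.strip_prefix(b"data: ") {
                print!("{}", String::from_utf8_lossy(payload));
            }
        }
    }
    Ok(())
}
```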
Stream Helpers
Raw SSE chunks require manual stitching — tracking content deltas, assembling tool call arguments by index, detecting completion. We provide typed events:
```rust
let mut stream = client.chat().completions()
    .create_stream_helper(request).await?;

while let Some(event) = stream.next().await {
    match event? {
        ChatStreamEvent::ContentDelta { delta, snapshot } => {
            print!("{delta}"); // snapshot has full text so far
        }
        ChatStreamEvent::ToolCallDone { name, arguments, .. } => {
            execute_tool(&name, &arguments).await;
        }
        _ => {}
    }
}
```
Or just get the final result: stream.get_final_completion().await?
WASM Support
The entire client compiles to wasm32-unknown-unknown and runs in Cloudflare Workers:
```toml
[dependencies]
openai-oxide = { version = "0.9", default-features = false, features = ["chat", "responses"] }
worker = "0.7"
```
Streaming, structured outputs, retry logic — all work in WASM. Live demo.
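As an illustration, a hypothetical Worker handler might look like this. The #[event(fetch)] entry point comes from the worker crate; the openai_oxide constructor, request type, and response fields are assumptions for the sketch, so check the crate docs for the real names:

```rust
use worker::*;

#[event(fetch)]
pub async fn main(_req: Request, env: Env, _ctx: Context) -> Result<Response> {
    // Assumed constructor; the real builder may differ.
    let client = openai_oxide::Client::new(env.secret("OPENAI_API_KEY")?.to_string());

    // Same chat surface as on native targets, compiled to wasm32.
    let request = openai_oxide::chat::Request::new("gpt-5.4", "Say hello from the edge");
    let completion = client
        .chat()
        .completions()
        .create(request)
        .await
        .map_err(|e| Error::RustError(e.to_string()))?;

    Response::ok(completion.choices[0].message.content.clone().unwrap_or_default())
}
```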
HTTP Optimizations That Nobody Else Does
We checked — neither async-openai nor genai enables these by default:
| Optimization | Impact |
| --- | --- |
| gzip compression | ~30% smaller responses |
| TCP_NODELAY | Lower latency (disables Nagle) |
| HTTP/2 keep-alive (20s ping) | Prevents idle connection drops |
| HTTP/2 adaptive window | Auto-tunes flow control |
| Connection pool (4/host) | Better parallel performance |
These are all standard reqwest builder options. Source.
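For reference, a client with those options enabled can be built with stock reqwest along these lines (the 20-second ping and pool size mirror the table above; pool_max_idle_per_host is my reading of the pool setting, and the gzip feature must be enabled on reqwest):

```rust
use std::time::Duration;

// Sketch: enable the optimizations from the table on a reqwest client.
fn build_http_client() -> reqwest::Result<reqwest::Client> {
    reqwest::Client::builder()
        .gzip(true)                                         // ~30% smaller responses
        .tcp_nodelay(true)                                  // disable Nagle's algorithm
        .http2_keep_alive_interval(Duration::from_secs(20)) // ping idle HTTP/2 connections
        .http2_adaptive_window(true)                        // auto-tune flow control
        .pool_max_idle_per_host(4)                          // warm connections kept per host
        .build()
}
```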
Benchmarks
Median of 3 runs, 5 iterations each, gpt-5.4:
Rust (Responses API)

| Test | openai-oxide | async-openai | genai |
| --- | --- | --- | --- |
| Streaming TTFT | 645ms | 685ms | 670ms |
| Function calling | 1192ms | 1748ms | 1030ms |
| WebSocket plain text | 710ms | N/A | N/A |
Node.js — oxide wins 8/8