Output Pacing
Let your model track and budget its own output time against an authoritative clock. You stream tokens; temporalBLOCK supplies the when — Spiral-anchored elapsed time, a smoothed tokens-per-second rate, and a projected finish — so a long generation can pace itself, warn before it overruns a deadline, or trim its remaining work to fit a time budget.
Your runtime already knows how many tokens it has emitted — so you report them. What temporalBLOCK supplies is the one thing your host can't measure honestly: authoritative elapsed time, anchored to the Spiral clock (NIST / NTP) rather than a drifting local wall clock. From your count and our clock it derives the rate, the projection, and the budget. The math is shared and auditable; the token total stays in your stack, not ours.
Live sessions — POST /v1/pacing/sessions, .../ticks, .../close — are a Standard+ feature. Lite keys get HTTP 402 PACING_SESSIONS_REQUIRE_PAID_TIER. The stateless one-shot POST /v1/pacing/estimate is available on every tier, like calibration. Pacing is rate-limited, not metered: it never counts against your snippet / briefing / deep quota and never appears on a PAYG invoice.
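One way to handle the tier split is to try a live session first and fall back to the stateless estimate when the server answers 402. The sketch below is a hedged illustration, not SDK code: the `PacingClient` interface, the `PacingPlan` type, and the assumption that a rejected call surfaces an object with `status` and `code` fields are all hypothetical. Check how the real SDK reports HTTP errors before relying on this shape.

```typescript
// Hypothetical client shape for illustration; the real SDK's method
// signatures and error type may differ.
interface PacingClient {
  open(opts: { model: string }): Promise<{ sessionId: string }>;
  estimate(opts: { targetTokens: number; model: string }): Promise<{
    additionalSeconds: number | null;
  }>;
}

type PacingPlan =
  | { kind: "session"; sessionId: string }
  | { kind: "estimate"; additionalSeconds: number | null };

// Assumption: a rejected call carries the HTTP status and the error code.
function isPaidTierError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { status?: unknown; code?: unknown };
  return e.status === 402 && e.code === "PACING_SESSIONS_REQUIRE_PAID_TIER";
}

// Try a live session; on the Lite-tier 402, fall back to the one-shot
// estimate, which the text describes as available on every tier.
async function planPacing(
  client: PacingClient,
  model: string,
  targetTokens: number,
): Promise<PacingPlan> {
  try {
    const { sessionId } = await client.open({ model });
    return { kind: "session", sessionId };
  } catch (err) {
    if (!isPaidTierError(err)) throw err; // unrelated failures still surface
    const eta = await client.estimate({ targetTokens, model });
    return { kind: "estimate", additionalSeconds: eta.additionalSeconds };
  }
}
```

Injecting the client as an interface keeps the fallback logic testable without a network call or a paid key.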
How it works
There are two flows. A live session is the full loop: open() a session, call tick() with your running token total as the generation streams, then close() it. Each tick returns the authoritative elapsed time, a smoothed rate with a confidence band, and — if you pass a targetTokens or a budgetSeconds — a projected finish or a max-tokens budget. A stateless estimate skips the session entirely: hand estimate() a target (or a time budget) plus a known or recently-observed rate and it answers in one call.
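The arithmetic behind a tick can be sketched locally. This is not the server's implementation: the exponential moving average, the `alpha` weight, the `Tick` shape, and every function name below are assumptions chosen for illustration, and the confidence band is left out. It only shows how a cumulative token count plus an elapsed-time reading can yield a rate, a projection, and a budget.

```typescript
// Illustrative only: the smoothing method and all names are assumptions,
// not temporalBLOCK's documented algorithm.
interface Tick {
  cumulativeTokens: number; // running total reported by the caller
  elapsedSeconds: number; // elapsed time since the session opened
}

// Smooth the instantaneous rate between consecutive ticks with an
// exponential moving average; alpha weights the newest interval.
function smoothedRate(ticks: Tick[], alpha = 0.3): number | null {
  let rate: number | null = null;
  for (let i = 1; i < ticks.length; i++) {
    const dt = ticks[i].elapsedSeconds - ticks[i - 1].elapsedSeconds;
    if (dt <= 0) continue; // skip duplicate or out-of-order ticks
    const dTokens = ticks[i].cumulativeTokens - ticks[i - 1].cumulativeTokens;
    const instant = dTokens / dt;
    rate = rate === null ? instant : alpha * instant + (1 - alpha) * rate;
  }
  return rate; // null until at least two usable ticks exist
}

// Tokens to time: seconds still needed to reach targetTokens.
function additionalSeconds(
  cumulativeTokens: number,
  targetTokens: number,
  rate: number,
): number | null {
  if (rate <= 0) return null; // no meaningful projection without a rate
  return Math.max(0, targetTokens - cumulativeTokens) / rate;
}

// Time to tokens: the total token ceiling that fits in budgetSeconds.
function maxTokensForBudget(
  cumulativeTokens: number,
  elapsedSeconds: number,
  budgetSeconds: number,
  rate: number,
): number {
  const remaining = Math.max(0, budgetSeconds - elapsedSeconds);
  return cumulativeTokens + Math.floor(rate * remaining);
}
```

For example, at a steady 40 tokens per second, a session that has emitted 400 tokens after 10 seconds needs 20 more seconds to reach a 1,200-token target, and a 20-second budget caps the total at 800 tokens.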
Pace a streaming generation
Open once, tick as tokens flow, close at the end. You supply the cumulative token count; the server supplies the clock.
import { TemporalBlock } from "temporalblock";

const tb = new TemporalBlock({ apiKey: process.env.TBLK_KEY! });

// 1. Open a session (Standard+). Scope it to a model so the
//    recent-rate store and calibration fold stay per-model.
const session = await tb.pacing.open({ model: "gpt-4o" });

let cumulativeTokens = 0;
for await (const chunk of stream) {
  cumulativeTokens += chunk.tokenCount; // YOUR count, not ours.

  // 2. Tick with the running total + an optional target. The
  //    response carries authoritative elapsed time, a smoothed
  //    rate, and a projected finish.
  const snap = await tb.pacing.tick(session.sessionId, {
    cumulativeTokens,
    targetTokens: 1200,
  });
  if (snap.projection && snap.projection.additionalSeconds !== null) {
    // e.g. trim remaining work if the finish would overrun a deadline.
  }
}

// 3. Close to record the final measured rate for the session.
const summary = await tb.pacing.close(session.sessionId, {
  cumulativeTokens,
});
// summary.durationMs, summary.measuredRate, summary.anchorSource

Every tick reports its anchor.source: "spiral" when the authoritative anchor is live, "wall" when it has fallen back to the host clock. A degraded anchor is reported honestly, never hidden.
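The empty `if` branch in the loop above is where a deadline policy would go. The sketch below shows one possible policy, under stated assumptions: only `projection.additionalSeconds` and `anchor.source` appear in the text, so the `elapsedSeconds` field, the `PacingSnapshot` interface, the `decide` function, and the choice to double the safety margin on a "wall" anchor are all hypothetical.

```typescript
// Assumed snapshot shape: only projection.additionalSeconds and
// anchor.source come from the text; elapsedSeconds is a hypothetical field.
interface PacingSnapshot {
  elapsedSeconds: number;
  projection: { additionalSeconds: number | null } | null;
  anchor: { source: "spiral" | "wall" };
}

type PacingAction = "continue" | "trim";

// Decide whether to keep generating or trim the remaining work.
// When the anchor has fallen back to the host clock, the margin is
// doubled because elapsed time is less trustworthy (illustrative policy).
function decide(
  snap: PacingSnapshot,
  deadlineSeconds: number,
  marginSeconds = 1,
): PacingAction {
  const remaining = snap.projection?.additionalSeconds;
  if (remaining === null || remaining === undefined) {
    return "continue"; // no projection yet, nothing to act on
  }
  const margin =
    snap.anchor.source === "wall" ? marginSeconds * 2 : marginSeconds;
  const projectedFinish = snap.elapsedSeconds + remaining;
  return projectedFinish + margin > deadlineSeconds ? "trim" : "continue";
}
```

Inside the streaming loop, a call such as `decide(snap, 30)` returning `"trim"` could trigger a shorter target or an early stop; what trimming means is left to the host application.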
One-shot, every tier
No session, no state, available on Lite. Ask either direction — tokens to time, or a time budget to a token ceiling. Omit rateTokensPerSec and the server consults the recent-rate store for that model.
// Tokens → time: "how long will 1,200 tokens take at ~40 tok/s?"
const eta = await tb.pacing.estimate({
  targetTokens: 1200,
  rateTokensPerSec: 40,
});
// eta.mode === "tokensToTime"; eta.additionalSeconds

// Time → tokens: "how many tokens fit in a 20s budget?"
const fit = await tb.pacing.estimate({
  budgetSeconds: 20,
  model: "gpt-4o", // rate pulled from the recent-rate store
});
// fit.mode === "secondsToTokens"; fit.totalTokens

Temporal Spiral is patent pending: U.S. Provisional Application No. 64/065,213 (filed 2026-05-14).