Reliability: retries, timeouts, and cancellation
The SDK is built to survive transient failures — dropped connections, rate limits, momentary server hiccups — without you writing retry loops. At the same time it never silently re-sends a request the server may have already processed, because doing so could double-create a sandbox or double-charge bandwidth.
This document explains how the transport makes those guarantees, and how you tune or override them.
At a glance
- Package: @nodeops-createos/sandbox (npm)
- Import: import { createClient } from "@nodeops-createos/sandbox"
- Base URL: https://api.sb.createos.sh (override with CREATEOS_SANDBOX_BASE_URL)
- Auth: API key via the apiKey option or CREATEOS_SANDBOX_API_KEY
Idempotent vs. non-idempotent requests
The key question when deciding whether to retry is: if the server already processed this request, will retrying cause harm?
Idempotent methods — GET, HEAD, PUT, DELETE — answer no. A
repeated GET returns the same data; a repeated DELETE on a resource already
deleted is still a no-op (or 404, which is handled). The SDK therefore retries
these methods freely on both network failures and the set of server statuses
that unambiguously signal transient trouble:
| Trigger | Idempotent (GET HEAD PUT DELETE) | Non-idempotent (POST PATCH) |
|---|---|---|
| Network error (DNS, connection refused, socket reset) | Retried | Not retried |
| 408 Request Timeout | Retried | Not retried |
| 500 Internal Server Error | Retried | Not retried |
| 502 Bad Gateway | Retried | Not retried |
| 503 Service Unavailable | Retried | Retried |
| 429 Too Many Requests | Retried | Retried |
| 504 Gateway Timeout | Retried | Not retried |
Non-idempotent methods (POST, PATCH) are dangerous to retry unless
the server provably did not act on the request. Two statuses carry that
guarantee: 429 (rate limit; the server rejected the request before processing it)
and 503 (service unavailable; the request never reached a handler). For every
other failure on a POST or PATCH, the SDK surfaces the error and leaves the
decision to retry with you.
Requests with a streaming (ReadableStream) body are also never retried,
because the stream is consumed on the first attempt and cannot be replayed.
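The retry rules above can be condensed into a single predicate. The following is a minimal sketch, not the SDK's actual code: the name shouldRetry, the "network" sentinel, and the hasStreamBody flag are all hypothetical and exist only to illustrate the table.

```typescript
// Hypothetical helper mirroring the retry table; not taken from the SDK source.
type RetryTrigger = "network" | number; // "network" = no HTTP response was received

const IDEMPOTENT_METHODS = new Set(["GET", "HEAD", "PUT", "DELETE"]);
// Statuses retried for idempotent methods.
const TRANSIENT_STATUSES = new Set([408, 429, 500, 502, 503, 504]);
// Statuses where the server is assumed not to have processed the request.
const SAFE_FOR_ANY_METHOD = new Set([429, 503]);

function shouldRetry(
  method: string,
  trigger: RetryTrigger,
  hasStreamBody = false,
): boolean {
  // A consumed stream body cannot be replayed, so it is never retried.
  if (hasStreamBody) return false;
  const idempotent = IDEMPOTENT_METHODS.has(method.toUpperCase());
  // Network errors are ambiguous: only idempotent methods are safe to repeat.
  if (trigger === "network") return idempotent;
  // 429 and 503 are retried regardless of method.
  if (SAFE_FOR_ANY_METHOD.has(trigger)) return true;
  return idempotent && TRANSIENT_STATUSES.has(trigger);
}
```

For instance, shouldRetry("POST", 500) is false while shouldRetry("POST", 429) is true, matching the right-hand column of the table.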
Backoff and jitter
Between retry attempts the SDK sleeps a computed delay. The formula
(from src/http.ts, backoffDelay):
delay = min(
baseDelayMs × 2^attempt + Math.random() × baseDelayMs,
maxDelayMs
)
With defaults (baseDelayMs = 500 ms, maxDelayMs = 30 000 ms):
| Attempt | Deterministic term | Jitter range | Approximate ceiling |
|---|---|---|---|
| 0 (first retry) | 500 ms | 0–500 ms | 1 000 ms |
| 1 (second retry) | 1 000 ms | 0–500 ms | 1 500 ms |
The cap (maxDelayMs = 30 s) applies to the sum of the deterministic term
and the random jitter, so the result never exceeds 30 s regardless of how many
attempts are made.
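To make the formula concrete, here is a small sketch of it. The random parameter is an assumption added so the jitter can be pinned to its extremes; the real backoffDelay in src/http.ts may have a different signature.

```typescript
// Sketch of the backoff formula. `random` is injectable only for illustration.
function backoffDelay(
  attempt: number,
  baseDelayMs = 500,
  maxDelayMs = 30_000,
  random: () => number = Math.random,
): number {
  const deterministic = baseDelayMs * 2 ** attempt; // exponential term
  const jitter = random() * baseDelayMs; // uniform in [0, baseDelayMs)
  // The cap applies to the sum, not to the deterministic term alone.
  return Math.min(deterministic + jitter, maxDelayMs);
}

// Print the lower and upper bound of the delay for the first few attempts.
for (let attempt = 0; attempt < 8; attempt++) {
  const lo = backoffDelay(attempt, 500, 30_000, () => 0);
  const hi = backoffDelay(attempt, 500, 30_000, () => 1);
  console.log(`attempt ${attempt}: ${lo}-${hi} ms`);
}
```

With the defaults, the cap is first reached at attempt 6, where 500 × 2^6 = 32 000 ms exceeds maxDelayMs. That only matters if maxRetries is raised well above its default of 2.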
Why jitter? When many clients hit the same transient error at the same time (a server restart, a brief overload), pure exponential backoff would cause them all to retry in synchronized waves — the thundering-herd problem. Adding a per-attempt random offset spreads retries across a window, smoothing the load spike.
Defaults and overrides
| Parameter | Default | Override scope |
|---|---|---|
| maxRetries | 2 (3 total attempts) | Client or per-request |
| baseDelayMs | 500 ms | Client or per-request |
| maxDelayMs | 30 000 ms | Client or per-request |
Set a client-wide policy in the constructor:
```typescript
const client = new CreateosSandboxClient({
  retry: { maxRetries: 4, baseDelayMs: 250 },
});
```
Override for a single call, or disable retries entirely for one call:
```typescript
// Disable retries for one call (fast feedback on failures)
await client.whoami({ retry: false });

// Override max retries for one call
await client.listSandboxes({ retry: { maxRetries: 1 } });
```
Per-request options are merged over the client default, not replaced — a
per-request { maxRetries: 1 } still inherits baseDelayMs and maxDelayMs
from the client policy.
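The merge semantics can be pictured as an object spread. This is a sketch under assumptions: RetryPolicy, RetryOverride, and resolveRetry are hypothetical names, and the SDK's internal merge logic is not shown in this document.

```typescript
// Hypothetical types and helper illustrating "merged over, not replaced".
interface RetryPolicy {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

type RetryOverride = Partial<RetryPolicy> | false | undefined;

const DEFAULT_RETRY: RetryPolicy = {
  maxRetries: 2,
  baseDelayMs: 500,
  maxDelayMs: 30_000,
};

function resolveRetry(
  clientPolicy: RetryPolicy,
  perRequest: RetryOverride,
): RetryPolicy {
  // `retry: false` is modeled here as zero retries with the delays left intact.
  if (perRequest === false) return { ...clientPolicy, maxRetries: 0 };
  // Per-request fields win; anything omitted falls back to the client policy.
  return { ...clientPolicy, ...perRequest };
}

// A client configured with { maxRetries: 4, baseDelayMs: 250 } ...
const clientPolicy = resolveRetry(DEFAULT_RETRY, { maxRetries: 4, baseDelayMs: 250 });
// ... and a call that overrides only maxRetries.
const effective = resolveRetry(clientPolicy, { maxRetries: 1 });
console.log(effective);
```

Here effective keeps baseDelayMs: 250 from the client and maxDelayMs: 30 000 from the defaults, while maxRetries drops to 1 for that call only.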
Retry-After header
When the server returns 429 or 503 with a Retry-After header, the SDK
uses that value instead of its own computed backoff:
delay = Retry-After value in seconds × 1000 ms
Both delta-seconds (Retry-After: 5) and HTTP-date formats are parsed. The
server is telling you exactly when it will accept the next request; overriding
with a shorter client-side backoff would just produce another 429.
After the Retry-After delay has elapsed, the SDK retries the request
normally. If the server repeats the 429, the Retry-After delay is honored
again, until maxRetries is exhausted.
When the SDK surfaces a CreateosSandboxRateLimitError after retries are
exhausted, err.retryAfterSeconds carries the last parsed Retry-After
value so your code can make its own scheduling decision.
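If you handle err.retryAfterSeconds yourself, or read the header in your own HTTP code, both formats can be converted to a delay along these lines. This is a sketch, not the SDK's parser; parseRetryAfterMs and its nowMs parameter are hypothetical.

```typescript
// Sketch: convert a Retry-After header value into a delay in milliseconds.
// `nowMs` is injectable so the HTTP-date branch is deterministic in tests.
function parseRetryAfterMs(
  header: string | null,
  nowMs: number = Date.now(),
): number | undefined {
  if (header === null) return undefined;
  const value = header.trim();
  // delta-seconds form, e.g. "5"
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) return undefined;
  // A date in the past means "retry now", never a negative delay.
  return Math.max(0, dateMs - nowMs);
}
```

Returning undefined for a missing or unparseable header lets the caller fall back to the computed backoff instead of sleeping for a bogus duration.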
Streaming is never retried
The stream method (src/http.ts) issues a single dispatch and yields
frames from the response body. There is no retry loop around it.
The reason is fundamental: by the time a streaming error surfaces, the iterator may have already yielded dozens of frames. There is no safe replay position. Restarting from the beginning would duplicate output; seeking to an offset is not possible without server-side support, and the control plane does not provide it.
If the underlying connection breaks, the async iterator throws and the
for await loop unwinds. Wrap the loop in application-level logic if you need
restart behavior.
Timeouts
Per-request timeout
Every request carries a timeout. The default is 60 000 ms (60 s), set
as DEFAULT_TIMEOUT_MS in src/config.ts. Override it at the client level or
per call:
```typescript
// Client-wide: all requests time out after 10 s unless overridden
const client = new CreateosSandboxClient({ timeoutMs: 10_000 });

// Per-call: only this request gets 120 s
await client.createSandbox(req, { timeoutMs: 120_000 });

// Disable timeout for one call (use with care)
await client.someMethod({ timeoutMs: 0 });
```
Setting timeoutMs: 0 disables the per-request timeout entirely. The caller is
then responsible for bounding the request's duration (e.g. via an
AbortSignal).
The timeout is applied per dispatch attempt, not across the entire retry
sequence. Each attempt gets a fresh budget (60 s by default). For example, if
the server returns 503 on attempt 0 after 59 s and attempt 1 then times out
after the full 60 s, the call has consumed roughly 2 minutes in total, plus
the backoff sleep in between.
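Because the timeout restarts on every attempt, the worst-case wall-clock time of one call is noticeably larger than timeoutMs. The sketch below estimates that upper bound; worstCaseTotalMs is a hypothetical helper, and it assumes every attempt runs to its timeout, every sleep hits its jitter ceiling, and no Retry-After header lengthens a sleep.

```typescript
// Upper bound on the duration of one call under the assumptions stated above.
function worstCaseTotalMs(
  timeoutMs = 60_000,
  maxRetries = 2,
  baseDelayMs = 500,
  maxDelayMs = 30_000,
): number {
  let total = timeoutMs; // the initial attempt
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    // Ceiling of the backoff formula: deterministic term plus maximum jitter.
    const sleep = Math.min(baseDelayMs * 2 ** attempt + baseDelayMs, maxDelayMs);
    total += sleep + timeoutMs; // backoff, then another full attempt
  }
  return total;
}

console.log(worstCaseTotalMs()); // bound for the default settings
```

With the defaults this comes to 60 000 + (1 000 + 60 000) + (1 500 + 60 000) = 182 500 ms, so pass an AbortSignal if a call must finish within a hard deadline.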
When the timeout elapses, the SDK throws CreateosSandboxTimeoutError with a
message like Request timed out 60000ms: GET /v1/sandboxes.
Wait timeout
createSandbox and the waitUntil* lifecycle helpers (waitUntilRunning,
waitUntilStopped) run a poll loop on a separate budget — 120 000 ms
(120 s) by default (DEFAULT_WAIT_MS). Pass waitTimeoutMs to change it:
```typescript
const sandbox = await client.createSandbox(
  { shape: "s-4vcpu-4gb" },
  { waitTimeoutMs: 180_000 },
);
```
When the wait budget is exhausted, CreateosSandboxTimeoutError is thrown and
the sandbox (or template) may still be transitioning in the background — it is
not automatically destroyed. Call destroy() if you no longer need it:
```typescript
const sandbox = await client.createSandbox(req, { waitTimeoutMs: 30_000 }).catch(
  async (err) => {
    if (err instanceof CreateosSandboxTimeoutError) {
      // Best-effort cleanup: swallow cleanup failures so the original
      // timeout error is the one that gets rethrown.
      await client.getSandbox(err.sandboxId).then((s) => s.destroy()).catch(() => undefined);
    }
    throw err;
  },
);
```
Cancellation
Pass an AbortSignal to cancel a request (and any in-flight backoff sleep) at
any time:
```typescript
const controller = new AbortController();
setTimeout(() => controller.abort(), 5_000);

try {
  const sandbox = await client.createSandbox(
    { shape: "s-4vcpu-4gb" },
    { signal: controller.signal },
  );
  try {
    // ...
  } finally {
    await sandbox.destroy();
  }
} catch (err) {
  if (err instanceof CreateosSandboxError) {
    // Includes the case where the signal fired mid-request
  }
}
```
Internally the SDK composes your signal with its own per-request timeout
signal using AbortSignal.any([userSignal, timeoutSignal]). Whichever fires
first wins:
- If the user-supplied signal fires first, the underlying fetch rejects and the SDK re-throws the browser/runtime AbortError as-is (not wrapped) so your code can distinguish deliberate cancellation from other errors.
- If the timeout signal fires first (and the user signal has not fired), the SDK wraps the error as CreateosSandboxTimeoutError.
Signals also cancel sleep during retry attempts. If you abort during the backoff window, the sleep resolves immediately and the pending retry is abandoned.
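The precedence between the two signals can be sketched as a small classifier. This is an illustration only: abortCause and the AbortCause type are hypothetical, and the SDK's real error-mapping code is not shown in this document.

```typescript
// Hypothetical classifier: decides which of the two composed signals is
// responsible for an aborted request.
type AbortCause = "user" | "timeout" | "none";

function abortCause(
  userSignal: AbortSignal | undefined,
  timeoutSignal: AbortSignal,
): AbortCause {
  // The user signal is checked first, so a deliberate cancellation is never
  // misreported as a timeout even if both signals have fired.
  if (userSignal?.aborted) return "user";
  if (timeoutSignal.aborted) return "timeout";
  return "none";
}

// Example: the user aborts while the timeout is still pending.
const user = new AbortController();
const timeout = new AbortController();
user.abort();
console.log(abortCause(user.signal, timeout.signal));
```

In terms of the rules above, "user" corresponds to re-throwing the raw AbortError and "timeout" to wrapping it as CreateosSandboxTimeoutError.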
Polling backoff (pollUntil)
waitUntilRunning, waitUntilStopped, and the exported pollUntil helper
use a separate adaptive backoff tuned for lifecycle transitions, not for HTTP
retry:
- First 5 seconds of wall time: poll every 250 ms — fast sandbox startups that resolve in under a second should not be penalized by a long initial interval.
- After 5 seconds: the interval grows by ×1.25 per iteration, capped at 2 000 ms (2 s). A build that takes two minutes should not busyloop.
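The two-phase schedule above can be sketched as a pure function of elapsed time and the previous interval. The helper name nextPollInterval and the constants are hypothetical, and whether the growth phase starts at exactly 5 000 ms or just after it is an assumption.

```typescript
// Hypothetical helper reproducing the polling schedule described above.
const FAST_WINDOW_MS = 5_000; // fixed-interval phase
const FAST_INTERVAL_MS = 250;
const GROWTH_FACTOR = 1.25;
const MAX_INTERVAL_MS = 2_000;

function nextPollInterval(elapsedMs: number, previousMs: number): number {
  // Phase 1: poll quickly so fast startups are detected promptly.
  if (elapsedMs < FAST_WINDOW_MS) return FAST_INTERVAL_MS;
  // Phase 2: grow geometrically, but never beyond the cap.
  return Math.min(previousMs * GROWTH_FACTOR, MAX_INTERVAL_MS);
}

// Walk the schedule for the first 20 seconds of a slow transition.
let elapsed = 0;
let interval = FAST_INTERVAL_MS;
while (elapsed < 20_000) {
  interval = nextPollInterval(elapsed, interval);
  elapsed += interval;
}
console.log(`interval after 20 s: ${interval} ms`);
```

Growing by 25% per iteration reaches the 2 s cap within about ten polls after the fast window ends, so long waits settle into a steady, low request rate.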
pollUntil is exported for custom poll loops that need the same behavior:
```typescript
import { pollUntil } from "@nodeops-createos/sandbox";

const result = await pollUntil({
  poll: () => client.getSandbox(id),
  done: (v) => v.status === "running",
  failed: (v) =>
    v.status === "error" ? `Sandbox entered error state` : undefined,
  timeoutMs: 120_000,
  signal: controller.signal,
});
```
See the Helpers reference for the full
PollOptions interface.
How failures surface
After all retries are exhausted — or immediately for non-retryable conditions —
the SDK throws a typed error. Import and narrow with instanceof:
```typescript
import {
  CreateosSandboxError,
  CreateosSandboxConnectionError,
  CreateosSandboxTimeoutError,
  CreateosSandboxRateLimitError,
  CreateosSandboxServerError,
} from "@nodeops-createos/sandbox";

try {
  const sandbox = await client.createSandbox({ shape: "s-4vcpu-4gb" });
  try {
    // ...
  } finally {
    await sandbox.destroy();
  }
} catch (err) {
  if (err instanceof CreateosSandboxConnectionError) {
    // Never reached the server. All retries exhausted.
    // err.cause holds the original network error.
  } else if (err instanceof CreateosSandboxTimeoutError) {
    // Per-request timeout or waitUntil* budget exceeded.
  } else if (err instanceof CreateosSandboxRateLimitError) {
    // 429 after retries. err.retryAfterSeconds has the last Retry-After header.
  } else if (err instanceof CreateosSandboxServerError) {
    // 5xx after all retries.
  } else if (err instanceof CreateosSandboxError) {
    // Anything else the SDK threw.
  }
}
```
| Error class | When thrown | Retried before throw? |
|---|---|---|
| CreateosSandboxConnectionError | Network error (DNS, refused, reset) | Yes (idempotent only) |
| CreateosSandboxTimeoutError | Per-request or wait-loop timeout | Yes (idempotent only) |
| CreateosSandboxRateLimitError | 429 Too Many Requests | Yes |
| CreateosSandboxServerError | 5xx (varies by method, see table above) | Yes (varies) |
| CreateosSandboxAuthError | 401 Unauthorized | No |
| CreateosSandboxNotFoundError | 404 | No |
| CreateosSandboxValidationError | 400, 409, 422 | No |
| CreateosSandboxPaymentRequiredError | 402 | No |
Full per-class field reference: Errors reference.
For recipes — retry-after handling, fallback logic, logging retries via hooks — see How-to: Error Handling.
Summary
| Dimension | Default | Override |
|---|---|---|
| Max retries | 2 (3 attempts total) | retry.maxRetries on client or per-call |
| Base backoff delay | 500 ms | retry.baseDelayMs |
| Backoff ceiling | 30 000 ms | retry.maxDelayMs |
| Per-request timeout | 60 000 ms | timeoutMs on client or per-call |
| Wait-loop timeout | 120 000 ms | waitTimeoutMs on createSandbox / waitUntil* |
| Streaming | Never retried | — |
| Retry-After honored | Yes, overrides the backoff formula | — |
| Abort support | AbortSignal composed with timeout | signal per-call |