Reliability: retries, timeouts, and cancellation

The SDK is built to survive transient failures — dropped connections, rate limits, momentary server hiccups — without you writing retry loops. At the same time it never silently re-sends a request the server may have already processed, because doing so could double-create a sandbox or double-charge bandwidth.

This document explains how the transport makes those guarantees, and how you tune or override them.

At a glance

Package: @nodeops-createos/sandbox (npm)
Import: import { createClient } from "@nodeops-createos/sandbox"
Base URL: https://api.sb.createos.sh — override with CREATEOS_SANDBOX_BASE_URL
Auth: API key via the apiKey option or CREATEOS_SANDBOX_API_KEY

Idempotent vs. non-idempotent requests

The key question when deciding whether to retry is: if the server already processed this request, will retrying cause harm?

Idempotent methods (GET, HEAD, PUT, DELETE) answer no. A repeated GET returns the same data, and a repeated DELETE on an already-deleted resource is still a no-op or returns 404, which surfaces as CreateosSandboxNotFoundError and is not retried. The SDK therefore retries these methods freely on both network failures and the set of server statuses that unambiguously signal transient trouble:

| Trigger | Idempotent (`GET HEAD PUT DELETE`) | Non-idempotent (`POST PATCH`) |
|---|---|---|
| Network error (DNS, connection refused, socket reset) | Retried | Not retried |
| `408 Request Timeout` | Retried | Not retried |
| `429 Too Many Requests` | Retried | Retried |
| `500 Internal Server Error` | Retried | Not retried |
| `502 Bad Gateway` | Retried | Not retried |
| `503 Service Unavailable` | Retried | Retried |
| `504 Gateway Timeout` | Retried | Not retried |

Non-idempotent methods (POST, PATCH) are dangerous to retry unless the server provably did not act on the request. Two statuses carry that guarantee: 429 (rate limited; the server rejected the request before processing it) and 503 (service unavailable; the request never reached a handler). Every other failure on a POST or PATCH is thrown to your code immediately, and deciding whether a retry is safe is up to you.

Requests with a streaming (ReadableStream) body are also never retried, because the stream is consumed on the first attempt and cannot be replayed.
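The rules above fit in a few lines. The following is a minimal sketch of the decision table, not the SDK's source; the names `shouldRetry` and `Failure`, and the way a network error is represented, are illustrative assumptions.

```typescript
// Hypothetical restatement of the retry table; not the SDK's internal code.
type Method = "GET" | "HEAD" | "PUT" | "DELETE" | "POST" | "PATCH";

// A failed attempt either never produced a response or produced an HTTP status.
type Failure = { kind: "network" } | { kind: "status"; status: number };

const IDEMPOTENT: ReadonlySet<Method> = new Set<Method>(["GET", "HEAD", "PUT", "DELETE"]);

// Statuses that signal transient trouble for idempotent methods.
const RETRYABLE_IDEMPOTENT: ReadonlySet<number> = new Set([408, 429, 500, 502, 503, 504]);

// Statuses where the server provably did not act, so even POST/PATCH are safe.
const RETRYABLE_ANY: ReadonlySet<number> = new Set([429, 503]);

function shouldRetry(method: Method, failure: Failure, streamingBody = false): boolean {
  // A consumed ReadableStream body cannot be replayed, whatever the method.
  if (streamingBody) return false;
  const idempotent = IDEMPOTENT.has(method);
  if (failure.kind === "network") return idempotent;
  return idempotent
    ? RETRYABLE_IDEMPOTENT.has(failure.status)
    : RETRYABLE_ANY.has(failure.status);
}
```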

Backoff and jitter

Between retry attempts the SDK sleeps a computed delay. The formula (from src/http.ts, backoffDelay):

delay = min(
  baseDelayMs × 2^attempt + Math.random() × baseDelayMs,
  maxDelayMs
)

With defaults (baseDelayMs = 500 ms, maxDelayMs = 30 000 ms):

| Attempt | Deterministic term | Jitter range | Maximum delay |
|---|---|---|---|
| 0 (first retry) | 500 ms | 0–500 ms | 1 000 ms |
| 1 (second retry) | 1 000 ms | 0–500 ms | 1 500 ms |

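The cap (maxDelayMs = 30 s) applies to the sum of the deterministic term and the random jitter, so the result never exceeds 30 s regardless of how many attempts are made. With the other defaults it never binds during the default two retries: it first takes effect at attempt 6 (500 ms × 2^6 = 32 000 ms), so it only matters once you raise maxRetries well above 2.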

Why jitter? When many clients hit the same transient error at the same time (a server restart, a brief overload), pure exponential backoff would cause them all to retry in synchronized waves — the thundering-herd problem. Adding a per-attempt random offset spreads retries across a window, smoothing the load spike.
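A minimal sketch of the formula, useful for checking the numbers in the table above. `computeBackoffMs` is an illustrative name rather than the SDK's `backoffDelay`, and the injectable `random` parameter is an assumption added so the jitter can be pinned in tests.

```typescript
// Illustrative restatement of the documented backoff formula.
// `random` defaults to Math.random; inject a fixed value for a deterministic result.
function computeBackoffMs(
  attempt: number,
  baseDelayMs = 500,
  maxDelayMs = 30_000,
  random: () => number = Math.random,
): number {
  const exponential = baseDelayMs * 2 ** attempt; // 500, 1000, 2000, ... with defaults
  const jitter = random() * baseDelayMs; // uniform in [0, baseDelayMs)
  return Math.min(exponential + jitter, maxDelayMs); // the cap applies to the sum
}

// Print the delay range for the first few retry attempts with the defaults.
for (let attempt = 0; attempt < 7; attempt++) {
  const low = computeBackoffMs(attempt, 500, 30_000, () => 0);
  const high = computeBackoffMs(attempt, 500, 30_000, () => 1);
  console.log(`attempt ${attempt}: ${low} to ${high} ms`);
}
```

Because `Math.random()` never returns 1, the upper end of each printed range is a bound the real delay approaches but does not reach.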

Defaults and overrides

| Parameter | Default | Override scope |
|---|---|---|
| `maxRetries` | `2` (3 total attempts) | Client or per-request |
| `baseDelayMs` | `500 ms` | Client or per-request |
| `maxDelayMs` | `30 000 ms` | Client or per-request |

Set a client-wide policy in the constructor:

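```typescript
import { CreateosSandboxClient } from "@nodeops-createos/sandbox";

// Every request from this client retries up to 4 times, starting from a 250 ms base delay.
const client = new CreateosSandboxClient({
  retry: { maxRetries: 4, baseDelayMs: 250 },
});
```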

Override for a single call, or disable retries entirely for one call:

```typescript
// Disable retries for one call (fast feedback on failures)
await client.whoami({ retry: false });

// Override max retries for one call
await client.listSandboxes({ retry: { maxRetries: 1 } });
```

Per-request options are merged over the client default rather than replacing it: a per-request { maxRetries: 1 } still inherits baseDelayMs and maxDelayMs from the client policy.
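The merge rule can be sketched as two object spreads. This is an assumption consistent with the description above, not the SDK's code; `resolveRetryPolicy` is a hypothetical name, and treating `retry: false` as `maxRetries: 0` is a guess at how "disable retries" is implemented.

```typescript
interface RetryPolicy {
  maxRetries: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

// Documented defaults: 2 retries (3 attempts), 500 ms base delay, 30 s cap.
const DEFAULT_RETRY: RetryPolicy = { maxRetries: 2, baseDelayMs: 500, maxDelayMs: 30_000 };

// Hypothetical: layer per-request options over the client policy, field by field.
function resolveRetryPolicy(
  clientPolicy: Partial<RetryPolicy> = {},
  perRequest?: Partial<RetryPolicy> | false,
): RetryPolicy {
  const base: RetryPolicy = { ...DEFAULT_RETRY, ...clientPolicy };
  // Assumption: `retry: false` behaves like zero retries, i.e. a single attempt.
  if (perRequest === false) return { ...base, maxRetries: 0 };
  return { ...base, ...perRequest };
}
```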

Retry-After header

When the server returns 429 or 503 with a Retry-After header, the SDK uses that value instead of its own computed backoff:

delay (ms) = Retry-After delta-seconds × 1000
delay (ms) = Retry-After HTTP-date − current time

Both delta-seconds (Retry-After: 5) and HTTP-date formats are parsed. The server is telling you exactly when it will accept the next request; overriding with a shorter client-side backoff would just produce another 429.
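A minimal sketch of parsing both forms into a millisecond delay. The name `parseRetryAfterMs`, returning `undefined` for an absent or unparseable header, and clamping past dates to zero are assumptions, not documented SDK behavior.

```typescript
// Convert a Retry-After header value into a delay in milliseconds.
// Returns undefined when the header is absent or unparseable, so the caller
// can fall back to the computed exponential backoff.
function parseRetryAfterMs(header: string | null, nowMs: number = Date.now()): number | undefined {
  if (header === null) return undefined;
  const value = header.trim();

  // delta-seconds form, e.g. "Retry-After: 5"
  if (/^\d+$/.test(value)) return Number(value) * 1000;

  // HTTP-date form, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT"
  const targetMs = Date.parse(value);
  if (Number.isNaN(targetMs)) return undefined;
  return Math.max(0, targetMs - nowMs); // a date in the past means "retry now"
}
```

The `string | null` parameter matches what the standard `Headers.get()` returns when a header is missing.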

After the Retry-After delay has elapsed, the SDK retries the request normally. If the server repeats the 429, the Retry-After delay is honored again, until maxRetries is exhausted.

When the SDK surfaces a CreateosSandboxRateLimitError after retries are exhausted, err.retryAfterSeconds carries the last parsed Retry-After value so your code can make its own scheduling decision.

Streaming is never retried

The stream method (src/http.ts) issues a single dispatch and yields frames from the response body. There is no retry loop around it.

The reason is fundamental: by the time a streaming error surfaces, the iterator may have already yielded dozens of frames. There is no safe replay position. Restarting from the beginning would duplicate output; seeking to an offset is not possible without server-side support, and the control plane does not provide it.

If the underlying connection breaks, the async iterator throws and the for await loop unwinds. Wrap the loop in application-level logic if you need restart behavior.
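One way to add restart behavior at the application level is to reopen the stream from the beginning and let the caller discard partial output, since the SDK cannot resume mid-stream. A sketch, assuming you can express "open the stream" as a function returning an `AsyncIterable`; `consumeWithRestarts` and its parameters are hypothetical and not part of the SDK.

```typescript
// Hypothetical application-level wrapper: reopen a stream from the beginning
// when iteration throws, up to `maxRestarts` times. Because output restarts
// from the first frame, `onRestart` lets the caller discard what it already
// received so nothing is processed twice.
async function consumeWithRestarts<T>(
  open: () => AsyncIterable<T>,
  handle: (frame: T) => void,
  onRestart: (restart: number, err: unknown) => void,
  maxRestarts = 2,
): Promise<void> {
  for (let restart = 0; ; restart++) {
    try {
      for await (const frame of open()) handle(frame);
      return; // stream ended normally
    } catch (err) {
      if (restart >= maxRestarts) throw err; // give up and surface the last error
      onRestart(restart + 1, err);
    }
  }
}
```

Only use this pattern when re-running the stream from the start is safe for your workload.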

Timeouts

Per-request timeout

Every request carries a timeout. The default is 60 000 ms (60 s), set as DEFAULT_TIMEOUT_MS in src/config.ts. Override it at the client level or per call:

```typescript
import { CreateosSandboxClient } from "@nodeops-createos/sandbox";

// Client-wide: all requests time out after 10 s unless overridden
const client = new CreateosSandboxClient({ timeoutMs: 10_000 });

// Per-call: only this request gets 120 s
const req = { shape: "s-4vcpu-4gb" };
await client.createSandbox(req, { timeoutMs: 120_000 });

// Disable the timeout for one call; use with care
await client.whoami({ timeoutMs: 0 });
```

A timeoutMs: 0 disables the per-request timeout entirely. The caller is then responsible for bounding the request's duration (e.g. via an AbortSignal).

The timeout applies to each dispatch attempt, not to the whole retry sequence: every attempt gets a fresh budget (60 s by default). For example, if attempt 0 receives a 503 after 59 s and attempt 1 then times out after a further 60 s, the call has taken roughly 2 minutes, plus the backoff sleep between the two attempts.
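Because the budget is per attempt, the worst-case wall time of a single call grows with maxRetries. A sketch of the arithmetic, assuming every attempt runs to its full timeout, that timeouts are retried (idempotent requests only, per the error table below), and that no Retry-After header overrides the backoff; `worstCaseElapsedMs` is a hypothetical helper.

```typescript
interface TimingPolicy {
  timeoutMs: number; // per-attempt budget
  maxRetries: number; // retries after the first attempt
  baseDelayMs: number;
  maxDelayMs: number;
}

// Upper bound on one call's duration: every attempt times out, and every
// backoff sleep takes its maximum (full jitter), capped at maxDelayMs.
function worstCaseElapsedMs(p: TimingPolicy): number {
  let total = (p.maxRetries + 1) * p.timeoutMs;
  for (let attempt = 0; attempt < p.maxRetries; attempt++) {
    total += Math.min(p.baseDelayMs * 2 ** attempt + p.baseDelayMs, p.maxDelayMs);
  }
  return total;
}

// Defaults: 3 × 60 000 ms + (1 000 + 1 500) ms of backoff = 182 500 ms
console.log(worstCaseElapsedMs({ timeoutMs: 60_000, maxRetries: 2, baseDelayMs: 500, maxDelayMs: 30_000 }));
```

If you need a hard deadline across all attempts, pass an AbortSignal per call, for example `AbortSignal.timeout(90_000)`; as described under Cancellation, the signal also cuts short any backoff sleep.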

When the timeout elapses, the SDK throws CreateosSandboxTimeoutError with a message like Request timed out 60000ms: GET /v1/sandboxes.

Wait timeout

createSandbox and the waitUntil* lifecycle helpers (waitUntilRunning, waitUntilStopped) run a poll loop on a separate budget — 120 000 ms (120 s) by default (DEFAULT_WAIT_MS). Pass waitTimeoutMs to change it:

```typescript
const sandbox = await client.createSandbox(
  { shape: "s-4vcpu-4gb" },
  { waitTimeoutMs: 180_000 },
);
```

When the wait budget is exhausted, CreateosSandboxTimeoutError is thrown and the sandbox (or template) may still be transitioning in the background — it is not automatically destroyed. Call destroy() if you no longer need it:

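```typescript
import { CreateosSandboxTimeoutError } from "@nodeops-createos/sandbox";

const sandbox = await client
  .createSandbox(req, { waitTimeoutMs: 30_000 })
  .catch(async (err) => {
    // The wait budget ran out, but the sandbox may still exist server-side.
    if (err instanceof CreateosSandboxTimeoutError && err.sandboxId) {
      // Best-effort cleanup: swallow cleanup failures so the original error is rethrown.
      await client
        .getSandbox(err.sandboxId)
        .then((s) => s.destroy())
        .catch(() => {});
    }
    throw err;
  });
```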

Cancellation

Pass an AbortSignal to cancel a request (and any in-flight backoff sleep) at any time:

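```typescript
import { CreateosSandboxError } from "@nodeops-createos/sandbox";

const controller = new AbortController();
// Cancel creation if it has not finished within 5 s.
const timer = setTimeout(() => controller.abort(), 5_000);

try {
  const sandbox = await client.createSandbox(
    { shape: "s-4vcpu-4gb" },
    { signal: controller.signal },
  );
  clearTimeout(timer); // created in time; the deadline no longer applies
  try {
    // ... use the sandbox ...
  } finally {
    await sandbox.destroy();
  }
} catch (err) {
  if (controller.signal.aborted) {
    // Deliberate cancellation: the SDK rethrows the runtime's AbortError unwrapped,
    // so it is NOT an instance of CreateosSandboxError.
  } else if (err instanceof CreateosSandboxError) {
    // Any other SDK failure (timeout, rate limit, server error, ...).
  } else {
    throw err;
  }
} finally {
  clearTimeout(timer);
}
```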

Internally the SDK composes your signal with its own per-request timeout signal using AbortSignal.any([userSignal, timeoutSignal]). Whichever fires first wins:

If the user-supplied signal fires first, the underlying fetch rejects and the SDK re-throws the browser/runtime AbortError as-is (not wrapped) so your code can distinguish deliberate cancellation from other errors.
If the timeout signal fires first (and the user signal has not fired), the SDK wraps the error as CreateosSandboxTimeoutError.
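The composition can be sketched as follows. `dispatchWithTimeout` and `RequestTimeoutError` are hypothetical stand-ins (the latter for CreateosSandboxTimeoutError), and `doFetch` represents the underlying fetch call; the sketch needs a runtime with `AbortSignal.any` (for example Node.js 20.3 or later, or a current browser).

```typescript
// Hypothetical stand-in for CreateosSandboxTimeoutError.
class RequestTimeoutError extends Error {
  readonly timeoutMs: number;
  constructor(timeoutMs: number) {
    super(`Request timed out ${timeoutMs}ms`);
    this.name = "RequestTimeoutError";
    this.timeoutMs = timeoutMs;
  }
}

async function dispatchWithTimeout<T>(
  doFetch: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
  userSignal?: AbortSignal,
): Promise<T> {
  const timeoutController = new AbortController();
  // timeoutMs of 0 means "no per-request timeout": the timeout signal never fires.
  const timer = timeoutMs > 0 ? setTimeout(() => timeoutController.abort(), timeoutMs) : undefined;
  const signals = userSignal ? [userSignal, timeoutController.signal] : [timeoutController.signal];
  try {
    // Whichever signal fires first aborts the request.
    return await doFetch(AbortSignal.any(signals));
  } catch (err) {
    // Only the timeout fired: wrap it. A user abort is rethrown unchanged.
    if (timeoutController.signal.aborted && !userSignal?.aborted) {
      throw new RequestTimeoutError(timeoutMs);
    }
    throw err;
  } finally {
    clearTimeout(timer); // don't leave a pending timer behind
  }
}
```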

Signals also cancel the backoff sleep between retry attempts: if you abort during the backoff window, the sleep resolves immediately and the pending retry is abandoned.
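A minimal sketch of an abortable backoff sleep and the check that abandons the pending retry. `abortableSleep` and `retryLoop` are hypothetical names, and rethrowing `signal.reason` after the sleep is an assumption about what the caller sees, not documented SDK behavior.

```typescript
// Sleep for `ms`, but resolve early (and clear the timer) if `signal` aborts.
function abortableSleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve) => {
    if (signal?.aborted) return resolve();
    const onAbort = () => {
      clearTimeout(timer);
      resolve();
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}

// Retry `attempt` with the given delays; an abort during a sleep abandons the retry.
async function retryLoop<T>(
  attempt: () => Promise<T>,
  delaysMs: readonly number[],
  signal?: AbortSignal,
): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await attempt();
    } catch (err) {
      if (i >= delaysMs.length) throw err; // retries exhausted
      await abortableSleep(delaysMs[i], signal);
      if (signal?.aborted) throw signal.reason; // aborted mid-backoff: don't retry
    }
  }
}
```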

Polling backoff (`pollUntil`)

waitUntilRunning, waitUntilStopped, and the exported pollUntil helper use a separate adaptive backoff tuned for lifecycle transitions, not for HTTP retry:

First 5 seconds of wall time: poll every 250 ms, so fast sandbox startups that resolve in under a second are not penalized by a long initial interval.
After 5 seconds: the interval grows by a factor of 1.25 per iteration, capped at 2 000 ms (2 s), so a build that takes two minutes does not busy-loop.
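The interval schedule above can be written as a small pure function. This sketch follows the documented numbers; `nextPollIntervalMs` is a hypothetical name, and measuring the 5 s threshold before each sleep (ignoring the time each poll itself takes) is an assumption.

```typescript
const FAST_PHASE_MS = 5_000; // first 5 s of wall time
const FAST_INTERVAL_MS = 250; // poll every 250 ms during the fast phase
const GROWTH = 1.25; // interval multiplier per iteration afterwards
const MAX_INTERVAL_MS = 2_000; // never wait more than 2 s between polls

// Given the elapsed wall time and the previous interval, return the next sleep.
function nextPollIntervalMs(elapsedMs: number, prevIntervalMs: number): number {
  if (elapsedMs < FAST_PHASE_MS) return FAST_INTERVAL_MS;
  return Math.min(prevIntervalMs * GROWTH, MAX_INTERVAL_MS);
}

// Simulate the first 15 s of polling, ignoring the time each poll itself takes.
let elapsed = 0;
let interval = FAST_INTERVAL_MS;
const schedule: number[] = [];
while (elapsed < 15_000) {
  interval = nextPollIntervalMs(elapsed, interval);
  schedule.push(Math.round(interval));
  elapsed += interval;
}
console.log(schedule.join(", "));
```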

pollUntil is exported for custom poll loops that need the same behavior:

```typescript
import { pollUntil } from "@nodeops-createos/sandbox";

// `client`, `id`, and `controller` come from the surrounding code.
const result = await pollUntil({
  poll: () => client.getSandbox(id),
  done: (v) => v.status === "running",
  failed: (v) =>
    v.status === "error" ? "Sandbox entered error state" : undefined,
  timeoutMs: 120_000,
  signal: controller.signal,
});
```

See the Helpers reference for the full PollOptions interface.

How failures surface

After all retries are exhausted — or immediately for non-retryable conditions — the SDK throws a typed error. Import and narrow with instanceof:

```typescript
import {
  CreateosSandboxError,
  CreateosSandboxConnectionError,
  CreateosSandboxTimeoutError,
  CreateosSandboxRateLimitError,
  CreateosSandboxServerError,
} from "@nodeops-createos/sandbox";

try {
  const sandbox = await client.createSandbox({ shape: "s-4vcpu-4gb" });
  try {
    // ...
  } finally {
    await sandbox.destroy();
  }
} catch (err) {
  if (err instanceof CreateosSandboxConnectionError) {
    // Network failure (DNS, refused, reset). Idempotent calls are retried first;
    // a POST such as createSandbox is not. err.cause holds the original network error.
  } else if (err instanceof CreateosSandboxTimeoutError) {
    // Per-request timeout or waitUntil* budget exceeded.
  } else if (err instanceof CreateosSandboxRateLimitError) {
    // 429 after retries. err.retryAfterSeconds has the last Retry-After value.
  } else if (err instanceof CreateosSandboxServerError) {
    // 5xx, after whatever retries the method allows.
  } else if (err instanceof CreateosSandboxError) {
    // Anything else the SDK threw (auth, not found, validation, ...).
  } else {
    throw err; // not an SDK error, e.g. an AbortError from your own signal
  }
}
```

| Error class | When thrown | Retried before throw? |
|---|---|---|
| `CreateosSandboxConnectionError` | Network error (DNS, refused, reset) | Yes (idempotent only) |
| `CreateosSandboxTimeoutError` | Per-request or wait-loop timeout | Yes (idempotent only) |
| `CreateosSandboxRateLimitError` | `429 Too Many Requests` | Yes |
| `CreateosSandboxServerError` | `5xx` (varies by method, see table above) | Yes (varies) |
| `CreateosSandboxAuthError` | `401 Unauthorized` | No |
| `CreateosSandboxNotFoundError` | `404` | No |
| `CreateosSandboxValidationError` | `400`, `409`, `422` | No |
| `CreateosSandboxPaymentRequiredError` | `402` | No |
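The status-based rows of the table can be summarized as a lookup. This sketch returns class names as strings so it runs without the SDK installed; `errorClassForStatus` is hypothetical, and falling back to the base `CreateosSandboxError` for statuses the table does not list is an assumption.

```typescript
// Hypothetical lookup mirroring the status-based rows of the table
// (connection and timeout errors have no HTTP status, so they are not covered).
function errorClassForStatus(status: number): string {
  switch (status) {
    case 400:
    case 409:
    case 422:
      return "CreateosSandboxValidationError";
    case 401:
      return "CreateosSandboxAuthError";
    case 402:
      return "CreateosSandboxPaymentRequiredError";
    case 404:
      return "CreateosSandboxNotFoundError";
    case 429:
      return "CreateosSandboxRateLimitError";
  }
  if (status >= 500 && status <= 599) return "CreateosSandboxServerError";
  return "CreateosSandboxError"; // assumption: unlisted statuses map to the base class
}
```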

Full per-class field reference: Errors reference.

For recipes — retry-after handling, fallback logic, logging retries via hooks — see How-to: Error Handling.

Summary

| Dimension | Default | Override |
|---|---|---|
| Max retries | `2` (3 attempts total) | `retry.maxRetries` on client or per-call |
| Base backoff delay | `500 ms` | `retry.baseDelayMs` |
| Backoff ceiling | `30 000 ms` | `retry.maxDelayMs` |
| Per-request timeout | `60 000 ms` | `timeoutMs` on client or per-call |
| Wait-loop timeout | `120 000 ms` | `waitTimeoutMs` on `createSandbox` / `waitUntil*` |
| Streaming | Never retried | — |
| `Retry-After` honored | Yes; overrides the backoff formula | — |
| Abort support | `AbortSignal` composed with timeout | `signal` per-call |
