Idempotency keys: the hard parts nobody warns you about

Most teams add idempotency keys the same way: a request header, a UUID, a table that maps key to response. They ship it, claim victory, and move on. Then, three months later, a customer reports being charged twice for an order whose response they never saw, and the on-call engineer discovers that the implementation is correct only for the easy case — the case where nothing actually goes wrong.

Here is a tour of the parts the textbooks gloss over.

The easy case is genuinely easy

The naive flow works fine when the request succeeds and the response makes it back:

```python
# server-side, simplified
def handle_request(req):
    key = req.headers["Idempotency-Key"]
    cached = store.get(key)
    if cached is not None:
        return cached         # safe: client is just retrying
    response = do_work(req)
    store.set(key, response)
    return response
```

If the client retries with the same key, the second call returns the cached response. Nothing happens twice. Total time to implement: about thirty minutes. This is what every "idempotency key" blog post stops at.

The hard case: what if `do_work` is in flight?

Suppose the client sends a request, the server begins processing, and then the client's TCP connection drops before the response is returned. The client retries. The server sees the same key — but the original request is still running. What do you do?
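Before weighing the options, it helps to watch the naive handler fail. This self-contained sketch reruns the handler from the previous section against a plain dict. Two `threading.Event` objects hold the first request open while a retry with the same key arrives. The dict, the events, and the fake `do_work` are illustrative stand-ins, not a real store.

```python
import threading

calls = 0                      # how many times do_work actually ran
started = threading.Event()    # set once the first do_work begins
release = threading.Event()    # lets the first do_work finish
cache = {}                     # stand-in for the idempotency store

def do_work(key):
    global calls
    calls += 1
    if calls == 1:
        started.set()
        release.wait(timeout=5)  # the first call "hangs" on a slow downstream
    return f"result-for-{key}"

def naive_handle(key):
    cached = cache.get(key)
    if cached is not None:
        return cached
    response = do_work(key)
    cache[key] = response
    return response

first = threading.Thread(target=naive_handle, args=("abc",))
first.start()
started.wait(timeout=5)        # the original request is now in flight
naive_handle("abc")            # the client's retry, same key
release.set()
first.join()
print(calls)  # 2: the work ran twice
```

The retry misses the cache because nothing has been written yet, so the work runs a second time.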

Three options, each with a sharp edge:

1. Wait for the in-flight request to finish, then return its result. Simple, but if the original request is hung on a slow downstream, the retry now blocks on the same slow downstream. If the client retries again (because the retry itself timed out), you now have three threads parked on the same hung call, each holding a connection. This is how you DoS yourself.
2. Reject the retry with a "409 Conflict" or "425 Too Early." The client must wait and retry later. This is what Stripe does. It is correct, but it requires the client to actually handle the response code, which most clients do not.
3. Cancel and re-run the original work. This is the most dangerous option. If the original work is partially done (say, half-written to a downstream), cancellation rarely undoes it.
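The catch in the second option is the client, but handling it properly does not take much code. Here is a minimal sketch, assuming a hypothetical `send(key, payload)` transport that returns `(status, body)`. It generates the key once and reuses it on every attempt. It treats a 409 or a lost response as "back off and retry with the same key", not as a failure. The backoff constants are illustrative.

```python
import random
import time
import uuid
from typing import Callable, Tuple

# Hypothetical transport: send(idempotency_key, payload) -> (status, body).
Send = Callable[[str, dict], Tuple[int, str]]

def call_with_retries(send: Send, payload: dict,
                      max_attempts: int = 5, base_delay: float = 0.1) -> Tuple[int, str]:
    key = str(uuid.uuid4())  # generated ONCE, reused on every attempt
    for attempt in range(max_attempts):
        try:
            status, body = send(key, payload)
            if status != 409:
                return status, body  # anything but 409 goes back to the caller
            # 409: the original request is still in flight on the server
        except (ConnectionError, TimeoutError):
            pass  # response lost; the work may or may not have happened
        if attempt + 1 < max_attempts:
            # Exponential backoff with jitter, then retry with the SAME key
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    raise RuntimeError(f"gave up after {max_attempts} attempts (key={key})")

# Usage with a fake transport: two 409s, then the original's result.
seen_keys = []
replies = iter([(409, ""), (409, ""), (200, "ok")])

def fake_send(key: str, payload: dict) -> Tuple[int, str]:
    seen_keys.append(key)
    return next(replies)

result = call_with_retries(fake_send, {"amount": 500}, base_delay=0.0)
```

The important line is the first one in the function. A client that mints a fresh key per attempt defeats the whole scheme.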

The version I have settled on is option (2), with a tight bound on how long the lock is held:

```python
def handle_request(req):
    key = req.headers["Idempotency-Key"]

    # First, check for a completed cached result
    cached = store.get_completed(key)
    if cached is not None:
        return cached

    # Try to acquire a short-lived in-flight lock
    if not store.try_lock(key, ttl_seconds=30):
        # Someone else is processing this key right now
        return ("conflict, try again", 409)

    try:
        # Re-check under the lock: another request may have finished and
        # released the lock between our first check and try_lock.
        cached = store.get_completed(key)
        if cached is not None:
            return cached
        response = do_work(req)
        store.set_completed(key, response, ttl_seconds=86_400)
        return response
    finally:
        store.release_lock(key)
```

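The 30-second TTL on the lock matters. If the server crashes mid-request, the lock evaporates on its own, so no zombie lock blocks future retries forever. The flip side is that the TTL must comfortably exceed the slowest legitimate run of do_work. If the lock expires while the original is still working, a retry can acquire it and start a second copy.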
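To make the store interface concrete (`get_completed`, `try_lock`, `set_completed`, `release_lock`), here is a single-process, in-memory sketch. The class name and the injectable clock are illustrative assumptions. A real deployment would back this with a store shared by every server, such as Redis or a database table, because an in-process dict stops helping once you run more than one instance.

```python
import threading
import time
from typing import Any, Callable, Optional

class InMemoryIdempotencyStore:
    """Hypothetical single-process version of the store used by handle_request."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._mutex = threading.Lock()   # guards both dicts below
        self._locks = {}                 # key -> lock expiry time
        self._completed = {}             # key -> (response, expiry time)

    def try_lock(self, key: str, ttl_seconds: float) -> bool:
        with self._mutex:
            now = self._clock()
            expiry = self._locks.get(key)
            if expiry is not None and expiry > now:
                return False             # held, and not yet expired
            # Free, or left behind by a crashed holder whose TTL ran out
            self._locks[key] = now + ttl_seconds
            return True

    def release_lock(self, key: str) -> None:
        with self._mutex:
            self._locks.pop(key, None)

    def set_completed(self, key: str, response: Any, ttl_seconds: float) -> None:
        with self._mutex:
            self._completed[key] = (response, self._clock() + ttl_seconds)

    def get_completed(self, key: str) -> Optional[Any]:
        with self._mutex:
            entry = self._completed.get(key)
            if entry is None or entry[1] <= self._clock():
                return None              # never seen, or the cached response expired
            return entry[0]

# Simulate a crash: the holder never releases, and the clock moves past the TTL.
fake_now = [0.0]
store = InMemoryIdempotencyStore(clock=lambda: fake_now[0])
first = store.try_lock("abc", ttl_seconds=30)         # True: lock acquired
retry = store.try_lock("abc", ttl_seconds=30)         # False: original still in flight
fake_now[0] = 31.0                                    # holder "crashed" without releasing
after_expiry = store.try_lock("abc", ttl_seconds=30)  # True: the stale lock is reclaimed
store.set_completed("abc", ("ok", 200), ttl_seconds=86_400)
cached = store.get_completed("abc")
```

One simplification worth knowing about: a production lock would carry an owner token. With a token, a slow holder whose lock has already expired cannot release a lock that now belongs to someone else. This sketch omits that.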

The harder case: partial side effects

What if do_work actually does two things — say, writes a row to your database and then calls a payments processor — and crashes between them?

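If you mark the key as completed before the work finishes, a retry returns a cached response that does not reflect the partial state, and the missing step never runs.

If you mark it completed only after everything succeeds, a retry re-executes all of the work. The database gets a second order row, and if the crash landed after the charge call instead of before it, the payments processor sees two charge requests.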

The standard fix is to give the side effect itself an idempotency key, and forward yours to it. If the payments processor supports keys (Stripe, Adyen, Braintree all do), you forward Idempotency-Key: <your-key> and let it handle the duplicate-detection on its side. Your job becomes recording what you did, not what the downstream did.

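```python
import stripe  # assumes stripe.api_key is configured elsewhere

# `db` is the application's own data layer (not shown)

def do_work(req):
    key = req.headers["Idempotency-Key"]

    # Step 1: write our own row; a repeat key returns the existing row
    order_id = db.insert_order(req, idempotency_key=key)

    # Step 2: forward the same key to Stripe
    charge = stripe.Charge.create(
        amount=req.amount,
        currency=req.currency,
        source=req.source,
        idempotency_key=key,        # ← key thing
    )

    # Step 3: link the two; must be safe to repeat on a retry
    db.attach_charge(order_id, charge.id)
    return {"order_id": order_id, "charge_id": charge.id}
```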

If this function is killed between db.insert_order and stripe.Charge.create, a retry will see the existing order row (because insert_order is itself keyed on key), and Stripe will see the duplicate idempotency_key and return the existing charge. Net effect: one order, one charge, even though the work ran twice.
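You can check the crash-and-retry argument without touching a real processor. This sketch swaps Stripe and the database for two hypothetical in-memory fakes that deduplicate on the key the way the text describes. It then kills `do_work` between the two steps and retries. `FakeDB`, `FakeProcessor`, and the `crash_after_insert` flag are illustrative inventions. `do_work` takes its collaborators as arguments here only so the fakes can be passed in.

```python
import itertools

class FakeProcessor:
    """Hypothetical payments processor that deduplicates by idempotency key."""
    def __init__(self):
        self.charges_by_key = {}
        self._ids = itertools.count(1)

    def create_charge(self, amount: int, idempotency_key: str) -> str:
        if idempotency_key not in self.charges_by_key:
            self.charges_by_key[idempotency_key] = f"ch_{next(self._ids)}"
        return self.charges_by_key[idempotency_key]  # a duplicate gets the original charge

class FakeDB:
    """Hypothetical orders table with a unique constraint on idempotency_key."""
    def __init__(self):
        self.orders = {}  # idempotency_key -> order row

    def insert_order(self, amount: int, idempotency_key: str) -> str:
        # Insert-or-return-existing: a retry gets the same order back
        if idempotency_key not in self.orders:
            order_id = f"ord_{len(self.orders) + 1}"
            self.orders[idempotency_key] = {"order_id": order_id, "charge_id": None}
        return self.orders[idempotency_key]["order_id"]

    def attach_charge(self, order_id: str, charge_id: str) -> None:
        for row in self.orders.values():
            if row["order_id"] == order_id:
                row["charge_id"] = charge_id  # same value on a repeat, so safe

class Crash(Exception):
    pass

def do_work(db, processor, key, amount, crash_after_insert=False):
    order_id = db.insert_order(amount, idempotency_key=key)
    if crash_after_insert:
        raise Crash("process killed between the two steps")
    charge_id = processor.create_charge(amount, idempotency_key=key)
    db.attach_charge(order_id, charge_id)
    return {"order_id": order_id, "charge_id": charge_id}

db, processor = FakeDB(), FakeProcessor()
try:
    do_work(db, processor, "key-123", 500, crash_after_insert=True)
except Crash:
    pass
result = do_work(db, processor, "key-123", 500)        # the retry
result_again = do_work(db, processor, "key-123", 500)  # yet another retry
print(len(db.orders), len(processor.charges_by_key))   # 1 1
```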

The hardest case: schema changes

The unsexy one. You change the response shape of an endpoint — say, you add a field. You deploy. A client retries a request from before the deploy, hits a cached response with the old shape, and chokes on the missing field.

Treat the cache as a versioned object. Either bake the schema version into the key, or store it alongside the response and reject cache hits when the version no longer matches:

```python
from dataclasses import dataclass

@dataclass
class CachedResponse:
    schema_version: int
    body: bytes
    status: int

CURRENT_SCHEMA = 4

def handle_request(req):
    key = req.headers["Idempotency-Key"]
    cached = store.get(key)
    if cached is not None and cached.schema_version == CURRENT_SCHEMA:
        return (cached.body, cached.status)
    # Either no cache, or cached at an older schema: re-run
    ...
```

A rejected cache hit means the work runs again, so do_work must still be idempotent at the side-effect level. The two layers reinforce each other.
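The other option mentioned earlier is to bake the version into the key, and it needs no comparison at read time at all. Here is a minimal sketch, with a hypothetical `cache_key` helper and a plain dict standing in for the response store:

```python
CURRENT_SCHEMA = 4

def cache_key(idempotency_key: str, schema_version: int = CURRENT_SCHEMA) -> str:
    # Namespace the stored response by schema version, e.g. "v4:<key>"
    return f"v{schema_version}:{idempotency_key}"

# A plain dict standing in for the response store
responses = {}
responses[cache_key("abc", schema_version=3)] = b'{"order_id": 1}'  # written before the deploy

hit = responses.get(cache_key("abc"))  # looked up after the deploy, at v4
# hit is None: the old-shape entry is invisible, so the request re-runs
```

The trade-off is that old entries linger, unreachable, until their TTL expires. One caveat: key the in-flight lock on the raw idempotency key, not the versioned one. Otherwise, during a rolling deploy, an old server and a new server could each take "their own" lock for the same request.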

A checklist I now use

Before I sign off on an idempotency implementation, I ask:

- Does the key have an enforced minimum length (32+ chars) and reject obviously-bad values?
- Is there a TTL on cached responses, and is it documented?
- Is there a separate, shorter TTL on the in-flight lock?
- Do retries that arrive while the original is in flight get a 409, not a wait?
- Are downstream side effects keyed on the same key, or do they have their own idempotency story?
- Is the response shape versioned, and do mismatched-version cache hits re-run?
- Is the request body hashed and compared against the cached request, so that a key reused with different inputs gets a 422?

That last one catches more bugs than the rest combined. If a client accidentally reuses an idempotency key with a different payload, you do not want to silently return the old response — you want to scream.

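```python
import hashlib
import json

def canonical_json(obj) -> str:
    # Stable serialization: sorted keys, no insignificant whitespace
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

def request_hash(body) -> str:
    # Compare hex digests, not hash objects (those never compare equal)
    return hashlib.sha256(canonical_json(body).encode("utf-8")).hexdigest()

# When caching, also store the request hash (req.body is the parsed JSON body)
store.set_completed(
    key,
    response,
    request_hash=request_hash(req.body),
    ttl_seconds=86_400,
)

# When reading (inside the handler), verify before serving the cached response
cached = store.get_completed(key)
if cached is not None and cached.request_hash != request_hash(req.body):
    return ("idempotency key reused with different payload", 422)
```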
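The first checklist item is just as cheap to enforce. Here is a sketch of key validation that uses the checklist's 32-character minimum. The 255-character maximum, the allowed character set, and the "too few distinct characters" heuristic for obviously-bad values are illustrative choices, not a standard.

```python
import re
from typing import Optional, Tuple

# Illustrative policy: 32-255 characters drawn from [A-Za-z0-9_-]
_KEY_PATTERN = re.compile(r"[A-Za-z0-9_-]{32,255}")

def validate_idempotency_key(raw: Optional[str]) -> Tuple[bool, str]:
    """Return (ok, reason); a caller would likely map a failure to a 400."""
    if raw is None:
        return False, "missing Idempotency-Key header"
    if not _KEY_PATTERN.fullmatch(raw):
        return False, "key must be 32-255 characters of [A-Za-z0-9_-]"
    # Catch obviously non-random keys such as "0000...0000" or "abab...abab"
    if len(set(raw.replace("-", ""))) < 8:
        return False, "key looks non-random"
    return True, ""
```

A UUID4 string (36 characters with hyphens) passes comfortably, while a key of 32 zeros is rejected by the last check.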

The takeaway

Idempotency keys feel like a small, almost-trivial feature. The naive implementation is correct on the happy path, and the happy path is most of your traffic, which is why you can ship the naive version and not notice the problem for months. But the failure mode (duplicate charges, duplicate orders, irreversible side effects) is exactly the one you most need to prevent.

The fix is not subtle, but it does have moving parts. Treat the key as the index of a small state machine: not-seen, in-flight, completed, completed-but-stale. Make every transition explicit and short. Forward keys to downstreams that support them.
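One way to take "make every transition explicit" literally is to write the states down as data. In this minimal sketch, the four states come from the paragraph above. The event names and the transition table are hypothetical, one guess at how the states map onto the handler's steps.

```python
from enum import Enum

class KeyState(Enum):
    NOT_SEEN = "not-seen"
    IN_FLIGHT = "in-flight"
    COMPLETED = "completed"
    COMPLETED_STALE = "completed-but-stale"

# (state, event) -> next state. Event names are illustrative.
TRANSITIONS = {
    (KeyState.NOT_SEEN, "acquire_lock"): KeyState.IN_FLIGHT,
    (KeyState.IN_FLIGHT, "work_succeeded"): KeyState.COMPLETED,
    (KeyState.IN_FLIGHT, "work_failed"): KeyState.NOT_SEEN,      # lock released, nothing cached
    (KeyState.IN_FLIGHT, "lock_expired"): KeyState.NOT_SEEN,     # crashed holder, TTL frees the key
    (KeyState.COMPLETED, "schema_bumped"): KeyState.COMPLETED_STALE,
    (KeyState.COMPLETED, "ttl_expired"): KeyState.NOT_SEEN,
    (KeyState.COMPLETED_STALE, "acquire_lock"): KeyState.IN_FLIGHT,  # re-run the work
    (KeyState.COMPLETED_STALE, "ttl_expired"): KeyState.NOT_SEEN,
}

def step(state: KeyState, event: str) -> KeyState:
    """Apply one event, refusing anything not listed in the table."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} from {state.value}") from None

state = KeyState.NOT_SEEN
for event in ["acquire_lock", "work_succeeded", "schema_bumped", "acquire_lock"]:
    state = step(state, event)
# state is now KeyState.IN_FLIGHT: a stale completion is being re-run

try:
    step(KeyState.COMPLETED, "acquire_lock")  # fresh completions are served, never re-run
    rejected = False
except ValueError:
    rejected = True
```

The value of the table is less the code than the review it forces. Any (state, event) pair missing from it is a case you have decided to refuse rather than one you forgot.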

Stripe's idempotent requests doc is the best ~600-word writeup I have read on this topic. Read it twice.
