How Rotating a JWT Secret Logged Out 34,000 Users and Exposed a Session Design Flaw
March 13, 2026 · Security · 10 min read


10:14 AM, Tuesday. I rotated our JWT signing secret as part of a scheduled security review. By 10:16 AM, support had 47 new tickets. By 10:30 AM, it was clear that 34,000 users had been forcibly logged out at the same instant. By noon, we'd identified a session architecture flaw that had been quietly waiting to cause exactly this failure for three years.

Production failure

The rotation itself took seconds. Update the JWT_SECRET environment variable, restart the API pods, done. Every security checklist I've read says to rotate secrets regularly. What no checklist warned me about: when your entire authentication system uses a single symmetric HMAC secret for both signing and verification, rotating that secret instantly invalidates 100% of active tokens. There is no grace period. There is no overlap. Every logged-in user becomes unauthenticated the moment the new pods come up. Obvious in retrospect. Not obvious at 10:13 AM on the day of.
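
To make the mechanism concrete, here is a minimal sketch of HS256 signing and verification built directly on Node's `crypto` module. It is an illustration under stated assumptions, not our production code: the secrets `old-secret` and `new-secret` are made up, and the `verify` helper only checks the signature (no `exp` check).

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Base64url-encode a JSON value, the way JWT header and payload segments are encoded.
const b64url = (value: object): string =>
  Buffer.from(JSON.stringify(value)).toString('base64url');

// HS256 signature over "header.payload" using the shared secret.
function hs256(signingInput: string, secret: string): string {
  return createHmac('sha256', secret).update(signingInput).digest('base64url');
}

function sign(payload: object, secret: string): string {
  const signingInput = `${b64url({ alg: 'HS256', typ: 'JWT' })}.${b64url(payload)}`;
  return `${signingInput}.${hs256(signingInput, secret)}`;
}

// Signature check only; a real verifier would also validate claims such as "exp".
function verify(token: string, secret: string): boolean {
  const [header, payload, signature] = token.split('.');
  if (!header || !payload || !signature) return false;
  const expected = Buffer.from(hs256(`${header}.${payload}`, secret));
  const actual = Buffer.from(signature);
  // timingSafeEqual requires equal lengths, so check that first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

// Hypothetical secrets, for illustration only.
const token = sign({ sub: 'user_1041' }, 'old-secret');
const validBefore = verify(token, 'old-secret'); // true: same secret that signed it
const validAfter = verify(token, 'new-secret');  // false: the secret was rotated
console.log({ validBefore, validAfter });
```

The token itself never changes. Only the server's idea of the secret does, which is why every session fails at once rather than gradually.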

34,000 users logged out instantly
2 min from rotation to support flood
847 support tickets in 90 minutes
0 seconds of warning to users

False assumptions: "It's just a config change"

The team's mental model: rotate secret, old tokens invalid, users re-authenticate, business as usual. On paper, that's correct. In practice it ignored three things.

  • Scale of simultaneous impact. 34,000 active sessions at 10 AM on a Tuesday is not a slow trickle of re-logins; it's a synchronised stampede. Every user hitting any authenticated endpoint got a 401. Our rate limiter (tuned for attack traffic) tripped on the login surge and started blocking legitimate re-authentication attempts.
  • State lost in long sessions. Several enterprise users were mid-way through multi-step workflows like bulk uploads and report generation with no session persistence. Their work was gone.
  • Mobile apps don't re-auth gracefully. Our clients had no interceptor to handle 401s and redirect to login. They showed a blank screen or a cryptic error, not a login prompt.

"Why is everyone suddenly logged out? Did we get hacked?" (first support ticket, 10:16 AM)

The irony: the rotation was preventing a potential hack. But users can't tell the difference between a security measure and a security breach if both look identical from the outside.

Investigation: finding the actual architecture flaw

The immediate fix was obvious. Rotate back to the old secret, restore sessions, plan the real rotation properly. While doing the post-mortem we found something worse than the incident itself. The JWT architecture had no revocation mechanism. At all.

Tokens were signed with HS256 (symmetric HMAC) using a single secret value. No token version field, no jti (JWT ID) claim, no token family tracking, no server-side session store. Once a token was issued, the only way to invalidate it was to rotate the secret, which was the nuclear option we'd just accidentally tested.

  ORIGINAL ARCHITECTURE (symmetric, single secret)
  ──────────────────────────────────────────────────────────────────

  Login                     API Request                  Rotation
  ──────                    ───────────                  ─────────
  User logs in              Client sends JWT             Secret changes
       │                         │                            │
       ▼                         ▼                            ▼
  Server signs JWT          Server verifies:            ALL tokens
  with JWT_SECRET           HMAC(header.payload,        immediately
       │                    JWT_SECRET) == sig?          invalid
       ▼                         │                            │
  Token issued               ✅ YES → serve            34,000 users
  (no expiry tracking)       ❌ NO  → 401              logged out
  (no revocation list)
  (no token version)


  TOKEN STRUCTURE (missing critical fields)
  ──────────────────────────────────────────────────────────────────

  {
    "sub": "user_1041",
    "iat": 1773190000,
    "exp": 1773276400,
    ← no "jti" (token ID)
    ← no "ver" (secret version)
    ← no "fam" (token family for rotation)
  }

The missing jti claim meant we couldn't revoke individual tokens. No way to track "this specific token has been invalidated." The missing version field meant we couldn't support multiple active secrets simultaneously during a rotation window.
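
For a sense of what a jti claim makes possible, here is a minimal sketch of a revocation list. Everything in it is an assumption for illustration: the `RevocationList` class, the `isAccepted` helper, and the in-memory `Map` (with several API pods you would likely need a shared store instead). It is meant to run after signature verification, not instead of it.

```typescript
// In-memory denylist keyed by jti.
class RevocationList {
  // jti -> token expiry in seconds since epoch (same unit as the JWT "exp" claim)
  private readonly revoked = new Map<string, number>();

  revoke(jti: string, exp: number): void {
    this.revoked.set(jti, exp);
  }

  isRevoked(jti: string): boolean {
    return this.revoked.has(jti);
  }

  // Entries for tokens that have expired anyway can be dropped; returns how many were removed.
  prune(nowSeconds: number): number {
    let removed = 0;
    for (const [jti, exp] of this.revoked) {
      if (exp <= nowSeconds) {
        this.revoked.delete(jti);
        removed++;
      }
    }
    return removed;
  }
}

interface Claims {
  sub: string;
  exp: number;
  jti?: string;
}

// Decide whether already-verified claims should still be honoured.
function isAccepted(claims: Claims, list: RevocationList, nowSeconds: number): boolean {
  if (claims.exp <= nowSeconds) return false; // expired
  if (claims.jti === undefined) return true;  // legacy token: cannot be revoked individually
  return !list.isRevoked(claims.jti);
}

// Usage with hypothetical token IDs and the exp value from the token above.
const list = new RevocationList();
const stolen: Claims = { sub: 'user_1041', exp: 1773276400, jti: 'token-a' };
const other: Claims = { sub: 'user_1041', exp: 1773276400, jti: 'token-b' };
list.revoke('token-a', stolen.exp);

const stolenAccepted = isAccepted(stolen, list, 1773190000); // false: revoked
const otherAccepted = isAccepted(other, list, 1773190000);   // true: untouched
const pruned = list.prune(1773276401);                       // 1: entry outlived its token
```

Because the list only has to remember a token until its own expiry, it stays small as long as token lifetimes are short.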

Root Cause: Symmetric HMAC + No Rotation Strategy = A Ticking Clock

HS256 with a single secret isn't inherently wrong for small systems. The flaw was that we'd scaled to 34,000 active users without ever designing a rotation strategy. The secret had never been rotated in three years of production, which meant if it had ever been compromised, an attacker could have been forging tokens for up to three years. The rotation was the right call. The architecture that made it painful was the problem.

auth/jwt.ts — before and after
// BEFORE — single secret, no version, no rotation support
import jwt from 'jsonwebtoken';

export function signToken(userId: string) {
  return jwt.sign(
    { sub: userId },
    process.env.JWT_SECRET!,
    { expiresIn: '24h' }
  );
}

export function verifyToken(token: string) {
  // Only works with current secret — instant global logout on rotation
  return jwt.verify(token, process.env.JWT_SECRET!);
}


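// AFTER: versioned secrets, overlap window, graceful rotation
import crypto from 'node:crypto';
import jwt from 'jsonwebtoken';

// Maps secret version -> secret. Built once at startup from the environment.
const secrets: Record<string, string> = {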
  [process.env.JWT_SECRET_VERSION!]: process.env.JWT_SECRET!,
  // During rotation: keep previous version active for overlap window
  ...(process.env.JWT_SECRET_PREV_VERSION && process.env.JWT_SECRET_PREV
    ? { [process.env.JWT_SECRET_PREV_VERSION]: process.env.JWT_SECRET_PREV }
    : {}),
};

export function signToken(userId: string) {
  const version = process.env.JWT_SECRET_VERSION!;
  return jwt.sign(
    {
      sub: userId,
      ver: version,           // which secret version signed this
      jti: crypto.randomUUID(), // unique token ID for future revocation
    },
    secrets[version],
    { expiresIn: '24h' }
  );
}

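export function verifyToken(token: string) {
  // Decode without verification first, only to read the version claim.
  // The claim is untrusted at this point: it selects a secret, nothing more.
  const decoded = jwt.decode(token) as { ver?: unknown } | null;
  // Tokens issued before versioning carry no "ver"; treat them as current.
  const version =
    typeof decoded?.ver === 'string' ? decoded.ver : process.env.JWT_SECRET_VERSION!;

  // Own-property check, so a crafted "ver" such as "constructor" cannot match
  if (!Object.prototype.hasOwnProperty.call(secrets, version)) {
    throw new Error('Token signed with unknown secret version');
  }

  // Verify with the versioned secret, pinning the expected algorithm
  return jwt.verify(token, secrets[version], { algorithms: ['HS256'] });
}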

Architecture fix: versioned secrets + overlap window + client-side 401 handling

We made four changes. The first was versioned secrets. We picked this over switching to RS256 (asymmetric) because it required no new infrastructure: no key management service, no JWKS endpoint, no public key distribution. The versioned HMAC approach gave us graceful rotation with a handful of environment variable changes and a 24-hour overlap window. The window matches the 24-hour token lifetime, so every token signed with the old secret has expired by the time that secret is removed. In a bigger org I'd probably push for RS256, but for where we were it was the pragmatic move.

  NEW ROTATION PROCESS (versioned secrets, 24h overlap)
  ──────────────────────────────────────────────────────────────────

  Day 0: Normal operation
  ─────────────────────────────────────────────
  JWT_SECRET_VERSION = "v3"
  JWT_SECRET         = "secret-v3"
  (no PREV vars set)

  All tokens signed with v3. Verified with v3. ✓


  Day 1: Rotation starts
  ─────────────────────────────────────────────
  JWT_SECRET_VERSION      = "v4"       ← new version
  JWT_SECRET              = "secret-v4" ← new secret
  JWT_SECRET_PREV_VERSION = "v3"       ← keep old version
  JWT_SECRET_PREV         = "secret-v3" ← keep old secret

  New tokens: signed with v4
  Old tokens (ver=v3): still verified with v3 ✓
  Users stay logged in through the overlap ✓


  Day 2: Rotation complete (remove PREV vars)
  ─────────────────────────────────────────────
  JWT_SECRET_VERSION = "v4"
  JWT_SECRET         = "secret-v4"
  (PREV vars removed — v3 tokens now expired naturally)

  Zero forced logouts. Zero support tickets. ✓
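
The process above depends on the four variables being set consistently, so it may be worth checking them at startup. The following is a minimal sketch of such a check, not part of the original fix: `loadKeyRing` and the `KeyRing` shape are hypothetical names, and the secret values are the placeholders from the diagram.

```typescript
type Env = Record<string, string | undefined>;

interface KeyRing {
  current: string;              // version used to sign new tokens
  secrets: Map<string, string>; // version -> secret accepted for verification
}

// Build the set of accepted secrets from the environment, failing fast on a bad rotation config.
function loadKeyRing(env: Env): KeyRing {
  const version = env.JWT_SECRET_VERSION;
  const secret = env.JWT_SECRET;
  if (!version || !secret) {
    throw new Error('JWT_SECRET_VERSION and JWT_SECRET must both be set');
  }

  const secrets = new Map<string, string>([[version, secret]]);
  const prevVersion = env.JWT_SECRET_PREV_VERSION;
  const prevSecret = env.JWT_SECRET_PREV;

  // The PREV pair only makes sense together.
  if (Boolean(prevVersion) !== Boolean(prevSecret)) {
    throw new Error('Set both JWT_SECRET_PREV_VERSION and JWT_SECRET_PREV, or neither');
  }
  if (prevVersion && prevSecret) {
    if (prevVersion === version) throw new Error('Previous and current versions must differ');
    if (prevSecret === secret) throw new Error('Rotation did not change the secret');
    secrets.set(prevVersion, prevSecret);
  }
  return { current: version, secrets };
}

// Day 1: both versions are accepted, new tokens are signed with v4.
const day1 = loadKeyRing({
  JWT_SECRET_VERSION: 'v4',
  JWT_SECRET: 'secret-v4',
  JWT_SECRET_PREV_VERSION: 'v3',
  JWT_SECRET_PREV: 'secret-v3',
});

// Day 2: PREV vars removed, only v4 remains.
const day2 = loadKeyRing({ JWT_SECRET_VERSION: 'v4', JWT_SECRET: 'secret-v4' });
```

In a real service this would be called once with `process.env`. A half-set PREV pair then fails at boot, rather than surfacing later as unexplained 401s.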

The other three changes:

  • Mobile 401 interceptor. Both iOS and Android clients now have an Axios/URLSession interceptor that catches 401 responses, clears the stored token, and navigates to the login screen with a human-readable message rather than a blank screen.
  • Rate limiter carve-out for /auth/login. The login endpoint is exempt from the general rate limiter and has its own dedicated limit with exponential back-off per IP, so legitimate re-auth surges don't get blocked alongside actual brute-force attempts.
  • Rotation runbook. A documented procedure in the team wiki: what to set, what to monitor, how long to keep the overlap window, and how to verify the rotation completed cleanly before removing the previous secret.
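
The decision logic behind the 401 interceptor in the first bullet can be sketched independently of any HTTP client. This is an illustration, not the code our apps ship: `SessionStore`, `LoginNavigator`, `handleUnauthorized`, the message text and the `/api/...` paths are all hypothetical, and only `/auth/login` comes from our setup.

```typescript
// Hypothetical seams that a real client would back with its own token storage and navigation stack.
interface SessionStore {
  getToken(): string | null;
  clearToken(): void;
}

interface LoginNavigator {
  goToLogin(message: string): void;
}

const LOGIN_PATH = '/auth/login';

// Called for every HTTP response. Returns true when the response was treated as an
// ended session and the caller should stop processing it.
function handleUnauthorized(
  path: string,
  status: number,
  session: SessionStore,
  nav: LoginNavigator,
): boolean {
  if (status !== 401) return false;
  // A 401 from the login endpoint means bad credentials, not an ended session.
  if (path === LOGIN_PATH) return false;
  // Token already cleared: an earlier response has triggered the redirect, so do not repeat it.
  if (session.getToken() === null) return true;
  session.clearToken();
  nav.goToLogin('Your session has ended. Please log in again.');
  return true;
}

// Usage with in-memory fakes: two in-flight requests both come back 401.
let storedToken: string | null = 'abc';
const messages: string[] = [];
const session: SessionStore = {
  getToken: () => storedToken,
  clearToken: () => { storedToken = null; },
};
const nav: LoginNavigator = { goToLogin: (message) => { messages.push(message); } };

const first = handleUnauthorized('/api/reports', 401, session, nav);
const second = handleUnauthorized('/api/uploads', 401, session, nav);
```

Both calls report the response as handled, but the user is sent to the login screen only once. That matters when several requests are in flight at the moment the token stops working.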

Lessons learned

  • A secret you've never rotated is a secret you can't safely rotate. If your first rotation is also your first time finding out whether the rotation is safe, you'll have an incident. Rotation drills belong in staging, not as a surprise in production.
  • Symmetric JWT with a single secret has no graceful degradation. It works fine until you need to rotate, revoke, or audit. Adding a version claim costs a few bytes per token and buys you an overlap window for every future rotation.
  • Rate limiters need to know about planned surges. A login surge caused by a forced logout looks identical to a credential-stuffing attack. Plan for both patterns differently.
  • Mobile clients need a 401 strategy, not just a 401 response. A bare 401 is a complete session failure on mobile. The client has to know how to recover.
  • Security work can feel like an outage to users. Communicate. Even a brief status page update saying "we rotated security credentials, please log in again" would probably have halved the support tickets.
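
One way to treat the two surge patterns differently is to penalise failed logins only. The sketch below shows a per-IP exponential back-off on that basis. It is an assumption about how such a limiter could work, not a description of ours: `LoginBackoff`, the thresholds and the example IP are hypothetical, and it ignores complications such as many users sharing one corporate IP.

```typescript
interface BackoffState {
  failures: number;     // consecutive failed logins from this IP
  blockedUntil: number; // ms timestamp before which attempts are rejected
}

class LoginBackoff {
  private readonly state = new Map<string, BackoffState>();
  private readonly freeFailures: number;
  private readonly baseDelayMs: number;
  private readonly maxDelayMs: number;

  constructor(freeFailures = 3, baseDelayMs = 1000, maxDelayMs = 60000) {
    this.freeFailures = freeFailures;
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
  }

  // True if this IP may attempt a login at time `now` (ms).
  isAllowed(ip: string, now: number): boolean {
    const entry = this.state.get(ip);
    return entry === undefined || now >= entry.blockedUntil;
  }

  // Record a failed login and return the delay (ms) now imposed on the IP.
  recordFailure(ip: string, now: number): number {
    const failures = (this.state.get(ip)?.failures ?? 0) + 1;
    const over = failures - this.freeFailures;
    // No delay for the first few failures, then base * 2^(over - 1), capped at the maximum.
    const delay = over <= 0 ? 0 : Math.min(this.baseDelayMs * 2 ** (over - 1), this.maxDelayMs);
    this.state.set(ip, { failures, blockedUntil: now + delay });
    return delay;
  }

  // A successful login clears the IP's history.
  recordSuccess(ip: string): void {
    this.state.delete(ip);
  }
}

// Six consecutive failures from one address (documentation-range IP), all at t = 0.
const limiter = new LoginBackoff();
const ip = '203.0.113.7';
const delays = [1, 2, 3, 4, 5, 6].map(() => limiter.recordFailure(ip, 0));
console.log(delays); // [0, 0, 0, 1000, 2000, 4000]
```

A forced-logout surge is mostly successful logins, which never accumulate delay here. A credential-stuffing run is mostly failures and backs itself off quickly.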