Vol. III · Issue 06 · May 2026 · Free worldwide PDF + EPUB delivery · No DRM · ISSN 2814-9921

PDF · EPUB · Lifetime updates

DevOps & SRE · 2nd Edition · March 2026

The Kubernetes On-Call Handbook

Production playbooks for engineers carrying the pager

5.0 (187 ratings)

advanced

412 pages

Most Kubernetes books optimize for tutorials. This one optimizes for the 3am page. You will learn how to triage a wedged control plane, decode opaque CNI failures, untangle etcd performance regressions, and write incident-grade runbooks that other on-call engineers will actually use. Every chapter ends with a real, anonymized postmortem from production environments running between 200 and 50,000 nodes.

Author

Dr. Mariana Okafor

Principal SRE, Distributed Systems

Mariana spent the last decade running on-call for trading systems and global edge networks. Her work focuses on the boring failures nobody writes blog posts about: clock skew, partial partitions, and the contract between SREs and the language runtime.

$22.99

$29.99

Instant PDF + EPUB delivery

DRM-free, copy onto any device

Free chapter updates for the life of the edition

Specifications

Pages: 412
Edition: 2nd Edition
Language: English
Level: advanced
ISBN: 978-1-99999-001-2
Published: March 2026

Editorial review

Reviewed before release by three working engineers at peer publications. We do not publish first drafts.

Table of contents

What you'll find inside.

01 Why Kubernetes Pages You
02 The Anatomy of an etcd Failure
03 CNI Forensics: Calico, Cilium, Flannel
04 Control Plane Recovery Drills
05 Cluster Autoscaler Pathologies
06 Stateful Workloads You Cannot Drain
07 Observability for the Cluster, Not the Pod
08 Writing Runbooks Engineers Trust
09 Postmortems Without the Blame
10 Building an On-Call Culture That Sustains

Reader reviews

5.0 / 5

187 verified readers

Verified purchase

Made my team's on-call rotation calmer

Distributed the postmortem chapter to my whole team. It changed the temperature of our incident reviews within two weeks.

Diego Martínez · Engineering Manager, Platform

Verified purchase

Worth the price on chapter 3 alone

I have read every K8s book published in the last six years. This one is the first that admits etcd will eventually be your problem and tells you exactly what to do about it. Chapter 3 paid for the book inside a week.

Hannah Lieberman · Staff SRE at fintech unicorn

More from DevOps & SRE

Observability Without Vendor Lock-In