Vol. III · Issue 06 · May 2026 · ISSN 2814-9921
The Kubernetes On-Call Handbook
PDF · EPUB · Lifetime updates
DevOps & SRE · 2nd Edition · March 2026

Production playbooks for engineers carrying the pager

5.0 (187 ratings)
advanced
412 pages

Most Kubernetes books optimize for tutorials. This one optimizes for the 3 a.m. page. You will learn how to triage a wedged control plane, decode opaque CNI failures, untangle etcd performance regressions, and write incident-grade runbooks that other on-call engineers will actually use. Every chapter ends with a real, anonymized postmortem drawn from production environments running between 200 and 50,000 nodes.

Dr. Mariana Okafor
Author
Principal SRE, Distributed Systems

Mariana spent the last decade running on-call for trading systems and global edge networks. Her work focuses on the boring failures nobody writes blog posts about: clock skew, partial partitions, and the contract between SREs and the language runtime.

$22.99
$29.99
Instant PDF + EPUB delivery
DRM-free, copy onto any device
Free chapter updates for the life of the edition
Specifications
Pages: 412
Edition: 2nd Edition
Language: English
Level: Advanced
ISBN: 978-1-99999-001-2
Published: March 2026
Editorial review

Three working engineers at peer publications reviewed this book before release. We do not publish first drafts.

Table of contents

What you'll find inside.

  1. Why Kubernetes Pages You
  2. The Anatomy of an etcd Failure
  3. CNI Forensics: Calico, Cilium, Flannel
  4. Control Plane Recovery Drills
  5. Cluster Autoscaler Pathologies
  6. Stateful Workloads You Cannot Drain
  7. Observability for the Cluster, Not the Pod
  8. Writing Runbooks Engineers Trust
  9. Postmortems Without the Blame
  10. Building an On-Call Culture That Sustains
Reader reviews

5.0 / 5

187 verified readers

Verified purchase

Made my team's on-call rotation calmer

Distributed the postmortem chapter to my whole team. It changed the temperature of our incident reviews within two weeks.

Diego Martínez · Engineering Manager, Platform
Verified purchase

Worth the price on chapter 3 alone

I have read every K8s book published in the last six years. This one is the first that admits etcd will eventually be your problem and tells you exactly what to do about it. Chapter 3 paid for the book inside a week.

Hannah Lieberman · Staff SRE at fintech unicorn