Week 1 — Foundations

What is DevOps?

Understanding the culture, philosophy, and measurement framework that defines modern software delivery. We'll cover CALMS, DORA metrics, and the DevOps infinity loop.

⏱ Duration 60 min

📖 Theory 30 min

🔧 Lab 25 min

❓ Quiz 5 min

Session Overview

What we'll cover today

01

The Problem — Dev vs Ops

Why the wall of confusion exists and why it matters for business outcomes.

02

DevOps — Definition & Origin

Who coined it, what it means, and what it is NOT.

03

The CALMS Framework

Culture, Automation, Lean, Measurement, Sharing — the 5 pillars.

04

DORA Metrics

How to measure DevOps performance with the 4 key engineering metrics.

05

The DevOps Infinity Loop

The 8-stage lifecycle: Plan → Code → Build → Test → Release → Deploy → Operate → Monitor.

06

Before vs After DevOps

Real-world comparison of traditional vs DevOps-enabled delivery.

07

🔧 Hands-On Lab

Map your current SDLC, identify bottlenecks, and set your first automation target.

08

❓ Knowledge Check

3 questions to validate understanding of today's concepts.

Part 1 of 6

The problem DevOps solves

⚠ Dev Team

Ship code as fast as possible

Frequent deployments = progress

Change is good, change is innovation

New features first priority

"It works on my machine"

⚡

🛡 Ops Team

Keep systems stable at all costs

Fewer changes = fewer incidents

Change is risk

Reliability first priority

"Deploy on Friday? Absolutely not."

The Result

This conflict creates the "Wall of Confusion": Dev throws code over the wall, and Ops rejects it, delays it, or watches it break in production. The result is finger-pointing and slow delivery.

Business Impact

Releases take weeks or months
High change failure rate (>30%)
Long recovery times when things break
Engineers spend time on manual toil, not innovation
Competitors ship faster and capture market share

Part 1 of 6 — continued

DevOps — origin & definition

2009 — Ghent, Belgium

Patrick Debois organizes the first DevOpsDays conference, coining the term "DevOps" by merging "Development" and "Operations."

2010 — The Agile Sysadmin

John Willis & Damon Edwards define DevOps through CAMS (precursor to CALMS), connecting it to Lean and Agile principles.

2013–2016 — The Phoenix Project

The novel "The Phoenix Project" (2013), by Gene Kim, Kevin Behr, and George Spafford, popularizes DevOps concepts globally, drawing parallels to Lean manufacturing.

2019 — DORA Research

Google's DORA team publishes the definitive 4-metric framework for measuring DevOps performance, backed by 6 years of data.

Definition

DevOps is a set of practices, tools, and cultural philosophies that automate and integrate the processes between software development and IT operations teams — enabling organizations to deliver applications and services at high velocity.

What DevOps is NOT

❌ Not just a job title ("DevOps Engineer")
❌ Not a tool or technology
❌ Not only for large companies
❌ Not the same as Agile (though they complement)
❌ Not something you "implement" and then you're done

What DevOps IS

✅ A continuous cultural shift toward shared responsibility, collaboration, and fast, safe delivery. It's about breaking silos, not about hiring a "DevOps person."

Part 2 of 6

The CALMS Framework

C

Culture

Collaboration, trust, shared ownership, and blameless learning

A

Automation

Automate repetitive tasks: testing, deployment, provisioning

L

Lean

Eliminate waste, small batches, fast feedback loops

M

Measurement

Track the DORA metrics: deployment frequency, lead time, change failure rate, MTTR

S

Sharing

Share tools, knowledge, postmortems, and runbooks across teams

Key Insight

CALMS is not a checklist — it's a continuous journey. You don't "complete" culture or "finish" automation. Teams at different maturity levels will score differently across each dimension.

CALMS — C

Culture is the foundation

Shared Ownership

Dev and Ops jointly own reliability. If the app goes down at 3 AM, both teams care — not just Ops. This is embodied in the phrase: "You build it, you run it."

Blameless Postmortems

When incidents happen, the question is "What failed in our systems/process?" — not "Whose fault was it?" Psychological safety enables engineers to escalate issues early without fear.

Empathy Across Teams

Dev should understand Ops pain (3 AM pages). Ops should understand Dev pressure (sprint deadlines). Embedding Ops requirements into sprint planning bridges this gap.

The Toyota Andon Cord

In Toyota's Lean manufacturing, any worker can stop the entire production line if they spot a defect. This is the cultural equivalent of a developer being empowered to halt a deployment. Culture makes or breaks DevOps — tools are secondary.

Signs of Poor Culture

Separate "Dev tickets" and "Ops tickets"
Post-incident blame and escalations
Ops requires change approval boards for every deploy
Developers don't have access to production logs
"That's not my job" mentality

CALMS — A

Automation — eliminate toil

What to Automate

Testing — every commit triggers unit and integration tests
Code quality — linting, static analysis on every PR
Building — reproducible builds via CI pipelines
Provisioning — infrastructure via Terraform, Ansible
Deployment — helm upgrade, kubectl apply, ArgoCD sync
Security scanning — SAST, dependency CVE scans

The Automation Dividend

If a task takes 30 min manually and runs 2× per week, automating it in 4 hours pays off in 4 weeks — and pays dividends forever. Automate anything you do more than once a week.
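
The payback arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function name `payback_weeks` and its parameters are illustrative assumptions, not part of any tool.

```python
def payback_weeks(build_hours: float, manual_minutes: float, runs_per_week: float) -> float:
    """Weeks until the time saved equals the time spent automating."""
    hours_saved_per_week = manual_minutes / 60 * runs_per_week
    if hours_saved_per_week <= 0:
        raise ValueError("the task must take time and run at least occasionally")
    return build_hours / hours_saved_per_week


# The example from the slide: 30 min by hand, twice a week, 4 hours to automate.
print(payback_weeks(build_hours=4, manual_minutes=30, runs_per_week=2))  # 4.0
```

The sketch ignores maintenance of the automation itself, so treat the result as a lower bound on the real payback time.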

yaml — GitHub Actions (CI)

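name: ci
on: [push]
jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4    # Fetch the repository
      - uses: actions/setup-node@v4  # Pin Node.js for reproducible builds
        with:
          node-version: 20
      - run: npm ci            # Install exact versions from package-lock.json
      - run: npm run lint      # Automate quality check
      - run: npm test          # Automate testing
      - run: npm run build     # Automate build
      - run: docker build .    # Automate packaging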

Rule of Thumb

"If you do it manually more than twice, automate it." This includes deployments, database backups, certificate renewals, and incident response runbooks.

CALMS — L · M · S

Lean, Measurement & Sharing

L — Lean

Borrowed from Toyota Production System. Key principles:

Small batch sizes — deploy small, deploy often. Fewer things to break.
Limit WIP — too many concurrent features = context-switching = slow delivery
Value stream mapping — identify every step from code to production and eliminate waste
Fast feedback — the faster you know something broke, the cheaper it is to fix

M — Measurement

You can't improve what you don't measure. Essential metrics:

Deploy frequency — how often you ship
Lead time — commit to production time
Change failure rate — % of deploys causing incidents
MTTR — time to recover from failure
Build times — CI pipeline duration trends

S — Sharing

Knowledge hoarded is knowledge lost. Share across the organization:

Postmortems — publish incident learnings widely
Runbooks — document operational procedures in Git
Internal tech talks — team sharing of DevOps wins
Open source — contribute tools back to community
On-call rotation — spread operational knowledge

Lean Insight

Amazon reported an average of one production deployment every 11.6 seconds (2011). Netflix reportedly deploys hundreds of times per day. This is only possible with Lean (small batches) + Automation (CI/CD) + Measurement (immediate feedback on every deploy).

Part 3 of 6

DORA Metrics — measuring DevOps success

🚀

Deployment Frequency

How often does your organization successfully release to production?

Elite: Multiple times per day

⏱

Lead Time for Changes

How long does it take for a commit to reach production?

Elite: Less than 1 hour

🔥

Change Failure Rate

What percentage of production changes result in a degraded service or need remediation?

Elite: 0–15%

🔄

Mean Time to Recover (MTTR)

How long does it take to restore service after an incident or failure?

Elite: Less than 1 hour

Research Source

DORA metrics come from 6+ years of research by Google's DevOps Research and Assessment team covering 32,000+ professionals globally. They are the most statistically validated indicators of software delivery performance.
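
To make the four definitions concrete, here is a minimal sketch that derives them from a list of deployment records. The `Deployment` record and `dora_metrics` function are hypothetical names for illustration. The sketch also assumes that a failure starts at deploy time and that a simple mean is good enough, which a real measurement setup may not accept.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # degraded service or needed remediation?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: list[Deployment], period_days: int) -> dict[str, float]:
    """Compute the four metrics over a window of `period_days` days."""
    if not deploys:
        raise ValueError("need at least one deployment")
    failures = [d for d in deploys if d.failed]
    lead_times = [(d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deploys]
    # Assumption: the outage begins at deploy time and ends at restored_at.
    recoveries = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "lead_time_hours": mean(lead_times),
        "change_failure_rate_pct": 100 * len(failures) / len(deploys),
        "mttr_hours": mean(recoveries) if recoveries else 0.0,
    }


# Hypothetical sample: four deploys in two days, one of which failed for an hour.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0, t0 + timedelta(hours=4), failed=True, restored_at=t0 + timedelta(hours=5)),
    Deployment(t0, t0 + timedelta(hours=6)),
    Deployment(t0, t0 + timedelta(hours=8)),
]
print(dora_metrics(sample, period_days=2))
```

For the sample data this gives 2 deploys per day, a 5-hour lead time, a 25% change failure rate, and a 1-hour MTTR.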

Part 3 of 6 — continued

DORA performance tiers

Metric	Elite	High	Medium	Low
Deployment Frequency	Multiple per day	Daily to weekly	Weekly to monthly	Once per 6 months
Lead Time for Changes	< 1 hour	1 day to 1 week	1 week to 1 month	> 6 months
Change Failure Rate	0–15%	16–30%	16–30%	16–30%
MTTR	< 1 hour	< 1 day	1 day to 1 week	> 6 months
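
As a sketch of how the table can be applied, the function below maps a measured lead time to a tier using the lead-time row. The name `lead_time_tier` is hypothetical. Two assumptions are baked in: one month is treated as 30 days, and a value that falls in a gap between two table ranges (for example 5 hours, or 3 months) is assigned to the slower neighbouring tier.

```python
from datetime import timedelta


def lead_time_tier(lead_time: timedelta) -> str:
    """Map a lead time to a tier using the lead-time row of the table."""
    if lead_time < timedelta(hours=1):
        return "Elite"
    if lead_time <= timedelta(weeks=1):
        return "High"
    if lead_time <= timedelta(days=30):  # assumption: one month = 30 days
        return "Medium"
    return "Low"


print(lead_time_tier(timedelta(minutes=30)))  # Elite
print(lead_time_tier(timedelta(days=3)))      # High
print(lead_time_tier(timedelta(days=20)))     # Medium
print(lead_time_tier(timedelta(days=200)))    # Low
```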

Elite Performer (e.g. Netflix, Amazon)

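In DORA's 2018 State of DevOps Report, elite performers deployed 46× more frequently than low performers, with 2,555× faster lead times and 2,604× faster recovery. The exact multipliers change from year to year, so quote them together with the report year.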

Why This Matters

Higher delivery performance correlates directly with better business outcomes: higher revenue, market share, customer satisfaction, and employee retention. DevOps is a business strategy, not just a tech choice.

Part 4 of 6

The DevOps Infinity Loop

Dev Phases

Plan — Jira, GitHub Issues, Azure Boards
Code — Git, VS Code, feature branches
Build — npm/Maven, Docker image build
Test — Jest, JUnit, Selenium, Postman

Ops Phases

Release — GitHub Actions, Azure Pipelines
Deploy — Kubernetes, Helm, ArgoCD
Operate — Terraform, Ansible, runbooks
Monitor — Prometheus, Grafana, New Relic

The Loop Never Stops

Monitor feeds back to Plan — user behaviour and production metrics directly influence the next sprint's priorities. This is the continuous improvement engine.
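
The "loop" is easiest to see as a cyclic sequence. A minimal sketch, where `STAGES` and `next_stage` are illustrative names rather than part of any tool:

```python
STAGES = ["Plan", "Code", "Build", "Test", "Release", "Deploy", "Operate", "Monitor"]


def next_stage(stage: str) -> str:
    """Return the stage that follows `stage`, wrapping Monitor back to Plan."""
    i = STAGES.index(stage)  # raises ValueError for an unknown stage
    return STAGES[(i + 1) % len(STAGES)]


print(next_stage("Test"))     # Release
print(next_stage("Monitor"))  # Plan
```

The modulo wrap is the whole point: there is no final stage, so what Monitor observes becomes input to the next round of Plan.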

Part 5 of 6

Before vs After DevOps

⚠ Traditional (Pre-DevOps)

Releases every 3–6 months
Code "thrown over the wall" to Ops
Manual deployments with 20-step runbooks
Staging ≠ Production (snowflake environments)
Bug found in production 3 weeks after commit
Change Approval Board required for every deploy
On-call engineer has no visibility into app code
Friday deploys → weekend on-call nightmare
Rollback = "call the DBA at midnight"
New hire takes 6 weeks to get productive

✓ DevOps-Enabled

Releases multiple times per day
Dev and Ops work in the same sprint
Automated CI/CD — git push → deployed
Infrastructure as Code — identical environments
Bug caught in CI within 5 minutes of commit
Self-service deploys — no approval board
Shared dashboards — everyone has visibility
Small deploys = boring deploys (safe any day)
Rollback = kubectl rollout undo deployment/<name>
New hire productive in 1 day (IaC + runbooks)

Part 6 of 6

🔧 Hands-On Lab

Map Your Current SDLC — identify bottlenecks and measure your baseline DORA metrics

⏱ 25 minutes

🎯 No tools required

📝 Paper or Miro/FigJam

🔧 Lab — Step by Step

Map your current state

1

Draw your current pipeline (8 min)

On paper, a whiteboard, or Miro — sketch how code moves from your laptop to production. Include every manual step, approval, handoff, and wait time. Don't optimize yet — just map reality.

2

Mark every "handoff" (3 min)

Circle every point where work moves from one team, person, or system to another (Dev → QA → Ops → CAB → Production). These handoffs are usually your biggest bottlenecks: work waits in a queue at each one, often for days or weeks.

3

Measure your current lead time (5 min)

Pick a recent feature. Find the commit timestamp and the production deploy timestamp. Calculate the gap — this is your current Lead Time for Changes. Write it down. It might shock you.

4

Fill in the DORA baseline (5 min)

Use the template below. Be honest — these numbers are your starting point, not a judgment. The goal is to improve them over the next 35 days.

5

Identify one automation target (4 min)

Pick ONE manual step that you could automate. Be specific: "We manually run npm test before every PR" or "We manually kubectl apply every deployment." This is your first DevOps improvement target.
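
For step 3, the gap between the two timestamps can be computed as in the sketch below. The function name `lead_time_hours` and the sample timestamps are hypothetical. It assumes both timestamps are ISO 8601 strings carrying a UTC offset, which is the kind of value `git log -1 --format=%cI <sha>` prints for a commit.

```python
from datetime import datetime


def lead_time_hours(commit_iso: str, deploy_iso: str) -> float:
    """Hours between a commit and its production deploy (ISO 8601 strings)."""
    committed = datetime.fromisoformat(commit_iso)
    deployed = datetime.fromisoformat(deploy_iso)
    if deployed < committed:
        raise ValueError("deploy timestamp is earlier than commit timestamp")
    return (deployed - committed).total_seconds() / 3600


# Hypothetical timestamps for one recent feature.
hours = lead_time_hours("2024-03-04T10:15:00+00:00", "2024-03-07T16:45:00+00:00")
print(f"{hours:.1f} hours ({hours / 24:.1f} days)")  # 78.5 hours (3.3 days)
```

On Python versions before 3.11, `datetime.fromisoformat` does not accept a trailing `Z`, so replace it with `+00:00` first if your timestamps use that form.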

🔧 Lab — Template

Your DORA baseline

text — Fill this in

DORA Baseline — [Your Name] — [Date]
========================================

Current Lead Time:       _____ days
Deploy Frequency:        _____ per month
Change Failure Rate:     _____%
MTTR (last 3 incidents): _____ hours

Pipeline Steps (current):
  1. Developer writes code      → _____ min
  2. Code review (PR)           → _____ hours
  3. QA/manual testing          → _____ hours
  4. Staging deployment         → _____ hours
  5. Change Approval Board      → _____ days
  6. Production deployment      → _____ hours

Total Lead Time = Sum of steps 2–6 (commit to production): _____

Handoffs identified:     _____
Biggest wait step:       _______________
First automation target: _______________

DORA Tier (circle one):
  Elite / High / Medium / Low
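
Once the template is filled in, the total and the biggest wait step follow directly. The sketch below uses made-up durations and a hypothetical helper named `summarize`. It counts only steps 2 to 6, because Lead Time for Changes starts at the commit.

```python
# Hypothetical durations in hours for steps 2-6 of the template.
steps = {
    "Code review (PR)": 6.0,
    "QA/manual testing": 16.0,
    "Staging deployment": 2.0,
    "Change Approval Board": 72.0,
    "Production deployment": 1.0,
}


def summarize(steps: dict[str, float]) -> tuple[float, str, float]:
    """Return total hours, the slowest step, and its share of the total in percent."""
    total = sum(steps.values())
    biggest = max(steps, key=steps.get)
    share = 100 * steps[biggest] / total
    return total, biggest, share


total, biggest, share = summarize(steps)
print(f"Total lead time: {total:.0f} h; biggest wait: {biggest} ({share:.0f}%)")
# Total lead time: 97 h; biggest wait: Change Approval Board (74%)
```

A single step that dominates the total like this is a natural candidate for the "Biggest wait step" line.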

What to do with this

Keep this baseline. By the end of the 35-day course you will have:

A working CI pipeline → reducing lead time
Automated tests → reducing change failure rate
Kubernetes deployments → faster MTTR via rollback
Monitoring → faster incident detection

Re-measure your DORA metrics on Day 35 and compare!

Tip: visit dora.dev

DORA provides a free Quick Check at dora.dev: answer a few questions about your delivery performance and it shows how your team compares with the research benchmarks.

Knowledge Check

❓ Quiz Time

3 questions · 5 minutes · Instant feedback

QUESTION 1 OF 3

What does the A stand for in the CALMS framework?

A

Agility

B

Automation

C

Assessment

D

Availability

QUESTION 2 OF 3

Which DORA metric measures how quickly your team recovers from a production failure?

A

Deployment Frequency

B

Lead Time for Changes

C

Change Failure Rate

D

Mean Time to Recover (MTTR)

QUESTION 3 OF 3

An Elite DevOps team (per DORA research) deploys to production how frequently?

A

Multiple times per day

B

Once per week

C

Once per sprint (2 weeks)

D

Once per month

Day 1 — Complete

What you learned today

⚡

The Problem

Dev wants speed. Ops wants stability. DevOps breaks the wall between them.

🏛

CALMS

Culture · Automation · Lean · Measurement · Sharing — the 5 DevOps pillars.

📊

DORA Metrics

4 metrics to measure delivery performance. Elite teams deploy multiple times/day.

∞

Infinity Loop

8-stage cycle: Plan → Code → Build → Test → Release → Deploy → Operate → Monitor.

Your Day 1 Action Items

Complete the DORA baseline template (if not done in lab)
Visit dora.dev and take the Quick Check
Read: "The Phoenix Project" — first 3 chapters (optional but powerful)

Tomorrow — Day 2

DevOps Toolchain Overview

We'll map the complete 8-stage infinity loop to real tools, install your DevOps toolkit (Git, Docker, VS Code), and set up your GitHub account for the rest of the course.

Git · Docker · VS Code · Node.js · GitHub Account