Debugging Process

(1) Describe the problem

  • Write down the problem in a text file, bug tracker or on a piece of paper.
  • Describe the expected result and the observed result.
  • State all facts that someone else needs in order to reproduce the problem, e.g. the operating system and the user interaction.
  • If possible, provide a minimal script that demonstrates the problem.
  • Collect all relevant log files and error messages.
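
A problem description of this kind can be partly generated. The following is a minimal sketch that gathers a few environment facts with Python's standard `platform` and `sys` modules and renders them together with the expected and observed result. The function names `environment_report` and `format_report` and the report layout are assumptions made for illustration; adapt them to whatever your bug tracker expects.

```python
import platform
import sys


def environment_report() -> dict:
    """Collect basic facts that help someone else reproduce a problem."""
    return {
        "os": platform.system(),           # e.g. "Linux", "Darwin", "Windows"
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],  # interpreter version number only
    }


def format_report(expected: str, observed: str) -> str:
    """Render a plain-text problem description (hypothetical layout)."""
    lines = [f"Expected: {expected}", f"Observed: {observed}", "Environment:"]
    lines += [f"  {key}: {value}" for key, value in environment_report().items()]
    return "\n".join(lines)
```

Calling `print(format_report("exit code 0", "exit code 1"))` produces text that can be pasted into a bug report next to the collected log files.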

(2) Create a concept diagram

An informal concept diagram works like a roadmap: it keeps you from getting lost.

  • Which technologies are involved?
  • What components are involved in the problem and how do they interact?
  • How are they interconnected?
  • How should it work?
  • How does it actually work?
  • What is the infection chain?

(3) Try stuff until it works

The actual debugging work is a series of trial-and-error experiments, each based on an educated guess about the failure chain, i.e. hypothesis-based experiments.

  • Collect ideas for experiments on small note sheets and pin them on the wall, as in movies about criminal investigations.
  • Each experiment based on an educated guess should teach you something.
  • Those findings can spark ideas for new experiments.
  • Search for hints. Write down all findings so that they don't get lost.

(4) Implement the solution

  • Try to fix the root cause of the problem, not just the symptoms.
  • Try to take preventive measures (e.g. regression tests), so that this class of issue cannot resurface.
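
As an illustration of a root-cause fix paired with a regression test, here is a small sketch using Python's standard `unittest` module. The function `mean` and its earlier bug (an unchecked division by the length of an empty list) are hypothetical and only stand in for whatever defect was actually found.

```python
import unittest


def mean(values):
    """Hypothetical function; an earlier version divided by len(values) unchecked."""
    if not values:
        # Root-cause fix: reject empty input explicitly instead of letting
        # a ZeroDivisionError surface somewhere downstream.
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


class MeanRegressionTest(unittest.TestCase):
    def test_empty_input_raises_value_error(self):
        # Pins the originally reported failure so it cannot return unnoticed.
        with self.assertRaises(ValueError):
            mean([])

    def test_regular_input_still_works(self):
        self.assertEqual(mean([1, 2, 3]), 2)
```

Saved in a test module, the tests can be run with `python -m unittest`. Keeping the test next to the fix documents which input once triggered the problem.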

Tips and tricks

Keep a debugging logbook

Everything gets written down, formally, so that you know at all times where you are, where you've been, where you're going, and where you want to get. In scientific work and electronics technology this is necessary because otherwise the problems get so complex you get lost in them and confused and forget what you know and what you don't know and have to give up. – Zen and the Art of Motorcycle Maintenance, Robert M. Pirsig


Figure 1: Don't get lost in the debugging maze

For subtle bugs in large programs, the amount of state you need to keep track of can rapidly get out of hand. (…) This is the point at which you should be writing down every single command you type in any relevant prompts, and every single code change (or, since we have technology, obsessively saving the output of `history`, making commits to test branches, and recording the correlation between them). – Nelson Elhage

Go down the rabbit hole

Knowledge of the system typically limits the effectiveness of an SRE new to a system; there’s little substitute to learning how the system is designed and built. - Chris Jones

… my sister said that my biggest exceptional quality was that I would not let go. - Linus Torvalds


Figure 2: Example for System Analysis utils, by Brendan Gregg

Improve the error handling

  • Add state assertions at the beginning and end of functions
  • Write good error messages that state what the problem is, how it happened, and what can be done to fix it.
  • Add log messages
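
The three points above can be combined in one function. The sketch below is only an illustration: `withdraw` and its rules are hypothetical, and the wording of the messages is one possible style. Note that Python skips `assert` statements when run with the `-O` option, so assertions should guard internal state, while invalid caller input is reported with exceptions.

```python
import logging

logger = logging.getLogger(__name__)


def withdraw(balance: int, amount: int) -> int:
    """Hypothetical example: return the new balance after a withdrawal."""
    # State assertion at the beginning: check what the function relies on.
    assert balance >= 0, f"precondition violated: balance={balance}"

    # Error messages say what went wrong, why, and what the caller can do.
    if amount <= 0:
        raise ValueError(
            f"Cannot withdraw {amount}: the amount must be positive. "
            "Pass a value greater than zero."
        )
    if amount > balance:
        raise ValueError(
            f"Cannot withdraw {amount}: only {balance} available. "
            "Reduce the amount or deposit first."
        )

    new_balance = balance - amount
    logger.debug("withdraw: balance %d -> %d", balance, new_balance)

    # State assertion at the end: check what the function promises.
    assert 0 <= new_balance < balance, f"postcondition violated: {new_balance}"
    return new_balance
```

For example, `withdraw(100, 30)` returns `70`, while `withdraw(10, 20)` raises a `ValueError` that names both the requested and the available amount.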

Search the internet

Increase the log level

  • Set loggers to DEBUG mode
  • Set the "--debug" flag etc. for command line tools
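
For your own tools, both points can be wired together. The following sketch uses Python's standard `argparse` and `logging` modules; the tool itself, the helper names `build_parser` and `configure_logging`, and the choice of WARNING as the default level are assumptions for illustration.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="hypothetical command line tool")
    parser.add_argument("--debug", action="store_true", help="enable DEBUG logging")
    return parser


def configure_logging(debug: bool) -> int:
    """Set the root logger level and return the level that was chosen."""
    level = logging.DEBUG if debug else logging.WARNING
    # basicConfig installs a handler only if none exists yet, so the level
    # is set separately to make repeated calls take effect as well.
    logging.basicConfig(format="%(asctime)s %(levelname)s %(name)s: %(message)s")
    logging.getLogger().setLevel(level)
    return level


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    configure_logging(args.debug)
    logging.getLogger("example").debug("only visible with --debug")
```

Calling `main(["--debug"])` emits the debug line, while `main([])` stays silent.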

Ask for help

  • If you get stuck, ask your colleagues for help.
  • They might be happy to help you out.
  • If you cannot find a solution together, consider posting the question on Stack Overflow.

Using a debugger

Diff Debugging

Take a break

  • walk away for two minutes
  • work on a different task
  • go home, try again next morning
  • go for a walk
  • fresh air
  • relax
  • sleep

Create a new sandbox project

  • Extract the problematic part from the main project and reproduce the problem there.
  • Step by step, remove all code that is not related to the actual problem.
  • Then solve the problem.
  • Integrate the solution in the main program.

Cut the Gordian knot

You might cut the "Gordian knot" by rewriting (parts of) the code.

Debugging stories


Why Programs Fail

TRAFFIC algorithm

(T)rack: The first step of the debugging algorithm is to track the problem report in the bug tracker database.

(R)eproduce: The developers need to be able to reproduce the problem in order to fix it.

(A)utomate: The steps to reproduce should be optimized into a minimal test case which can be executed automatically.

(F)ind: Now that we can reproduce the problem quickly, we need to find the origins of the defect by tracking the origins of values.

(F)ocus: In the search for the origins of the problem, we should focus on the usual suspects, e.g. code smells, known infections and earlier defects.

(I)solate: The next step is to isolate the root cause of the infection with the help of the scientific method.

(C)orrect: Finally, we can fix the defect and re-run all the tests for verification.
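
The "(A)utomate" step asks for a minimal test case. One way to get there, sketched below, is a greedy reduction loop that drops one element of a failing input at a time and keeps the removal whenever the failure persists. This is a deliberately simplified stand-in, not a full delta debugging implementation; `minimize`, `still_fails` and the example predicate `triggers_bug` are hypothetical names chosen for illustration.

```python
def minimize(failing_input, still_fails):
    """Greedily drop elements while the failure persists.

    `still_fails` is a caller-supplied predicate that returns True when
    the given input still triggers the problem.
    """
    current = list(failing_input)
    assert still_fails(current), "the starting input must reproduce the failure"
    index = 0
    while index < len(current):
        candidate = current[:index] + current[index + 1:]
        if still_fails(candidate):
            current = candidate  # element was irrelevant, keep it removed
        else:
            index += 1           # element is needed to reproduce, keep it
    return current


# Hypothetical failure: the bug shows up whenever both 3 and 7 are present.
def triggers_bug(values):
    return 3 in values and 7 in values


print(minimize([1, 2, 3, 4, 5, 6, 7, 8], triggers_bug))  # prints [3, 7]
```

In practice, the predicate would run the automated reproduction script on the candidate input and report whether the failure still occurs, which is why a fast, automated reproduction matters.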

Debugging by Thinking

Robert Metzger

The Art of Debugging with GDB, DDD, and Eclipse