Debugging
Debugging Process
(1) Describe the problem
- Write down the problem in a text file, bug tracker or on a piece of paper.
- Describe the expected result and the observed result.
- State all the facts someone else needs to reproduce the problem, e.g. the operating system and the user interaction.
- If possible, provide a minimal script that demonstrates the problem.
- Collect all relevant log files and error messages.
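A problem description along these lines could be backed by a small script like the sketch below. The `average` function and its empty-list crash are hypothetical stand-ins for the real defect; the point is only that the script records the environment, the expected result and the observed result in one run.

```python
import platform
import sys


def average(values):
    """Hypothetical function under suspicion: crashes on an empty list."""
    return sum(values) / len(values)


def describe_environment():
    """Collect facts someone else needs to reproduce the problem."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }


def reproduce():
    """Run the smallest input that triggers the problem and report it."""
    expected = "0.0 for an empty list"
    try:
        observed = repr(average([]))
    except Exception as exc:  # record the failure instead of hiding it
        observed = f"{type(exc).__name__}: {exc}"
    return {"expected": expected, "observed": observed}


if __name__ == "__main__":
    report = {**describe_environment(), **reproduce()}
    for key, value in report.items():
        print(f"{key}: {value}")
```

The printed report can be pasted into the bug tracker as is, next to the collected log files.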
(2) Create a concept diagram
Creating a concept diagram helps to get a grasp on all the moving parts involved in the problem.
(3) Trying stuff until it works
(Image source: Andreas Zeller, Scientific Method, Creative Commons 1.0)
- Analyze the system's diagnostic data and its source code.
- Formulate a hypothesis ("an educated guess") about the infection chain.
- Based on this hypothesis, conduct an experiment to try out a possible step towards the solution.
- Based on the observed result, decide whether the hypothesis needs to be refined or rejected.
- Repeat these steps until the problem is resolved.
(4) Implement the solution
- Fix the root cause of the problem, not just the symptoms.
- If a workaround is necessary, treat it as a temporary solution and create a follow-up task.
- Take preventive measures (e.g. regression tests), so that this class of issue cannot resurface.
- Refactor the solution with the help of a clean code checklist.
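As one possible preventive measure, a regression test can pin the exact input that triggered the defect. The sketch below assumes a hypothetical `average` function that used to crash on an empty list and has since been fixed; the function, the fix and the test names are illustrative only.

```python
import unittest


def average(values):
    """Hypothetical fixed function: an empty list now yields 0.0."""
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_list_does_not_crash(self):
        # Pins the exact input from the (hypothetical) bug report.
        self.assertEqual(average([]), 0.0)

    def test_regular_input_still_works(self):
        # Guards against the fix breaking the normal case.
        self.assertEqual(average([1, 2, 3]), 2.0)


if __name__ == "__main__":
    unittest.main()
```

Keeping the test in the regular test suite means the same defect is caught automatically if it is ever reintroduced.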
Tips and tricks
Scrutinize the error messages
- End user messages
- Stack traces
- Log files
- Try to write helper scripts for finding the relevant info, e.g. with Bash or Perl.
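Such a helper script does not have to be written in Bash or Perl. The following Python sketch filters a log for error lines and counts repeated messages. The log format (`<date> <time> <LEVEL> <message>`) and all names are assumptions; adapt the pattern to the logs at hand.

```python
import re
from collections import Counter

# Assumed log format: "<date> <time> <LEVEL> <message>"
LINE_PATTERN = re.compile(r"^(\S+ \S+) (DEBUG|INFO|WARNING|ERROR|CRITICAL) (.*)$")


def find_problems(lines, levels=("ERROR", "CRITICAL")):
    """Return (timestamp, level, message) for each line at one of the given levels."""
    hits = []
    for line in lines:
        match = LINE_PATTERN.match(line.rstrip("\n"))
        if match and match.group(2) in levels:
            hits.append(match.groups())
    return hits


def summarize(hits):
    """Count how often each message occurs, most frequent first."""
    return Counter(message for _, _, message in hits).most_common()


if __name__ == "__main__":
    sample = [
        "2017-05-24 10:00:01 INFO Service started",
        "2017-05-24 10:00:05 ERROR Connection refused",
        "2017-05-24 10:00:09 ERROR Connection refused",
        "2017-05-24 10:00:12 CRITICAL Disk full",
    ]
    for message, count in summarize(find_problems(sample)):
        print(f"{count:3d}  {message}")
```

For a real log file, pass an open file object instead of the sample list, since `find_problems` only iterates over lines.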
Search the internet
After 13/14 years of development, I still cannot recommend this book higher.
— James Higgins (@IamtheHiggster) 6. April 2016
"Googling the Error Message" on O RLY? pic.twitter.com/7cMbKd5rtf
- Type the key parts of the error message into the search function of the support forum or into a search engine.
- Google is not the only search engine; try others as well.
- Stack Overflow alone contains a lot of answers.
Keep a debugging logbook
Everything gets written down, formally, so that you know at all times where you are, where you've been, where you're going, and where you want to get. In scientific work and electronics technology this is necessary because otherwise the problems get so complex you get lost in them and confused and forget what you know and what you don't know and have to give up. – Zen and the Art of Motorcycle Maintenance, Robert M. Pirsig
Figure 1: Don't get lost in the debugging maze
For subtle bugs in large programs, the amount of state you need to keep track of can rapidly get out of hand. (…) This is the point at which you should be writing down every single command you type in any relevant prompts, and every single code change (or, since we have technology, obsessively saving the output of `history`, making commits to test branches, and recording the correlation between them). – Nelson Elhage
Divide and conquer
- Create a new sandbox project.
- Extract the problematic part from the main project and reproduce the problem there.
- Step by step, remove all code that is not related to the actual problem.
- Then solve the problem.
- Integrate the solution in the main program.
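The step-by-step removal can sometimes be automated. The sketch below is a simple greedy reduction (not a full delta-debugging implementation): it drops one part at a time and keeps the smaller input whenever the problem still reproduces. `minimize` and the `fails` predicate are hypothetical names; in practice `fails` would run the sandbox project and report whether the problem is still present.

```python
def minimize(items, fails):
    """Greedily drop items while `fails` still reports the problem.

    `items` is a list of parts (lines, statements, config entries) and
    `fails` is a caller-supplied predicate that returns True when the
    problem is still reproducible with the given parts.
    """
    if not fails(items):
        raise ValueError("the problem does not reproduce with the full input")
    current = list(items)
    changed = True
    while changed:
        changed = False
        for index in range(len(current)):
            candidate = current[:index] + current[index + 1:]
            if fails(candidate):
                current = candidate
                changed = True
                break  # restart the scan on the smaller input
    return current


if __name__ == "__main__":
    # Toy example: the problem only needs parts "b" and "d" to show up.
    def needs_b_and_d(parts):
        return "b" in parts and "d" in parts

    print(minimize(list("abcde"), needs_b_and_d))
```

The result is minimal only in the sense that removing any single remaining part makes the problem disappear; that is usually small enough to start solving it.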
Understand how the system works
- What components are involved in the problem and how do they interact?
- Create a model to reduce the cognitive load (e.g. UML).
- Refer to the architecture documentation (e.g. arc42).
- Then use this understanding to locate the problem origin.
- Read the documentation and manuals.
Figure 2: Example of system analysis utilities, by Brendan Gregg
Gather program state information
(Image source: Andreas Zeller, How failures come to be , Creative Commons 1.0)
- Print statements to standard output
- Add log statements
- Turn on debug output
- Use a debugger
- Use monitoring tools such as Kibana, Zipkin and Nagios
- "Systems programming as a swiss army knife" by Julia Evans - YouTube
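Log statements and switchable debug output can be combined with Python's standard `logging` module, as in the sketch below. The `APP_DEBUG` environment variable, the `orders` logger name and the `apply_discount` function are assumptions made for illustration.

```python
import logging
import os

logger = logging.getLogger("orders")  # hypothetical component name


def configure_logging():
    """Turn on debug output when the (assumed) APP_DEBUG variable is set to 1."""
    level = logging.DEBUG if os.environ.get("APP_DEBUG") == "1" else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    return level


def apply_discount(price, percent):
    """Hypothetical function instrumented with log statements."""
    logger.debug("apply_discount called with price=%r percent=%r", price, percent)
    result = price * (100 - percent) / 100
    logger.debug("apply_discount returns %r", result)
    return result


if __name__ == "__main__":
    configure_logging()
    logger.info("discounted price: %s", apply_discount(200, 25))
```

Unlike ad-hoc print statements, the debug lines can stay in the code and be switched on only while investigating a problem.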
Diff Debugging
- https://www.martinfowler.com/bliki/DiffDebugging.html
- Using git bisect | dev.to | Jason McCreary
- Git - git-bisect Documentation
- The log files for the related source code could also include interesting information.
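The idea behind bisecting a history can be sketched as a binary search for the first bad revision. This is an illustration of the principle only and does not call git; `first_bad`, the revision names and the `is_bad` check are hypothetical, and the sketch assumes that every revision after the first bad one is also bad.

```python
def first_bad(revisions, is_bad):
    """Binary-search an ordered history for the first revision where is_bad is True.

    Assumes revisions[0] is good, revisions[-1] is bad, and that once a
    revision is bad all later ones are bad too.
    """
    good, bad = 0, len(revisions) - 1
    if is_bad(revisions[good]) or not is_bad(revisions[bad]):
        raise ValueError("need a good first revision and a bad last revision")
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(revisions[middle]):
            bad = middle  # the defect was introduced at or before `middle`
        else:
            good = middle  # the defect was introduced after `middle`
    return revisions[bad]


if __name__ == "__main__":
    history = [f"r{number}" for number in range(1, 9)]

    def broken_since_r6(revision):
        # Stand-in for "check out the revision and run the failing test".
        return int(revision[1:]) >= 6

    print(first_bad(history, broken_since_r6))
```

Each check halves the remaining range, so even a long history needs only a handful of test runs.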
Ask for help
- If you get stuck, ask your colleagues for help.
- They might be happy to help you out.
- If you cannot find a solution together, consider posting the question on Stack Overflow.
Take a break
- Go for a walk.
- Get some fresh air.
- Drink a coffee.
- Relax.
- Have a good night's sleep.
Print out parts of the source code
- Write comments with a pen.
- Mark suspicious sections and sections you don't understand.
- Draw lines to visualize relations between parts of the system.
Break the Gordian knot
- You might break the "Gordian knot" by rewriting (parts of) the code.
Debugging stories
Whether you've been programming for a month or for as long as John, we've all had days like this 🙂👍 https://t.co/lfh7MSBoj2
— Programming Wisdom (@CodeWisdom) 3. Mai 2017
Me debugging python today: pic.twitter.com/meZwfmsIY9
— Mike Donnelly (@SQLMD) 24. Mai 2017
- The Practical Dev on Twitter: "So you've discovered a bug… https://t.co/SmU7MHcDL7"
- Experience: when trying to fix one problem another problem re-surfaces. This process goes on recursively.
- Possible solution: Don't leave broken windows unrepaired. Refactoring. Tag tasks with housekeeping. Boy scout rule.
- Debugging Stories with Haseeb Qureshi | SE Daily
- The case of the 500-mile email
- Debugging behind the Iron Curtain
- My Hardest Bug Ever
- SE-Radio Episode 282: Donny Nadolny on Debugging Distributed Systems : Software Engineering Radio
- Debugging story of an issue at PagerDuty
- IEEE Software Blog: How Cross-stack Configuration Errors can Ruin a 360 Degree Panorama Website
- Analysis of a configuration mistake which impacted a WordPress website
- GOTO 2017 • Debugging Under Fire: Keep your Head when Systems have Lost their Mind • Bryan Cantrill - YouTube
- The Practical Dev on Twitter: "Tell us about one of the most frustrating bugs you've encountered! #devdiscuss"
- Gregor on Twitter: "I spent the hole day on a semi-complicated codebase trying to figure out a bug, but no luck. It happens, no matter your experience :)"
- blog dds: 2017.10.03 - An Embarrassing Failure
- blog dds: 2017.09.05 - Of BOOL and stdbool
- SE-Radio Episode 284: John Allspaw on System Failures: Preventing, Responding, and Learning From : Software Engineering Radio
- Love your bugs - Allison Kaptur
- Two bugs she encountered when working for Dropbox
- Tips for better debugging
- Debugging an evil Go runtime bug - marcan.st
- ⚡Julia Evans⚡ on Twitter: "Why it's important to love your bugs (and how I got better at debugging) https://t.co/5rUlZ9Ahm5"
Incident reports
- cloud.gov Status - Applications on the 18f.gov domain were unavailable from 5:35pm ET Saturday December 24th to 12:40am Sunday December 25th
- An Incorrect Command Entered By Employee Triggered Disruptions To S3 Storage Service, Knocking Down Dozens of Websites, Amazon Says - Slashdot
- RESOLVED: Current account payments may fail - Major Outage (27/10/2017) - The Current Account - Monzo Community
- A hacker stole $31M of Ether — how it happened, and what it means for Ethereum
References
Effective Troubleshooting
Google - Site Reliability Engineering : https://landing.google.com/sre/book/chapters/effective-troubleshooting.html
Code Complete#Debugging
Why Programs Fail
TRAFFIC algorithm
- (T)rack: The first step of the debugging algorithm is to track the problem report in the bug tracker database.
- (R)eproduce: The problem needs to be reproducible for the developers in order to get fixed.
- (A)utomate: The steps to reproduce should be reduced to a minimal test case which can be executed automatically.
- (F)ind: Now that we can reproduce the problem quickly, we need to find the origins of the defect by tracking the origins of values.
- (F)ocus: In the search for the problem origins we should focus on the usual suspects, e.g. code smells, known infections and earlier defects.
- (I)solate: The next step is to isolate the root cause of the infection with the help of the scientific method.
- (C)orrect: Finally we can fix the defect and re-run all the tests for verification.
Effective Debugging
- Effective Debugging: 66 Specific Ways to Debug Software and Systems | InformIT
- General-Purpose Methods and Practices | Item 9: Set Yourself Up for Debugging Success | InformIT
- blog dds: 2017.08.15 - Debugging in Practice: dgsh Issue 85
Often, debugging consumes most of a developer’s workday, and mastering the required techniques and skills can take a lifetime.
https://www.safaribooksonline.com/library/view/effective-debugging-66/9780134394909/
Debug It!
Articles and podcasts
- Siddharth on Twitter: "“Fixing is not Patching, it is eliminating the root cause..!!” by @nalinikanth https://t.co/VxSVO5yfmM"
- debugging-zine.pdf
- A Programmer's Guide To Effective Debugging - Simple Programmer
- SE-Radio Episode 282: Donny Nadolny on Debugging Distributed Systems : Software Engineering Radio
- Debugging – Andrew J. Ko
- Debugging Your Crashes with Ben Curtis | Software Engineering Daily | SE Daily
- Solving the puzzle of debugging microservices | ePages Developer Portal