Bug hunting is often a long, painful process, usually involving terror, confusion, and disorder. If we examine the bug hunting process at a more meta level, perhaps we can distill the process of finding and
ripping out bugs down to a set of reusable key ideas.
In this talk, I'll discuss a very painful two-part bug that exists in both userland and the kernel. To make things more interesting, this bug only manifests itself on certain combinations of user programs and
kernel versions. In fact, it was so elusive that there are mailing list posts dating back to at least 2009 from people bumping into this bug but not finding a solution.
We'll take a low-level tour of Linux networking, diving deep into device driver code, the device-agnostic layer of the Linux networking stack, protocol family code, and a very well-known, popular userland library. We'll examine how these pieces of the operating system are stitched together and discover why this bug was so elusive. I'll also walk through my thought process, my debugging tools, and the eventual solution that finally fixes this rather nasty bug.