This article tells the story of one bug. The bug escaped the strict jail of code review by experienced engineers and was not found during a long stage of testing on a dedicated 24/7 testing farm. It was found with one holy flash stick.
Sometimes systems programmers go crazy and do weird things. In our case we needed to add a new GFP flag to the Linux kernel memory manager. If you know this field, you are probably thinking that we did something strange and incorrect, but trust me, inventing a new GFP flag was necessary to finish our unusual project on time.
This flag was used for allocations in a special driver that performed one totally crazy activity related to memory management and networking (try to imagine how these two fields were joined in our development). We had a huge code base, many corner cases, low-level code changing core parts of the kernel, unusual hardware, an unstable kernel and distribution (most parts of the system had only just been ported to the device), and a number of hardware bugs in pre-production samples. Debugging such a solution sounds crazy, but we found ways to manage the complexity and built a huge test stand with real hardware and virtual machines of different CPU architectures.
Day by day our solution improved. At some milestone all the serious hardware bugs were fixed, the software became more stable, and things settled down. The testing farm no longer "delighted" us with new critical issues, and we started to get bored.
We reached the stage of full integration, and I was sent on a business trip to the headquarters. Nobody expected any problems during integration of the solution, but... at some point one of my colleagues brought a flash stick with a hardware defect and plugged it into the device. Kernel panic, almost immediately!
How? By this point the whole platform was far too stable for such a huge and crazy problem. Yet you just plug in the flash stick and watch everything go crazy.
We started to investigate this strange bug and plugged in various flash drives, but only that one broke the system, and it did so every time. After two hours of analysis and debugging we found the root of the problem. What a shame! The problem was in our code, yet it was not an ordinary bug but an integration problem.
To explain the cause of the issue I'll provide some technical background. Let's start with GFP flags. These flags are used by the Linux memory manager to control the behavior of allocations. They are defined in the file 'include/linux/gfp.h'. It is not recommended to change their definitions or add new ones without a real need.
If you do add a new one, you have to find an unoccupied bit in the 32-bit word and select it as the flag. Then you write a preprocessor macro definition.
/* Plain integer GFP bitmasks. Do not use this directly. */
#define ___GFP_DMA              0x01u
#define ___GFP_HIGHMEM          0x02u
#define ___GFP_DMA32            0x04u
#define ___GFP_MOVABLE          0x08u
#define ___GFP_WAIT             0x10u
#define ___GFP_HIGH             0x20u
#define ___GFP_IO               0x40u
#define ___GFP_FS               0x80u
#define ___GFP_COLD             0x100u
#define ___GFP_NOWARN           0x200u
#define ___GFP_REPEAT           0x400u
#define ___GFP_NOFAIL           0x800u
#define ___GFP_NORETRY          0x1000u
#define ___GFP_MEMALLOC         0x2000u
#define ___GFP_COMP             0x4000u
#define ___GFP_ZERO             0x8000u
#define ___GFP_NOMEMALLOC       0x10000u
#define ___GFP_HARDWALL         0x20000u
#define ___GFP_THISNODE         0x40000u
#define ___GFP_RECLAIMABLE      0x80000u
#define ___GFP_KMEMCG           0x100000u
#define ___GFP_NOTRACK          0x200000u
#define ___GFP_NO_KSWAPD        0x400000u
#define ___GFP_OTHER_NODE       0x800000u
#define ___GFP_WRITE            0x1000000u
/* If the above are modified, __GFP_BITS_SHIFT may need updating */
You also have to modify the definition of __GFP_BITS_SHIFT, incrementing its value by one.
102 #define __GFP_BITS_SHIFT 25 /* Room for N __GFP_FOO bits */
103 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
Simple. Next you change the behavior of the allocator as needed for the cases when the flag is specified in gfp_mask.
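The two steps above can be sketched in user space. This is only an illustration, not the kernel patch itself: the DEMO_ names are hypothetical stand-ins for the kernel macros, and DEMO_GFP_MYFLAG is an invented flag placed on the first free bit (bit 25).

```c
#include <assert.h>
#include <stdint.h>

/* Highest flag from the listing above (stand-in for ___GFP_WRITE). */
#define DEMO_GFP_WRITE      0x1000000u
/* Hypothetical new flag on the next free bit, bit 25. */
#define DEMO_GFP_MYFLAG     0x2000000u

/* Stand-in for __GFP_BITS_SHIFT, bumped from 25 to 26 for the new flag. */
#define DEMO_GFP_BITS_SHIFT 26
#define DEMO_GFP_BITS_MASK  ((1u << DEMO_GFP_BITS_SHIFT) - 1)

/* The new flag must not collide with an existing one... */
static_assert((DEMO_GFP_MYFLAG & DEMO_GFP_WRITE) == 0,
              "new flag overlaps an existing flag");
/* ...and must be covered by the widened mask. */
static_assert((DEMO_GFP_MYFLAG & ~DEMO_GFP_BITS_MASK) == 0,
              "new flag is outside the GFP mask");

/* Returns nonzero when the hypothetical flag is set in gfp_mask. */
static inline int demo_gfp_has_myflag(uint32_t gfp_mask)
{
    return (gfp_mask & DEMO_GFP_MYFLAG) != 0;
}
```

An allocator path would then branch on a check like demo_gfp_has_myflag(); the static assertions catch a forgotten shift update at compile time.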
As you can see in line 102 above, there seems to be enough room to add new flags, because 25 is smaller than 32. O.K., not so fast: there is another place where the kernel defines a number of other flags that live in the same bit space. It is the file 'include/linux/pagemap.h'.
/*
 * Bits in mapping->flags. The lower __GFP_BITS_SHIFT bits are the page
 * allocation mode flags.
 */
enum mapping_flags {
	AS_EIO          = __GFP_BITS_SHIFT + 0, /* IO error on async write */
	AS_ENOSPC       = __GFP_BITS_SHIFT + 1, /* ENOSPC on async write */
	AS_MM_ALL_LOCKS = __GFP_BITS_SHIFT + 2, /* under mm_take_all_locks() */
	AS_UNEVICTABLE  = __GFP_BITS_SHIFT + 3, /* e.g., ramdisk, SHM_LOCK */
	AS_BALLOON_MAP  = __GFP_BITS_SHIFT + 4, /* balloon page special map */
};
These flags are used in the flags field of the address_space structure and, as you can see, they share the same bit space; in other words, they have to fit into the same 32-bit word as the GFP flags. The conclusion is that 30 bit positions (25 GFP flags plus 5 mapping flags) are already occupied and only two are available for our self-invented flags. Still not bad, so we may increment __GFP_BITS_SHIFT and define our own flag (with the value 0x2000000u, for example). All the mapping flags then shift left by one, because they are defined relative to __GFP_BITS_SHIFT.
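The bit budget can be written down as a compile-time check. The sketch below is a user-space model under stated assumptions: the DEMO_ names are hypothetical copies of the kernel definitions, the word width is fixed at 32 bits as in our case, and the guard itself is my illustration, not something taken from the kernel sources.

```c
#include <assert.h>

/* 25 stock GFP flags plus one hypothetical new flag. */
#define DEMO_GFP_BITS_SHIFT 26
/* Width of the shared flags word on the target in this story. */
#define DEMO_WORD_BITS      32

/* Model of enum mapping_flags; DEMO_AS_LAST must track the final entry. */
enum demo_mapping_flags {
    DEMO_AS_EIO          = DEMO_GFP_BITS_SHIFT + 0,
    DEMO_AS_ENOSPC       = DEMO_GFP_BITS_SHIFT + 1,
    DEMO_AS_MM_ALL_LOCKS = DEMO_GFP_BITS_SHIFT + 2,
    DEMO_AS_UNEVICTABLE  = DEMO_GFP_BITS_SHIFT + 3,
    DEMO_AS_BALLOON_MAP  = DEMO_GFP_BITS_SHIFT + 4,
    DEMO_AS_LAST         = DEMO_AS_BALLOON_MAP
};

/* Fails the build as soon as the highest mapping flag leaves the word. */
static_assert(DEMO_AS_LAST < DEMO_WORD_BITS,
              "GFP and mapping flags no longer fit into one word");

/* Number of bit positions still unused in the shared word. */
static inline int demo_free_bits(void)
{
    return DEMO_WORD_BITS - (DEMO_AS_LAST + 1);
}
```

With one extra GFP flag the highest mapping flag lands on bit 30, so exactly one bit is left. Adding two more mapping flags to this model would make the assertion fail at compile time instead of at run time.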
And it was done. But...
Small and large hardware vendors that use the Linux kernel in their products share one habit: they like to modify the kernel in various unpredictable places to fix problems the easiest way. Local kernel branches are full of speed hacks and workarounds. Many strange functions are developed locally to resolve local problems and satisfy specific hardware needs. One such change was at the heart of our problem.
An engineer in another branch had added a set of new mapping_flags to the ones shown above. So after that modification there was no room left for a new GFP flag, and nobody knew it. After integration of the whole mess of patches from different teams spread all over the world, the number of bits used for GFP and mapping flags exceeded the limit of 32. And... almost all the time the system worked correctly. Why?!
There is a simple explanation. The new mapping flags were unused almost all the time. But one of them had been introduced specifically to mark address_space structures associated with devices that have bad blocks. Now it is time for our magical flash stick. Operations on the flash stick were performed through file mappings, so address_space structures were created to represent the physical part of the mapping (if you are not familiar with this machinery, read about the Linux page cache, and don't be confused by the name of the structure). Hitting a bad block caused the corresponding mapping flag to be set, at a bit position beyond the end of the 32-bit word.
Just look at the generic implementation of the function set_bit() (include/asm-generic/bitops/atomic.h):
static inline void set_bit(int nr, volatile unsigned long *addr)
{
	unsigned long mask = BIT_MASK(nr);
	unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
	unsigned long flags;

	_atomic_spin_lock_irqsave(p, flags);
	*p |= mask;
	_atomic_spin_unlock_irqrestore(p, flags);
}
It works not only on a single unsigned long field but also on larger bitmaps. Much larger. So if you try to set bit number 32 on a platform where unsigned long is 32 bits wide, you actually set the lowest bit of the next unsigned long, located right after the specified one.
So we have a case of memory corruption: the neighbouring field of the structure gets overwritten.
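The effect is easy to reproduce in user space. The following is a simplified, non-atomic model of the generic set_bit(), assuming 32-bit words; the DEMO_ and demo_ names are hypothetical, and a two-element array stands in for the flags field and the field that follows it in memory.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BITS_PER_WORD 32
/* Same idea as BIT_MASK()/BIT_WORD(), but for fixed 32-bit words. */
#define DEMO_BIT_MASK(nr)  (UINT32_C(1) << ((nr) % DEMO_BITS_PER_WORD))
#define DEMO_BIT_WORD(nr)  ((nr) / DEMO_BITS_PER_WORD)

/* Non-atomic model of set_bit(): nr may address any word of a bitmap. */
static inline void demo_set_bit(unsigned int nr, uint32_t *addr)
{
    addr[DEMO_BIT_WORD(nr)] |= DEMO_BIT_MASK(nr);
}

/* words[0] models the flags field, words[1] the neighbouring field. */
struct demo_mapping {
    uint32_t words[2];
};

/* Sets bit nr starting from the flags word and returns the result. */
static inline struct demo_mapping demo_set_flag(unsigned int nr)
{
    struct demo_mapping m = { { 0, 0 } };

    demo_set_bit(nr, m.words);
    return m;
}
```

Calling demo_set_flag(31) touches only words[0], while demo_set_flag(32) leaves the flags word untouched and sets the lowest bit of words[1], which is the corruption described above.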
This way an old, broken flash drive saved us from leaking a serious bug into production. The bug was quickly fixed, and all the code related to the project went through a new round of heavy review.
Sometimes integration bugs can be very, very harmful. Make sure you know your own part of the software thoroughly, and the solutions surrounding it just as well.
Happy hacking!