As part of its Java-related projects, The Open Group - Grenoble Research Institute has been working with gdb-4.17 and the LinuxThreads package, as delivered with glibc-2.0.6, to enhance GDB so that it can debug multithreaded applications. During this development, two kernel bugs (still unfixed in 2.2.0) were found. So far, no serious effects on system behaviour have been observed, but the bugs are a potential source of frozen processes and incorrect kernel results when debuggers are used.
Both bugs are visible only when a debugger (such as GDB) dynamically attaches to a process in order to debug it, by calling "ptrace(PTRACE_ATTACH, ...)". A kernel side effect of this command is that the debugger becomes the current parent of the process, and the original parent no longer knows this child directly (it can only find it by scanning the task list and testing the original parent of each process). The original parent must become the current parent again when the debugger detaches from the process (via a call to "ptrace(PTRACE_DETACH, ...)") or exits.
A debugger that attaches to processes in this way is intrusive in the kernel, in the sense that it modifies the behaviour of the "wait()" family of system calls for the original parent. The "sys_wait4()" kernel routine does not take the attached children of the original parent into account while scanning the children list, which can lead to incorrect "errno" and return values if all of its original children have been dynamically attached by a debugger.
For example, if the original parent has only one child and that child is attached by a debugger, then a "wait()" family system call issued by the original parent returns -1 with "errno" set to "ECHILD". This is wrong, since the child process still exists and will be correctly reattached to its original parent when the debugger detaches from it or exits.
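The scenario above can be reproduced in a small user-space model. The sketch below is not the kernel code: "struct task" is a simplified stand-in, "wait_unpatched()" is a hypothetical name, and the convention of returning 0 for "the real call would block" is an assumption made for the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for a kernel task (model only). */
struct task {
    int pid;
    const struct task *p_pptr;   /* current parent (the debugger while attached) */
    const struct task *p_opptr;  /* original parent */
    int zombie;                  /* non-zero once the task has exited */
};

/* Model of the unpatched scan: only tasks whose *current* parent is the
 * caller are considered.
 * Returns the pid of a zombie child, 0 if children exist but none has
 * exited yet (the real call would block), or -ECHILD. */
int wait_unpatched(const struct task *self,
                   const struct task *tasks, size_t ntasks)
{
    int has_child = 0;
    size_t i;

    assert(self != NULL);
    for (i = 0; i < ntasks; i++) {
        if (tasks[i].p_pptr != self)
            continue;            /* children attached elsewhere are skipped */
        has_child = 1;
        if (tasks[i].zombie)
            return tasks[i].pid;
    }
    return has_child ? 0 : -ECHILD;
}
```

With a single child whose "p_opptr" is the parent and whose "p_pptr" is the debugger, the model returns -ECHILD for the parent even though the child is alive, which is the incorrect result described above.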
When a debugger that has attached to processes in this way is about to exit, it calls "exit_notify()", one goal of which is to attach the child processes to a new parent process (usually the init task) in "forget_original_parent()". Unfortunately, this subroutine ignores dynamically attached processes that have become children of the debugger, and forces the init task to become their original parent, which is wrong in this case. The result is that the original parent is usually blocked in "sys_wait4()", waiting (until a signal arrives) for children that are no longer its own.
The basic idea of the fix is to create a per-process counter that holds the number of live original children, whether or not they are attached by a debugger. The counter is incremented during a successful "fork()" system call and decremented at "exit()". It is modified under "tasklist_lock" in order to keep it coherent inside an SMP kernel. The patch is machine-independent and should work for all targets.
This new counter is mainly used during "wait()", where a local copy of the counter is decremented for each child whose current parent is also its original parent. If the final value is zero at the end of the scan of the current children queue, then all child processes have been scanned (there are no attached children); if it is still positive, then some children have been dynamically attached via the "ptrace()" system call. In the latter case, additional code scans all processes to find the attached children, in order to set the "errno" and return value of the system call correctly.
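The two-pass scan just described can be illustrated with a user-space model. This is a sketch of the idea only, not the patch itself: "struct task" and "wait_patched()" are hypothetical, locking is left out, and returning 0 stands for "the real call would block".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for a kernel task (model only). */
struct task {
    int pid;
    const struct task *p_pptr;   /* current parent */
    const struct task *p_opptr;  /* original parent */
    int zombie;                  /* non-zero once the task has exited */
    int nchildren;               /* live original children, attached or not */
};

/* Model of the patched scan.
 * Returns the pid of a zombie child, 0 if live children exist (the real
 * call would block), or -ECHILD if there is nothing to wait for. */
int wait_patched(const struct task *self,
                 const struct task *tasks, size_t ntasks)
{
    int remaining = self->nchildren;   /* local copy of the counter */
    int has_current_child = 0;
    size_t i;

    /* Pass 1: scan the current children. */
    for (i = 0; i < ntasks; i++) {
        if (tasks[i].p_pptr != self)
            continue;
        has_current_child = 1;
        if (tasks[i].zombie)
            return tasks[i].pid;
        if (tasks[i].p_opptr == self)
            remaining--;               /* a live original child, accounted for */
    }
    assert(remaining >= 0);

    /* Pass 2: a positive remainder means that some live original children
     * are attached elsewhere; look for them in the whole task list. */
    if (remaining > 0) {
        for (i = 0; i < ntasks; i++) {
            if (tasks[i].p_opptr == self && tasks[i].p_pptr != self &&
                !tasks[i].zombie)
                return 0;              /* the child exists: do not report ECHILD */
        }
    }
    return has_current_child ? 0 : -ECHILD;
}
```

In the single-attached-child case, the counter of the parent is 1 and the first pass finds nothing, so the second pass runs and the model reports that a child exists instead of returning -ECHILD. The full task list is scanned only when the counter shows that it is needed.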
The main difficulty is managing the counter in an SMP-safe way in the "exit_notify()" static subroutine. This routine calls "forget_original_parent()" at its very beginning, and afterwards takes "tasklist_lock" exclusively in order to safely transfer its children to their new parent, which "forget_original_parent()" has already set to the init task under the shared "tasklist_lock".
The "exit_notify()" code has been rewritten in order to update the counter ("nchildren") and the original parent pointer ("p_opptr") under the exclusive "tasklist_lock". Once all the current children have been transferred to their new current parent (the init task when the exiting process is the original parent), the "nchildren" counter can be used to test whether there remain children that were dynamically attached by a debugger via the "ptrace()" system call and whose original parent must be set to the init task. This is done by the new code of the "forget_original_parent()" static subroutine, which also notifies the new parent of the status of any such children that are in the zombie state.
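One possible reading of this sequence is modelled below in user space. It is a sketch under several assumptions, not the patch: "struct task" and "model_exit_notify()" are hypothetical, all tasks in the model are alive (zombie notification is left out), and the model is single-threaded, whereas the kernel performs these updates under the exclusive "tasklist_lock".

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a kernel task (model only). */
struct task {
    int pid;
    struct task *p_pptr;   /* current parent */
    struct task *p_opptr;  /* original parent */
    int nchildren;         /* live original children, attached or not */
};

/* Model of what happens to the children of `father` when it exits. */
void model_exit_notify(struct task *father, struct task *init_task,
                       struct task *tasks, size_t ntasks)
{
    size_t i;

    /* The exiting task is no longer a live child of its original parent. */
    if (father->p_opptr != NULL)
        father->p_opptr->nchildren--;

    /* Step 1: transfer the current children to their new current parent. */
    for (i = 0; i < ntasks; i++) {
        struct task *p = &tasks[i];

        if (p == father || p->p_pptr != father)
            continue;
        if (p->p_opptr == father) {
            /* Own child: the init task becomes both parents. */
            p->p_opptr = init_task;
            p->p_pptr = init_task;
            father->nchildren--;
            init_task->nchildren++;
        } else {
            /* Attached child (father was its debugger): give it back
             * to its original parent. */
            p->p_pptr = p->p_opptr;
        }
    }

    /* Step 2: a positive counter means that some original children are
     * attached by a debugger; only their original parent changes. */
    if (father->nchildren > 0) {
        for (i = 0; i < ntasks; i++) {
            struct task *p = &tasks[i];

            if (p->p_opptr != father)
                continue;
            p->p_opptr = init_task;
            father->nchildren--;
            init_task->nchildren++;
        }
    }
    assert(father->nchildren == 0);
}
```

In this model, when the original parent exits while its child is attached, the child keeps the debugger as its current parent and gets the init task as its original parent. When the debugger later exits, the first step returns the child to the init task. The second scan of the task list is only performed when the counter is still positive after the first step.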
A patch file is currently available; which one to use depends on the targeted Linux kernel version. Unfortunately, the fix modifies "struct task_struct", so all kernel modules that depend on this structure must be recompiled after the patch is applied: