Desktop application development

Application crash debugging with WinDBG

By François Charron,

July 23, 2015.


Introduction

My previous article (Introduction to WinDBG) covered the prerequisites for using the WinDBG debugger efficiently. To get the most out of this article, you should be familiar with the material presented there.

This article takes a closer look at the analysis of one particular type of bug: the application crash.

Before going deeper into this type of debugging, it is important to clarify what an application crash is. An application crash is the abrupt termination of a process caused by an unrecoverable error. The error can come from a variety of sources:

an exception thrown by a routine that none of its callers can handle (unhandled exception),
an illegal operation performed by the program (e.g. division by zero),
an attempt to access protected memory (e.g. dereferencing an uninitialized pointer; access violation).

Other types of errors also prevent an application from running normally without being application crashes strictly speaking:

an application hang is the indefinite suspension of a program (which nevertheless remains alive), caused among other things by an infinite loop or by a deadlock between two threads,
a system hang is the complete failure of the entire operating system to respond to user interaction, with causes similar to those of an application hang but located in kernel-mode code (such as a device driver),
a system crash is the complete shutdown of the operating system, usually followed by an automatic reboot, after an unrecoverable error in kernel-mode code. Windows traditionally displays a characteristic blue screen for this type of error, which earned it the nickname Blue Screen of Death (BSOD).

Each of these problem types calls for a different analysis approach and needs to be covered separately.

Note

As mentioned in my previous article, a bug can be investigated in WinDBG either directly (live) or post-mortem (crash dump). Unless otherwise stated, the techniques detailed below apply to both modes.

Finding the culprit

When debugging an application crash, the first step is to find out which thread caused the problem. This is done by visually inspecting the call stack of each thread to detect some typical patterns.

As with most WinDBG features, there are several ways to achieve a similar result. Here are three ways to view the call stacks of the different threads, each suited to a different context:

Using the graphical interface:
1. Display the "Call Stack" window by clicking the corresponding button, pressing [ALT+6] or selecting the "Call Stack" item in the "View" menu.
2. Display the "Processes and Threads" window by clicking the corresponding button, pressing [ALT+9] or selecting the "Processes and Threads" item in the "View" menu.
3. Click the TID (thread identifier) of the thread whose call stack you want to see.
Using the k command, which displays the call stack of a thread with a configurable level of detail:
- ~*kb displays the call stack of every thread in the process with a reasonable amount of additional information.
Using the !uniqstack extension command, which displays only the distinct call stacks among all threads. In other words, this command hides call stacks that are identical for more than one thread. This is very useful when the application contains a large number of threads waiting for a command, since most of them will be in the same state most of the time.
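
To make the idea of hiding duplicate call stacks concrete, here is a minimal C++ sketch of the grouping involved. It is only an illustration of the concept, not how WinDBG implements !uniqstack; the function name groupByStack and the CallStack alias are hypothetical, and the stacks are plain strings supplied by the caller.

```cpp
#include <map>
#include <string>
#include <vector>

// A thread's call stack, innermost frame first (illustrative representation).
using CallStack = std::vector<std::string>;

// Groups thread ids by identical call stack, so that each distinct stack
// only has to be read once. Nothing here talks to a debugger: the stacks
// are assumed to have been collected beforehand.
std::map<CallStack, std::vector<int>> groupByStack(
    const std::map<int, CallStack>& stacksByThread) {
    std::map<CallStack, std::vector<int>> groups;
    for (const auto& entry : stacksByThread) {
        groups[entry.second].push_back(entry.first);
    }
    return groups;
}
```

With a hundred idle worker threads sharing the same stack and one crashing thread, such a grouping leaves only two stacks to inspect.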

Once you have obtained the list of call stacks, you must now identify which thread is responsible for the problem. Depending on the circumstances surrounding the bug, a crash call stack can vary quite a bit from bug to bug. Here are several common examples of possible patterns.

Unhandled exception

KERNELBASE!RaiseException             <--- Important information
VCRUNTIME140D!CxxThrowException       <--- Important information
crashTest!testExceptionNonTraitee 
crashTest!fonction1 
crashTest!wmain 
crashTest!invoke_main 
crashTest!__scrt_common_main_seh 
crashTest!__scrt_common_main 
crashTest!wmainCRTStartup 
KERNEL32!BaseThreadInitThunk 
ntdll!RtlUserThreadStart

As this example shows, the main function of our program called fonction1(), which called testExceptionNonTraitee(), which threw an exception with the C++ throw keyword (CxxThrowException), which in turn called the Windows API function RaiseException to hand the exception to the operating system. Because no try/catch block covers any of the function calls leading to the throw, the program was terminated.
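
The following C++ sketch reproduces the shape of that call chain. The function names fonction1 and testExceptionNonTraitee come from the call stack above; the exception type, its message and the runGuarded helper are assumptions made for illustration, since the article does not show the original source.

```cpp
#include <stdexcept>
#include <string>

// Throws unconditionally, like the function just below the throw machinery
// in the call stack. The exception type and message are assumed.
void testExceptionNonTraitee() {
    throw std::runtime_error("something went wrong");
}

void fonction1() {
    testExceptionNonTraitee();  // no try/catch here: the exception keeps propagating
}

// Hypothetical guarded caller: catching at this level stops the propagation
// before the exception leaves the program unhandled and terminates it.
std::string runGuarded() {
    try {
        fonction1();
    } catch (const std::exception& e) {
        return e.what();
    }
    return "";
}
```

Calling fonction1() directly from the main function, without any try/catch block on the way, is what produces the unhandled exception pattern; runGuarded() shows one place where a handler could be added.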

Exception hidden behind a dialog box

(call stack simplified for readability)

USER32!NtUserWaitMessage
USER32!DialogBox2                 <--- Important information
USER32!InternalDialogBox    
USER32!SoftModalMessageBox
USER32!MessageBoxWorker
USER32!MessageBoxTimeoutW
USER32!MessageBoxW                <--- Important information
[...]
ntdll!RtlRaiseException
KERNELBASE!RaiseException
VCRUNTIME140D!_CxxThrowException
crashTest2!CcrashTest2Dlg::OnBnClickedButton1
[...]
crashTest2!wWinMainCRTStartup
KERNEL32!BaseThreadInitThunk
ntdll!RtlUserThreadStart

If we attach to an application that is already displaying an operating system dialog reporting a crash, or if we receive a crash dump captured while that dialog was visible, the exception may be buried under an impressive list of function calls leading up to the display of the dialog. This pattern can also combine with the hidden-cause pattern described next, which makes the source of the problem even harder to spot. When browsing through hundreds of threads, it is easy to skim over this kind of call stack and wrongly assume that it belongs to an ordinary user interface thread. However, when debugging what is known to be an application crash, the presence of dialog-related function calls (MessageBoxW, DialogBox2, ...) should raise suspicion. A closer look at the call stack reveals that the code threw an exception, as shown by the calls to CxxThrowException and RaiseException. Another function that sometimes indicates a thrown exception is KiUserExceptionDispatcher.
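
The visual check described above can be written down as a simple heuristic. The sketch below is an assumption-laden illustration: the function names hasFrame and looksLikeCrashBehindDialog are hypothetical, the frames are plain strings copied from a call stack, and the list of marker functions is limited to the ones named in this article.

```cpp
#include <string>
#include <vector>

// True if any frame of the stack contains the given text, e.g. "MessageBoxW".
bool hasFrame(const std::vector<std::string>& stack, const std::string& text) {
    for (const std::string& frame : stack) {
        if (frame.find(text) != std::string::npos) return true;
    }
    return false;
}

// Heuristic: dialog-related frames combined with exception-raising frames
// suggest a crash dialog rather than an ordinary user interface thread.
bool looksLikeCrashBehindDialog(const std::vector<std::string>& stack) {
    const bool dialog = hasFrame(stack, "MessageBoxW") ||
                        hasFrame(stack, "DialogBox2");
    const bool raised = hasFrame(stack, "RaiseException") ||
                        hasFrame(stack, "CxxThrowException") ||
                        hasFrame(stack, "KiUserExceptionDispatch");
    return dialog && raised;
}
```

A stack that contains only the dialog frames is reported as ordinary; it takes both kinds of frames to flag the thread.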

Exception whose cause is hidden

ntdll!NtWaitForMultipleObjects
KERNELBASE!WaitForMultipleObjectsEx
KERNEL32!WerpReportFaultInternal
KERNEL32!WerpReportFault
KERNELBASE!UnhandledExceptionFilter
ntdll!RtlUserThreadStart$filt$0
ntdll!_C_specific_handler
ntdll!RtlpExecuteHandlerForException
ntdll!RtlDispatchException
ntdll!KiUserExceptionDispatch
WARNING: Stack unwind information not available. Following frames may be wrong. 
[...]

For various reasons beyond the scope of this article, it is sometimes easy to identify which thread is responsible for the crash, especially when analyzing a crash dump, while the underlying cause of the bug remains hidden by an incomplete call stack. This can happen, for example, when you use an external library without debugging symbols. It is sometimes possible to recover the missing part of the call stack by using the exception context that is saved on the stack at the time the exception is raised.

To do this, we need to use a series of commands that will give us the necessary information step by step.

Make sure the correct thread is active. To do this, display the call stack of the current thread (for example with the kb command) and check that it contains calls to the functions related to exception handling (KiUserExceptionDispatch, RtlDispatchException, RaiseException, ...). If it does not, display the "Processes and Threads" window by clicking its button, pressing [ALT+9] or selecting "Processes and Threads" in the "View" menu, then click the TID of the desired thread.
Once the right thread is active, the !teb extension command displays its Thread Environment Block. This data structure describes various properties of the thread, such as the address range occupied by its stack. Here is what the output of the !teb command looks like:

TEB at 00007ff7b5b7e000
ExceptionList: 0000000000000000
StackBase: 0000002419d50000            <--- Important information
StackLimit: 0000002419d4d000           <--- Important information
SubSystemTib: 0000000000000000
FiberData: 0000000000001e00
ArbitraryUserPointer: 0000000000000000
Self: 00007ff7b5b7e000
EnvironmentPointer: 0000000000000000
ClientId: 000000000000f6d4 . 0000000000010270
RpcHandle: 0000000000000000
Tls Storage: 00007ff7b5b7e058
PEB Address: 00007ff7b5b7c000
LastErrorValue: 0
LastStatusValue: c00000bb
Count Owned Locks: 0
HardErrorMode: 0

The members of interest here are "StackLimit" and "StackBase", which give the lower and upper bounds of the memory area occupied by the current thread's stack.
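
As a quick worked example, the size of that memory area can be computed from the two values in the !teb output. The sketch below uses those values as constants; the helper names stackBytes and stackSlots are hypothetical.

```cpp
#include <cstdint>

// Values copied from the !teb output above.
constexpr std::uint64_t kStackBase  = 0x0000002419d50000ULL;  // upper bound
constexpr std::uint64_t kStackLimit = 0x0000002419d4d000ULL;  // lower bound

// Number of bytes between the two bounds.
constexpr std::uint64_t stackBytes(std::uint64_t base, std::uint64_t limit) {
    return base - limit;
}

// Number of pointer-sized slots in that range for a given pointer size
// (8 bytes for a 64-bit process, 4 bytes for a 32-bit one).
constexpr std::uint64_t stackSlots(std::uint64_t base, std::uint64_t limit,
                                   std::uint64_t pointerSize) {
    return (base - limit) / pointerSize;
}

// 0x2419d50000 - 0x2419d4d000 = 0x3000 bytes, that is 12 KiB.
static_assert(stackBytes(kStackBase, kStackLimit) == 0x3000, "12 KiB of stack");
```

For this 64-bit process the range holds 1536 pointer-sized values, which gives an idea of how long a dump of the whole stack will be.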

The dps command (or dds if you are debugging a 32-bit application), given these two addresses as a range, then displays the entire contents of the current thread's stack. The output is usually very long and looks like this:

0:000> dps 2419d4d000 2419d50000
[...]
00000024`19d4fee8 00007ff7`b64a208e crashTest!__scrt_common_main+0xe [f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl @ 309]
00000024`19d4fef0 00000000`00000000
00000024`19d4fef8 00000000`00000000
00000024`19d4ff00 00000000`00000000
00000024`19d4ff08 00000000`00000000
00000024`19d4ff10 00000000`00000000
00000024`19d4ff18 00007ff7`b64a2359 crashTest!wmainCRTStartup+0x9 [f:\dd\vctools\crt\vcstartup\src\startup\exe_wmain.cpp @ 17]
00000024`19d4ff20 00000000`00000000
00000024`19d4ff28 00000000`00000000
00000024`19d4ff30 00000000`00000000
00000024`19d4ff38 00000000`00000000
00000024`19d4ff40 00000000`00000000
00000024`19d4ff48 00007fff`a44813d2 KERNEL32!BaseThreadInitThunk+0x22
00000024`19d4ff50 00000000`00000000
00000024`19d4ff58 00000000`00000000
00000024`19d4ff60 00000000`00000000
00000024`19d4ff68 00000000`00000000
00000024`19d4ff70 00007ff7`b64a100f crashTest!ILT+10(wmainCRTStartup)
00000024`19d4ff78 00007fff`a6e15444 ntdll!RtlUserThreadStart+0x34
00000024`19d4ff80 00007fff`a44813b0 KERNEL32!BaseThreadInitThunk
00000024`19d4ff88 00000000`00000000
00000024`19d4ff90 00000000`00000000
00000024`19d4ff98 00000000`00000000
00000024`19d4ffa0 00000000`00000000
00000024`19d4ffa8 00007fff`a4441ad0 KERNELBASE!UnhandledExceptionFilter   <--- Important information
00000024`19d4ffb0 00000024`19d4ea80                                       <--- Important information
00000024`19d4ffb8 00000024`19d4ea80
00000024`19d4ffc0 00000000`00000000
00000024`19d4ffc8 00000000`00000000
00000024`19d4ffd0 00000000`00000000
00000024`19d4ffd8 00000000`00000000
00000024`19d4ffe0 00000000`00000000
00000024`19d4ffe8 00000000`00000000
00000024`19d4fff0 00000000`00000000
00000024`19d4fff8 00000000`00000000
00000024`19d50000 00000020`78746341                                       <--- Start of the stack

Fortunately, the data we are interested in is usually placed at the very bottom of the stack. By performing either a visual or textual search (with [CTRL+F]) in the result of the dps command, you just have to find the UnhandledExceptionFilter function in the stack.

The left column is the address of each slot on the stack and the next column is the value stored at that address. Take note of the value on the line just below the one where UnhandledExceptionFilter appears: it corresponds to the parameter passed to the function when it was called. A quick search on the Internet shows that the parameter passed to UnhandledExceptionFilter is a pointer to an EXCEPTION_POINTERS structure.
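
When the output has been saved to a text file, the same search can be automated. The following sketch is a hypothetical helper, not a WinDBG feature: it assumes each line has the layout shown above (address, space, value, optional symbol) and returns the value found on the line after the first one that mentions a given symbol.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the second column (the stored value) of a dps output line,
// or an empty string if the line has fewer than two columns.
std::string valueColumn(const std::string& line) {
    const std::size_t firstSpace = line.find(' ');
    if (firstSpace == std::string::npos) return "";
    const std::size_t start = line.find_first_not_of(' ', firstSpace);
    if (start == std::string::npos) return "";
    const std::size_t end = line.find(' ', start);
    return line.substr(start, end == std::string::npos ? end : end - start);
}

// Finds the first line mentioning `symbol` and returns the value stored in
// the slot printed on the following line; empty if there is no such line.
std::string valueAfterSymbol(const std::vector<std::string>& dpsLines,
                             const std::string& symbol) {
    for (std::size_t i = 0; i + 1 < dpsLines.size(); ++i) {
        if (dpsLines[i].find(symbol) != std::string::npos) {
            return valueColumn(dpsLines[i + 1]);
        }
    }
    return "";
}
```

Applied to the lines above with the symbol UnhandledExceptionFilter, the helper returns the value 00000024`19d4ea80.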

To see the detailed content of this data structure, use the dt -b EXCEPTION_POINTERS command, which makes the debugger interpret the parameter as the address of an EXCEPTION_POINTERS instance:

0:000> dt -b EXCEPTION_POINTERS 2419d4ea80
crashTest!EXCEPTION_POINTERS
+0x000 ExceptionRecord : 0x00000024`19d4f6b0     <--- Exception record
+0x008 ContextRecord : 0x00000024`19d4f1c0       <--- Exception context
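
The two offsets in this output follow directly from the layout of the structure, which holds two pointers. The sketch below is a portable stand-in written for illustration: the name ExceptionPointersMirror is hypothetical, and void* replaces the PEXCEPTION_RECORD and PCONTEXT types used by the real declaration in the Windows headers.

```cpp
#include <cstddef>

// Portable stand-in for the two pointer members of EXCEPTION_POINTERS.
// void* is used instead of the Windows pointer types so that the sketch
// compiles on any platform.
struct ExceptionPointersMirror {
    void* ExceptionRecord;  // what happened: code, address, parameters
    void* ContextRecord;    // processor state (registers) at that moment
};

// The second member starts one pointer after the first: offset 8 in a
// 64-bit process, as in the output above, and 4 in a 32-bit process.
constexpr std::size_t kContextRecordOffset =
    offsetof(ExceptionPointersMirror, ContextRecord);
```

The first pointer is the one to pass to .exr and the second the one to pass to .cxr.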

The .exr command, given the address of the exception record, displays detailed information about the exception:

0:000> .exr 0x00000024`19d4f6b0
ExceptionAddress: 00007ff7b64a1929 (crashTest!testA::setMember+0x0000000000000039)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000001
Parameter[1]: ffffffffffffffff
Attempt to write to address ffffffffffffffff

We can see here that the exception was raised in the crashTest application, in the setMember method of the testA class. The output also shows that the error is an illegal attempt to write to protected memory at address 0xffffffffffffffff (access violation).
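
For an access violation, the two parameters of the exception record carry the kind of access and the address involved: 0 means a read, 1 a write and 8 an execution blocked by data execution prevention. The sketch below decodes them in the same spirit as the last line of the output; the function name describeAccessViolation is hypothetical and the wording of the messages is an approximation, not WinDBG's exact text for every case.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Describes the two parameters of an access-violation exception record:
// parameter 0 is the kind of access, parameter 1 the address involved.
std::string describeAccessViolation(std::uint64_t kind, std::uint64_t address) {
    std::string action;
    switch (kind) {
        case 0: action = "read from"; break;
        case 1: action = "write to"; break;
        case 8: action = "execute"; break;  // data execution prevention
        default: action = "access"; break;  // unknown kind, stay generic
    }
    std::ostringstream out;
    out << "Attempt to " << action << " address " << std::hex << address;
    return out.str();
}
```

With the values shown above (1 and ffffffffffffffff), the function yields the same sentence as the debugger: an attempt to write to address ffffffffffffffff.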

Finally, the .cxr command loads the context that was active when the exception was raised, which lets us inspect the call stack at that point in the execution:

0:000> .cxr 24`19d4f1c0
rax=ffffffffffffffff rbx=00007ff7b64a100f rcx=0000000002a62b1c
rdx=0000000002a62b1c rsi=00007ff7b5b7c000 rdi=0000002419d4f998
rip=00007ff7b64a1929 rsp=0000002419d4f8d0 rbp=0000002419d4f8d0
r8=0000002419e0c820 r9=0000000000000000 r10=0000000000000000
r11=0000002419d4fc60 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz na pe nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202
crashTest!testA::setMember+0x39:
00007ff7`b64a1929 8908 mov dword ptr [rax],ecx ds:ffffffff`ffffffff=????????

In addition to displaying this rather cryptic register dump, WinDBG automatically shows, in the Call Stack window, the call stack that was active when the exception was raised. Furthermore, if it can find the source file information as well as the source file itself, WinDBG loads the content of the source file in a new window. The call stack displayed is now:

crashTest!testA::setMember
crashTest!testExceptionNonTraitee
crashTest!fonction2
crashTest!fonction1
crashTest!wmain
crashTest!invoke_main
crashTest!__scrt_common_main_seh
crashTest!__scrt_common_main
crashTest!wmainCRTStartup
KERNEL32!BaseThreadInitThunk
ntdll!RtlUserThreadStart

and a source code window now shows the precise location where the exception occurred:

void setMember(int val) { m_iVal = val; }

A glance at the "Locals" window also tells us that the value of the this pointer is 0xffffffffffffffff, which is consistent with the cause of the exception reported by the .exr command.
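
The link between the value of this and the faulting address can be made explicit. The sketch below is a hypothetical reconstruction: only setMember and m_iVal appear in the article, so the assumption that m_iVal is the first (and only) data member of testA, and the helper writeTarget, are mine.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical reconstruction of the class seen in the call stack; the real
// class may have more members, in which case the offset could differ.
struct testA {
    int m_iVal;
    void setMember(int val) { m_iVal = val; }
};

// Address that setMember writes to for a given value of `this`:
// the address of the object plus the offset of m_iVal inside it.
std::uint64_t writeTarget(std::uint64_t thisValue) {
    return thisValue + offsetof(testA, m_iVal);
}
```

Under that assumption the offset is zero, so the write goes to the address held in this, which matches the mov instruction storing through rax at ffffffffffffffff.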

Whatever the pattern, once the cause of the bug has been identified, the next step is to work out a fix to the code so that the problem does not happen again. Unfortunately, the debugger cannot help you with that part of the job.

Conclusion

As you have seen, a debugging session with WinDBG is no small matter. For relatively simple cases, it may be wise to try your usual debugger first before making the jump to WinDBG.

However, for more complex cases such as the exception pattern whose cause is hidden, WinDBG can retrieve much more detailed information than is possible with Visual Studio.