Debugging GCC
The gcc binary is actually a relatively small “driver” program, which parses some command-line options and then invokes one or more other programs to do the real work.
Consider compiling a simple hello world C program:
#include <stdio.h>

int main (int argc, const char *argv[])
{
  printf ("Hello world\n");
  return 0;
}
Compiling it with gcc generates an a.out binary:
$ gcc hello.c
$ ./a.out
Hello world
Internally, the driver will invoke cc1 (the C compiler), which converts the .c code to a .s assembler file. Assuming this succeeds, the driver will typically then invoke as (the assembler), and then the linker.
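The control flow described above can be modelled as a toy C sketch. It runs each stage in order, feeds each stage's output to the next, and stops at the first failure. This is not the driver's code. The stage table, the file names (hard-coded for hello.c), and the run_fn hook, which stands in for spawning a program and waiting for its exit status, are all hypothetical. The real driver also uses temporary files and may reach the linker through a helper such as collect2.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model (not GCC's code) of the driver's control flow for
   "gcc hello.c": each stage turns one file into the next, and the
   driver stops at the first stage that fails. */

struct stage {
  const char *program; /* program the real driver would invoke */
  const char *output;  /* file that the stage would produce */
};

static const struct stage stages[] = {
  { "cc1", "hello.s" }, /* C compiler: .c -> .s */
  { "as", "hello.o" },  /* assembler: .s -> .o */
  { "ld", "a.out" },    /* linker: .o -> executable */
};
enum { NUM_STAGES = sizeof stages / sizeof stages[0] };

/* Hypothetical hook standing in for "spawn PROGRAM and wait for it";
   returns 0 on success, like a process exit status. */
typedef int (*run_fn) (const char *program, const char *input,
                       const char *output);

/* Run the stages in order, feeding each stage's output to the next.
   Returns the index of the failing stage, or -1 if all succeeded. */
static int
run_pipeline (const char *source, run_fn run)
{
  assert (source != NULL && run != NULL);
  const char *input = source;
  for (int i = 0; i < NUM_STAGES; i++)
    {
      if (run (stages[i].program, input, stages[i].output) != 0)
        return i; /* later stages are never run */
      input = stages[i].output;
    }
  return -1;
}

/* Example runner: print the command instead of running it. */
static int
print_only (const char *program, const char *input, const char *output)
{
  printf ("%s %s -> %s\n", program, input, output);
  return 0;
}

/* Example runner simulating a compile error reported by cc1. */
static int
cc1_fails (const char *program, const char *input, const char *output)
{
  (void) input;
  (void) output;
  return strcmp (program, "cc1") == 0;
}
```

Calling run_pipeline ("hello.c", print_only) prints "cc1 hello.c -> hello.s", "as hello.s -> hello.o" and "ld hello.o -> a.out", then returns -1. With cc1_fails it returns 0, and the assembler and linker stages are never reached.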
Given that, how do we debug the C compiler? The easiest way is to add -wrapper gdb,--args to the gcc command line:
# Invoke "cc1" (and "as", etc) under gdb:
$ gcc hello.c -wrapper gdb,--args
The gcc driver will then invoke cc1 under gdb, and you can set breakpoints and step through the code.
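GCC's documentation describes -wrapper as taking the wrapper program and its arguments as a comma-separated list. The driver places that list in front of each subcommand it runs, so cc1 ends up being run as gdb --args cc1 .... Below is a rough C sketch of that argument rewriting. It is not GCC's implementation, and wrap_argv is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch (not GCC's implementation) of what "-wrapper gdb,--args" does:
   split the comma-separated wrapper list and put it in front of the
   subcommand's argv, so "cc1 hello.c" becomes "gdb --args cc1 hello.c".

   Returns a NULL-terminated argv allocated with malloc.  The wrapper
   strings live in one extra buffer, returned via *STORAGE; the caller
   frees both.  Returns NULL on allocation failure. */
static char **
wrap_argv (const char *wrapper, char *const *argv, char **storage)
{
  assert (wrapper != NULL && argv != NULL && storage != NULL);

  /* One wrapper element per comma-separated piece. */
  size_t n_wrap = 1;
  for (const char *p = wrapper; *p; p++)
    if (*p == ',')
      n_wrap++;

  size_t n_args = 0;
  while (argv[n_args])
    n_args++;

  char *copy = malloc (strlen (wrapper) + 1);
  char **result = malloc ((n_wrap + n_args + 1) * sizeof *result);
  if (copy == NULL || result == NULL)
    {
      free (copy);
      free (result);
      return NULL;
    }
  strcpy (copy, wrapper);

  /* Split COPY in place at each comma. */
  size_t i = 0;
  result[i++] = copy;
  for (char *p = copy; *p; p++)
    if (*p == ',')
      {
        *p = '\0';
        result[i++] = p + 1;
      }

  /* Append the original command line, then the terminating NULL. */
  for (size_t j = 0; j < n_args; j++)
    result[i++] = argv[j];
  result[i] = NULL;

  *storage = copy;
  return result;
}
```

With a single-element list such as -wrapper valgrind, the same function simply prepends valgrind. In this sketch, a wrapper argument that itself contains a comma cannot be expressed.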
Note
If you ever need to debug the driver itself, you can simply run it under gdb in the normal way:
# Invoke the "gcc" driver under gdb:
$ gdb --args gcc hello.c
I find myself doing this much less frequently than the -wrapper gdb,--args invocation for debugging cc1, though.
You can invoke other debugging programs this way, for example, valgrind:
# Invoke "cc1" (and "as", etc) under valgrind:
$ gcc hello.c -wrapper valgrind
Note
For good results under valgrind, it’s best to configure your build of gcc with --enable-valgrind-annotations, which automatically suppresses various known false positives.
Support scripts for gdb
The source tree contains two support scripts that significantly improve the debugging experience within gdb, but some setup is required.
gcc/configure (from configure.ac) automatically generates a .gdbinit script within the gcc subdirectory of the build directory. This should be automatically detected and run by gdb. However, you may see a message from gdb of the form:
"path-to-build/gcc/.gdbinit" auto-loading has been declined by your `auto-load safe-path'
- as a protection against untrustworthy python scripts. See
http://sourceware.org/gdb/onlinedocs/gdb/Auto_002dloading-safe-path.html
The fix is to mark the build/gcc directory as trustworthy. An easy way to do so is to add a line like the following to your ~/.gdbinit script, once for each build directory of your various checkouts of gcc:
add-auto-load-safe-path /absolute/path/to/build/gcc
If it’s working, you should see the following message as gdb starts up:
Successfully loaded GDB hooks for GCC
The generated .gdbinit script loads two files:
gcc/gdbinit.in contains useful commands in gdb’s own language, sets up useful breakpoints, and tells gdb to skip some very heavily-used inline functions when stepping.
gcc/gdbhooks.py injects useful Python code into gdb, for pretty-printing important data types.
See those two files in the source tree for more information.
How do I find where a particular tree was created?
If you have a tree node, you can put a watchpoint on the memory location holding its tree code. The watchpoint triggers when the tree node is created, which can be helpful for finding, say, where in a frontend something is built. Be aware that the memory location may be written a few times before the tree code is finally set.
For example, when tracking down where a particular IDENTIFIER_NODE was built (to fix a bogus suggestion in the C++ frontend):
(gdb) p suggestion
$5 = <identifier_node 0x7ffff0d10600 ._61>
(gdb) p suggestion->base.code
$6 = IDENTIFIER_NODE
Here I put the watchpoint on it:
(gdb) watch -l suggestion->base.code
Hardware watchpoint 10: -location suggestion->base.code
On re-running, it takes a few writes before we hit the creation of the IDENTIFIER_NODE:
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
[...snip...]
Hardware watchpoint 10: -location suggestion->base.code
Old value = <unreadable>
New value = 2947526575
memset () at ../sysdeps/x86_64/memset.S:69
69 movdqu %xmm8, -16(%rdi,%rdx)
(gdb) cont
Continuing.
Hardware watchpoint 10: -location suggestion->base.code
Old value = 2947526575
New value = ERROR_MARK
memset () at ../sysdeps/x86_64/memset.S:69
69 movdqu %xmm8, -16(%rdi,%rdx)
(gdb) cont
Continuing.
Hardware watchpoint 10: -location suggestion->base.code
Old value = ERROR_MARK
New value = IDENTIFIER_NODE
make_node (code=IDENTIFIER_NODE) at ../../src/gcc/tree.c:1035
1035 switch (type)
At this point, we can examine the backtrace and see what created the node.
Similar techniques can be used to track down where gimple statements are created, and so on.
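To practise the watchpoint technique outside GCC, the toy C program below mimics the write pattern in the session above. The first new value there, 2947526575, is 0xafafafaf, which looks like a poison fill applied when the memory is allocated. The second write zeroes the node (ERROR_MARK is tree code 0), and the third sets the real code. The poison step and all names here (toy_alloc, toy_make_node, toy_node) are assumptions of this sketch, not GCC's actual allocator or make_node. The comments also assume a 32-bit unsigned int.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy tree codes: 0 plays the role of ERROR_MARK. */
enum toy_code { TOY_ERROR_MARK = 0, TOY_IDENTIFIER_NODE = 1 };

struct toy_node {
  unsigned int code; /* the field to put a watchpoint on */
  const char *name;
};

/* Allocate NBYTES and fill them with a 0xaf poison byte, so that
   reads of uninitialized fields stand out (assumed behaviour). */
static void *
toy_alloc (size_t nbytes)
{
  void *p = malloc (nbytes);
  if (p != NULL)
    memset (p, 0xaf, nbytes); /* write 1: code becomes 0xafafafaf */
  return p;
}

/* Build a node; watching its code field shows all three writes. */
static struct toy_node *
toy_make_node (enum toy_code code, const char *name)
{
  assert (name != NULL);
  struct toy_node *t = toy_alloc (sizeof *t);
  if (t == NULL)
    return NULL;
  memset (t, 0, sizeof *t); /* write 2: code becomes TOY_ERROR_MARK */
  t->code = code;           /* write 3: the real tree code */
  t->name = name;
  return t;
}
```

To try it, call toy_make_node from a small main and build with -g -O0, so the poison memset is not optimized away as a dead store. Stop after the node is returned, run watch -l on the node's code field, and re-run. If the allocation lands at the same address again (gdb disables address-space randomization by default on some platforms), the watchpoint should fire in memset twice before the final assignment.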
Where in the user’s source is this location_t?
GCC uses location_t to track locations in the user’s source code. This data type is effectively a key into a database and, because information must be packed into a limited number of bits, it is encoded in a non-trivial way.
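To make the "key into a database" idea concrete, here is a toy C model loosely inspired by GCC's line maps. The real encoding also packs range information and handles macro expansions, which this ignores. Each hypothetical toy_line_map covers the location values from its start up to the next map's start. Within a map, the low column_bits bits of the offset hold the column and the remaining bits hold the line delta. Decoding a value is then a binary search for the right map, followed by some bit-twiddling.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (much simpler than GCC's real line maps). */
typedef unsigned int toy_location_t;

struct toy_line_map {
  toy_location_t start;     /* first location value this map covers */
  const char *file;         /* source file for this range of values */
  unsigned int first_line;  /* line number encoded by offset 0 */
  unsigned int column_bits; /* low bits of the offset hold the column */
};

struct toy_expanded {
  const char *file; /* NULL if the value precedes every map */
  unsigned int line;
  unsigned int column;
};

/* MAPS must be sorted by START. */
static struct toy_expanded
toy_expand_location (const struct toy_line_map *maps, size_t n,
                     toy_location_t loc)
{
  assert (maps != NULL || n == 0);
  struct toy_expanded r = { NULL, 0, 0 };

  /* Binary search for the last map whose START <= LOC. */
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (maps[mid].start <= loc)
        lo = mid + 1;
      else
        hi = mid;
    }
  if (lo == 0)
    return r;

  /* Decode the offset within that map into line and column. */
  const struct toy_line_map *m = &maps[lo - 1];
  toy_location_t offset = loc - m->start;
  r.file = m->file;
  r.line = m->first_line + (offset >> m->column_bits);
  r.column = offset & ((1u << m->column_bits) - 1);
  return r;
}
```

For example, suppose a map for test.c starts at 100, with first line 1 and 7 column bits. The value 100 + (14 << 7) + 7 then decodes to test.c:15:7. In GCC itself, expand_location performs the real lookup.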
A handy trick for debugging locations is to inject a call to inform in the debugger, which emits a note diagnostic at a particular location_t:
(gdb) call inform (loc, "")
test.c: In function ‘fn_1’:
test.c:15:7: note:
15 | if (flag)
| ^~~~
A couple of caveats with this:
the diagnostic subsystem doesn’t print the source code if the location_t is the same as that of the previously emitted diagnostic
the diagnostic subsystem is not re-entrant, so you can’t use this when you’re inside the diagnostic emission code
TODO:
howto: stepping through the compiler, stepping through a pass
talk about dumpfiles also