Using Python to debug C and C++ code (using gdb)

David Malcolm, Red Hat

dmalcolm@redhat.com

PyCon US 2011

These slides can also be seen via

http://tinyurl.com/PyCon-US-2011-GdbPython

Overview

Prerequisites

I'm going to assume basic familiarity with Python, and with either C or C++.

Hopefully you've used gdb at least once.

You need gdb 7.0 or later, built with Python embedding enabled.

Why I love this technology

As it happens, the crashing program was itself in Python

Python saves the day

Interactive Python within gdb

The gdb module is built in

Use help if you get lost:

    (gdb) python help(gdb)
          
    Help on package gdb:
    
    NAME
        gdb
    
    FILE
        (built-in)
    
    PACKAGE CONTENTS
        (..etc..)
            

Example

Here's the C code I had to debug:

            static PyObject *interned; /* actually a PyDictObject */

            typedef struct _dictobject PyDictObject;

            struct _dictobject {
                PyObject_HEAD

               /* (Fields snipped for simplicity) */

               /* Something within here was being
                  corrupted: */
               PyDictEntry *ma_table;
            };
          

Looking up data

What's the type of the data?

Getting at an underlying pointer value

Use long to extract a pointer value from a gdb.Value:

            (gdb) python print hex(long(val))
            0x602590
          

Not to be confused with the address of the gdb.Value wrapper within the gdb process:

            (gdb) python print repr(val)
            <gdb.Value object at 0x7f52bf44bdc0>
          

Casts and pointers

gdb.lookup_type(TYPE_NAME)

In Python terms, first we need the type:

            (gdb) python \
            type_dict_ptr = \
            gdb.lookup_type('PyDictObject').pointer()

            (gdb) python print type_dict_ptr
            PyDictObject *
          

Note how we used gdb.Type.pointer

gdb.Value.cast(TYPE)

Now we can cast val:

            (gdb) python val2 = val.cast(type_dict_ptr)
            (gdb) python print val2
            0x602590
            (gdb) python print val2.type
            PyDictObject *
          

So val2 is equivalent to ((PyDictObject*)interned)

Looking up fields of a structure

Treat a gdb.Value as a dictionary to get at the fields of the underlying data:

            (gdb) python val3 = val2['ma_table']
            (gdb) python print val3
            0x7ffff7f28010
            (gdb) python print val3.type
            PyDictEntry *
          

So we now have val3, equivalent to:

            ((PyDictObject*)interned)->ma_table
          

The easier way

I just showed you the difficult way to do this

The easy way is to use gdb.parse_and_eval directly on a gdb expression:

            (gdb) python \
            val3 = gdb.parse_and_eval(
               '((PyDictObject*)interned)->ma_table'
            )
          

Looking at C arrays

Pointer and array gdb.Value instances support the Python indexing syntax:

            (gdb) python print val3[2]
            {me_hash = 0, me_key = 0x0, me_value = 0x0}
          

This is equivalent to the underlying C pointer/array syntax:

          ((PyDictObject*)interned)->ma_table[2]
          

Iterating through a data structure

We can then use Python to find all entries in the table that satisfy a criterion:

            (gdb) python
            print [i for i in range(8192)
                     if long(val3[i]['me_value']) == 0]
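
The same idea as a reusable helper, as a minimal sketch. It is written in Python 3 syntax (int rather than long), which is what most recent gdb builds embed. The names slots_matching, has_null_value and fake_table are made up for illustration, and the plain dicts only stand in for gdb.Value entries so the code can be tried outside gdb:

```python
def slots_matching(table, size, predicate):
    """Return the indexes i < size for which predicate(table[i]) is true.

    Inside gdb, `table` would be a pointer or array gdb.Value such as val3;
    anything that supports table[i] works.
    """
    return [i for i in range(size) if predicate(table[i])]


def has_null_value(entry):
    # int() of a pointer-valued field gives its address; 0 means NULL.
    return int(entry['me_value']) == 0


# Stand-in data (plain dicts, NOT real gdb.Value objects).
fake_table = [
    {'me_hash': 0, 'me_key': 0x0, 'me_value': 0x0},
    {'me_hash': 42, 'me_key': 0x602590, 'me_value': 0x6025c0},
    {'me_hash': 0, 'me_key': 0x0, 'me_value': 0x0},
]
null_slots = slots_matching(fake_table, len(fake_table), has_null_value)
print(null_slots)   # [0, 2]
```

Inside gdb, something like slots_matching(val3, 8192, has_null_value) should give the same list as the comprehension above.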
          

What was the point of all the above?

Pretty-printers for custom data types

Example: LibreOffice's string types

Before:

            (gdb) print pWndContents
            (String *) 0x7f842941fcf0
          

After:

            (gdb) print pWndContents
            String(u'Hello world')
          

This will show up everywhere in GDB, including backtraces.

Example: LibreOffice's string types (pt2)

            typedef struct _UniStringData {
                sal_Int32    mnRefCount;
                sal_Int32    mnLen;
                sal_Unicode  maStr[1];
            } UniStringData;

            class String {
            private:
                UniStringData*  mpData;
            };
          

How to write a pretty-printer (1)

Get the program to some known state

Go hunting for instances of the type:

            (gdb) p pSVData->maAppData->mpAppName
            $14 = (String *) 0x7f842941fcf0

            (gdb) p $14->mpData
            $15 = (UniStringData *) 0x7f264fb6fda0

            (gdb) p *$15
            $16 = {mnRefCount = 1, mnLen = 7, maStr = {115}}
          

How to write a pretty-printer (2)

Now capture it as a Python variable, to make it easy to peek inside it:

            (gdb) python
            appName = gdb.parse_and_eval(
            'pSVData->maAppData->mpAppName'
            )
            [CTRL-D]

            (gdb) python print appName
            0x7f264fb79028
          

Poke at it till it works

Here's the fragment of Python code I came up with for printing (String*) values:

            (gdb) python mpData = appName['mpData']
            (gdb) python
            print(
               repr(u"".join(
                 [unichr(int(mpData['maStr'][i]))
                  for i in range(int(mpData['mnLen']))]
                 )
               )
            )
          

Giving this output:

            u'soffice'
          

Wire up the hack into gdb (1)

A pretty-printer is a class:

            class StringPrinter(object):
                def __init__(self, val):
                    # "val" is a gdb.Value
                    # representing a (String *)
                    # in the inferior process
                    self.val = val
          

Wire up the hack into gdb (2)

with a to_string method:

            def to_string(self):
                mpData = self.val['mpData']
                length = int(mpData['mnLen'])
                maStr = mpData['maStr']
                chars = [unichr(int(maStr[i]))
                         for i in xrange(length)]
                result = u"".join(chars)
                return "String(%r)" % result
          

Wire up the hack into gdb (3)

            def pp_lookup(gdbval):
                # Only for types that are "String *"
                type = gdbval.type.unqualified()
                if type.code == gdb.TYPE_CODE_PTR:
                    type = type.target().unqualified()
                    t = str(type)
                    # Exact match on the name of the pointed-to type
                    if t == "String":
                        return StringPrinter(gdbval)
          

Wire up the hack into gdb (4)

            def register(obj):
                # Fall back to registering globally if there is
                # no current objfile
                if obj is None:
                    obj = gdb

                # Wire up the pretty-printer
                obj.pretty_printers.append(pp_lookup)

            register(gdb.current_objfile())
          

See the documentation for more details:

http://sourceware.org/gdb/current/onlinedocs/gdb/Python.html

Tweaks

Checking for a NULL pointer:

            if 0 == long(self.val):
                return 'NULL'
          

Safety limit:

            # Don't send gdb into a long loop if it
            # encounters corrupt data:
            length = min(length, 1024)
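
Putting the two tweaks together with the earlier to_string method might look like the sketch below. It uses Python 3 syntax (chr and range rather than unichr and xrange), so the result prints as String('...') rather than String(u'...'). FakeValue and MAX_CHARS are hypothetical names; FakeValue only imitates the two gdb.Value operations the printer relies on, so that the logic can be exercised outside gdb:

```python
class StringPrinter(object):
    MAX_CHARS = 1024  # safety limit against corrupt length fields

    def __init__(self, val):
        # "val" is expected to behave like a gdb.Value for a (String *)
        self.val = val

    def to_string(self):
        # NULL check: int() of a pointer value is its address
        if int(self.val) == 0:
            return 'NULL'
        mpData = self.val['mpData']
        length = min(int(mpData['mnLen']), self.MAX_CHARS)
        maStr = mpData['maStr']
        chars = [chr(int(maStr[i])) for i in range(length)]
        return "String(%r)" % "".join(chars)


class FakeValue(object):
    """Hypothetical stand-in for gdb.Value: an address plus named fields."""

    def __init__(self, addr, fields=None):
        self.addr = addr
        self.fields = fields or {}

    def __int__(self):
        return self.addr

    def __getitem__(self, name):
        return self.fields[name]


data = {'mnRefCount': 1, 'mnLen': 7, 'maStr': [ord(c) for c in 'soffice']}
app_name = FakeValue(0x7f842941fcf0, {'mpData': data})
print(StringPrinter(app_name).to_string())      # String('soffice')
print(StringPrinter(FakeValue(0)).to_string())  # NULL
```

Inside gdb the class would be handed real gdb.Value objects by the lookup function, and FakeValue would not be needed.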
          

The Edit/test cycle

Locate some data of the type in question:

            $ PYTHONPATH=$(pwd) gdb --args PROGRAM
            (gdb) python import YOUR_DEBUG_CODE
            (gdb) print SOME_DATA
          

You don't need to restart the program each time. Edit YOUR_DEBUG_CODE.py, then repeat:

            (gdb) python reload(YOUR_DEBUG_CODE)
            (gdb) print SOME_DATA
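
On gdb builds that embed Python 3, reload is no longer a builtin; importlib.reload does the same job. The sketch below shows the edit/reload cycle outside gdb, using a throwaway module (your_debug_code_demo, a made-up name) in a temporary directory in place of YOUR_DEBUG_CODE.py:

```python
import importlib
import os
import sys
import tempfile

# Skip .pyc files so a quick edit is never masked by cached bytecode.
sys.dont_write_bytecode = True

# Write a throwaway module standing in for YOUR_DEBUG_CODE.py.
workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, 'your_debug_code_demo.py')
with open(module_path, 'w') as f:
    f.write("def describe():\n    return 'first draft'\n")

sys.path.insert(0, workdir)   # plays the role of PYTHONPATH=$(pwd)
import your_debug_code_demo
before = your_debug_code_demo.describe()

# "Edit" the file, then reload it without restarting the process.
with open(module_path, 'w') as f:
    f.write("def describe():\n    return 'second draft'\n")
importlib.reload(your_debug_code_demo)
after = your_debug_code_demo.describe()

print(before, '->', after)   # first draft -> second draft
```

Within gdb the equivalent would be "python import importlib; importlib.reload(YOUR_DEBUG_CODE)". Note that reloading rebinds the module's functions and classes; objects created from the old definitions keep the old code.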
          

Write an automated test suite

See Lib/test/test_gdb.py in CPython's source code for examples of this
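
One way to automate this, as a rough sketch: drive gdb in batch mode from a test script and check what it prints. The helper names gdb_batch_argv and run_gdb are invented here, and SOME_FUNCTION is a placeholder like the other upper-case names; --batch, -ex and --args are standard gdb command-line options:

```python
import subprocess


def gdb_batch_argv(program, commands, gdb='gdb'):
    """Build an argv that runs `commands` in gdb non-interactively."""
    argv = [gdb, '--batch']
    for command in commands:
        argv += ['-ex', command]
    # --args must come last: everything after it is the program's own argv
    argv += ['--args', program]
    return argv


def run_gdb(program, commands):
    """Run gdb and return (stdout, stderr) as text. Needs gdb installed."""
    proc = subprocess.Popen(gdb_batch_argv(program, commands),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    return proc.communicate()


argv = gdb_batch_argv('PROGRAM', [
    'python import YOUR_DEBUG_CODE',
    'break SOME_FUNCTION',
    'run',
    'print SOME_DATA',
])
print(argv)
```

A test case would then call run_gdb with a real program and assert that the expected pretty-printed text appears in the captured output.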

Hints and tips

Custom gdb commands

Create a subclass of gdb.Command, and write its invoke method.

I've done this for CPython:

See Tools/gdb/libpython.py in CPython's source code
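
A minimal sketch of such a command, assuming a made-up command name count-null-values that counts NULL me_value slots in a table like the one from the earlier example. It is not taken from libpython.py. The import is guarded so the helper function can also be tried outside gdb:

```python
try:
    import gdb          # only importable inside gdb's embedded Python
except ImportError:
    gdb = None          # lets the helper below be tried outside gdb


def count_null_values(table, size):
    """Count entries among table[0..size-1] whose me_value is NULL."""
    return sum(1 for i in range(size) if int(table[i]['me_value']) == 0)


if gdb is not None:
    class CountNullValues(gdb.Command):
        """count-null-values EXPR SIZE: count NULL me_value slots."""

        def __init__(self):
            super(CountNullValues, self).__init__('count-null-values',
                                                  gdb.COMMAND_DATA)

        def invoke(self, arg, from_tty):
            # The last word is the table size, the rest is a gdb expression
            expr, size = arg.rsplit(None, 1)
            table = gdb.parse_and_eval(expr)
            print(count_null_values(table, int(size)))

    CountNullValues()   # instantiating the class registers the command
```

Once the file has been loaded into gdb, the command could be used like this:

            (gdb) count-null-values ((PyDictObject*)interned)->ma_table 8192

The class docstring doubles as the text shown by "help count-null-values".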

What have we covered?

Where to go from here

Lots of other Python/gdb functionality

More information

gdb documentation: http://sourceware.org/gdb/current/onlinedocs/gdb/Python.html

Tom Tromey's blog: http://sourceware.org/gdb/wiki/PythonGdbTutorial

The LibreOffice string pretty-printer I wrote: https://bugs.freedesktop.org/show_bug.cgi?id=34745

Python code that groks GNU libc's malloc/free implementation: https://fedorahosted.org/gdb-heap/

Other examples

Q & A