What is static analysis?

Discovering properties of a program without running it
Programs that analyze other programs
Treating programs as data, rather than code
In particular, automatically finding bugs in code

What kind of code will be analyzed?

For this talk:

The C code of Python extension modules

Prerequisites

I’m going to assume basic familiarity with Python, and with either C or C++
Hopefully you’ve used, debugged, or written a Python extension module in C (perhaps via SWIG or Cython)

Outline

Intro to "cpychecker"
How to run the tool on your own code
How I ran the tool on lots of code
What bugs came up frequently
Recommendations on dealing with C and C++ from Python
Q & A

cpychecker

Part of my Python plugin for GCC
6500 lines of Python code implementing a static checker for C extension modules
It finds bugs in the C code of Python extension modules
https://gcc-python-plugin.readthedocs.org/en/latest/cpychecker.html

git clone git://git.fedorahosted.org/gcc-python-plugin.git

See my PyCon US 2012 talk: Static analysis of Python extension modules using GCC https://us.pycon.org/2012/schedule/presentation/78/

What it checks for (1): Reference counting

For every object:

"what is my reference count?" aka "ob_refcnt" (the object’s view of how many pointers point to it)
the reality of how many pointers point to it

As a C extension module author you must manually keep these in sync using Py_INCREF and Py_DECREF.

The two kinds of bugs:

ob_refcnt too high: (memory leaks - hence the title of this talk)
ob_refcnt too low: (crashes)

cpychecker warns about these issues

git clone git://git.fedorahosted.org/gcc-python-plugin.git

Checking reference counts

For each path through the function and PyObject*, it determines:
- what the reference count ought to be at the end of the function (based on how many pointers point to the object)
- what the reference count is

It will issues warnings for any that are incorrect.

Limitations of the refcount checking

purely intraprocedural
assumes every function returning a PyObject* returns a new reference, rather than a borrowed reference
- it knows about most of the CPython API and its rules
- you can manually mark functions with non-standard behavior
only tracks 0 and 1 times through any loop, to ensure that the analysis doesn’t go on forever
can be defeated by relatively simple code (turn up --maxtrans argument)

git clone git://git.fedorahosted.org/gcc-python-plugin.git

What it checks for (2)

It checks for the following along all of those code paths:

Dereferencing a NULL pointer (e.g. using result of an allocator without checking the result is non-NULL)
Passing NULL to CPython APIs that will crash on NULL
Usage of uninitialized local variables
Dereferencing a pointer to freed memory
Returning a pointer to freed memory
Returning NULL without setting an exception

git clone git://git.fedorahosted.org/gcc-python-plugin.git

What it checks for (3)

It also does some simpler checking:

type in calls to PyArg_ParseTuple et al
types and NULL termination of PyMethodDef tables
types and NULL termination of PyObject_Call{Function|Method}ObjArgs

git clone git://git.fedorahosted.org/gcc-python-plugin.git

What it doesn’t check for (patches welcome!)

tp_traverse errors (which can mess up the garbage collector); missing it altogether, or omitting fields
errors in GIL handling
lock/release mismatches
missed opportunities to release the GIL (e.g. compute-intensive functions; functions that wait on IO/syscalls)

git clone git://git.fedorahosted.org/gcc-python-plugin.git

What it can’t check for

Does the code "do the right thing"?

git clone git://git.fedorahosted.org/gcc-python-plugin.git

Building cpychecker

Dependencies (on Fedora)

sudo yum install \
   gcc-plugin-devel \
   python-devel \
   python-six \
   python-pygments \
   graphviz

Downloading the code

git clone git://git.fedorahosted.org/gcc-python-plugin.git

Building the checker

make plugin

Checking that it’s working

make demo

Textual output from "make demo"

demo.c: In function ‘make_a_list_of_random_ints_badly’:
demo.c:90:26: warning: Mismatching type in call to PyArg_ParseTuple with format code "i" [enabled by default]
  argument 3 ("&count") had type
    "long int *" (pointing to 64 bits)
  but was expecting
    "int *" (pointing to 32 bits)
  for format code "i"
demo.c:102:1: warning: ob_refcnt of '*item' is 1 too high [enabled by default]
demo.c:102:1: note: was expecting final ob_refcnt to be N + 1 (for some unknown N)
demo.c:102:1: note: due to object being referenced by: PyListObject.ob_item[0]
demo.c:102:1: note: but final ob_refcnt is N + 2
demo.c:97:14: note: PyLongObject allocated at:         item = PyLong_FromLong(random());
demo.c:90:26: note: when PyArg_ParseTuple() succeeds at:     if (!PyArg_ParseTuple(args, "i", &count)) {
demo.c:90:8: note: taking False path at:     if (!PyArg_ParseTuple(args, "i", &count)) {
demo.c:94:10: note: reaching:     list = PyList_New(0);
demo.c:94:10: note: when PyList_New() succeeds at:     list = PyList_New(0);
demo.c:96:5: note: when considering range: 1 <= count.0 <= 0x7fffffff at:     for (i = 0; i < count; i++) {
demo.c:96:5: note: taking True path at:     for (i = 0; i < count; i++) {
demo.c:97:31: note: reaching:         item = PyLong_FromLong(random());
demo.c:97:14: note: when PyLong_FromLong() succeeds at:         item = PyLong_FromLong(random());
demo.c:97:14: note: ob_refcnt is now refs: 1 + N where N >= 0
demo.c:98:22: note: when PyList_Append() succeeds at:         PyList_Append(list, item);
demo.c:98:22: note: ob_refcnt is now refs: 2 + N where N >= 0
demo.c:98:22: note: '*item' is now referenced by 1 non-stack value(s): PyListObject.ob_item[0]
demo.c:96:5: note: when considering count.0 == (int)1 from demo.c:90 at:     for (i = 0; i < count; i++) {
demo.c:96:5: note: taking False path at:     for (i = 0; i < count; i++) {
demo.c:101:5: note: reaching:     return list;
demo.c:102:1: note: returning
demo.c:102:1: note: found 1 similar trace(s) to this
demo.c:98:22: warning: calling PyList_Append with NULL as argument 1 (list) at demo.c:98 [enabled by default]
demo.c:90:26: note: when PyArg_ParseTuple() succeeds at:     if (!PyArg_ParseTuple(args, "i", &count)) {
demo.c:90:8: note: taking False path at:     if (!PyArg_ParseTuple(args, "i", &count)) {
demo.c:94:10: note: reaching:     list = PyList_New(0);
demo.c:94:10: note: when PyList_New() fails at:     list = PyList_New(0);
demo.c:96:5: note: when considering range: 1 <= count.0 <= 0x7fffffff at:     for (i = 0; i < count; i++) {
demo.c:96:5: note: taking True path at:     for (i = 0; i < count; i++) {
demo.c:97:31: note: reaching:         item = PyLong_FromLong(random());
demo.c:97:14: note: when PyLong_FromLong() succeeds at:         item = PyLong_FromLong(random());
demo.c:98:22: note: PyList_Append() invokes Py_TYPE() on the pointer via the PyList_Check() macro, thus accessing (NULL)->ob_type
demo.c:98:22: note: found 1 similar trace(s) to this
demo.c:86:1: note: graphical error report for function 'make_a_list_of_random_ints_badly' written out to 'demo.c.make_a_list_of_random_ints_badly-refcount-errors.html'
demo.c: In function ‘buggy_converter’:
demo.c:76:26: warning: Mismatching type in call to PyArg_ParseTuple with format code "O&" [enabled by default]
  argument 4 ("&i") had type
    "int *" (pointing to 32 bits)
  but was expecting
    "Py_ssize_t *" (pointing to 64 bits) (from second argument of "int (*fn) (struct PyObject *, Py_ssize_t *)")
  for format code "O&"
demo.c: In function ‘kwargs_example’:
demo.c:62:37: warning: Mismatching type in call to PyArg_ParseTupleAndKeywords with format code "(ff):kwargs_example" [enabled by default]
  argument 5 ("&x") had type
    "double *" (pointing to 64 bits)
  but was expecting
    "float *" (pointing to 32 bits)
  for format code "f"
demo.c:62:37: warning: Mismatching type in call to PyArg_ParseTupleAndKeywords with format code "(ff):kwargs_example" [enabled by default]
  argument 6 ("&y") had type
    "double *" (pointing to 64 bits)
  but was expecting
    "float *" (pointing to 32 bits)
  for format code "f"
demo.c: In function ‘too_many_varargs’:
demo.c:50:26: warning: Too many arguments in call to PyArg_ParseTuple with format string "i"
  expected 1 extra arguments:
    "int *" (pointing to 32 bits)
  but got 2:
    "int *" (pointing to 32 bits)
    "int *" (pointing to 32 bits)
 [enabled by default]
demo.c: In function ‘not_enough_varargs’:
demo.c:40:25: warning: Not enough arguments in call to PyArg_ParseTuple with format string "i"
  expected 1 extra arguments:
    "int *" (pointing to 32 bits)
  but got none
 [enabled by default]
demo.c: In function ‘socket_htons’:
demo.c:30:26: warning: Mismatching type in call to PyArg_ParseTuple with format code "i:htons" [enabled by default]
  argument 3 ("&x1") had type
    "long unsigned int *" (pointing to 64 bits)
  but was expecting
    "int *" (pointing to 32 bits)
  for format code "i"

Graphical output from "cpychecker"

images/sample-html-error-report.png

Running it on your own code

Distutils

CC=/path/to/built/plugin/gcc-with-cpychecker \
   python setup.py build

to set the environment variable

Makefiles

make CC=/path/to/built/plugin/gcc-with-cpychecker

to override the Makefile variable CC.

Running cpychecker a lot

Scaling up to hundreds of projects:

building via RPM
- hides the distutils vs Makefile vs CMake etc
"mock" builds
- every build gets its own freshly-provisioned chroot
"mock-with-analysis": running checkers:
- cpychecker
- cppcheck
- clang-analyzer
- gcc warnings

mock-with-analysis: https://github.com/fedora-static-analysis/mock-with-analysis

Scaling up (continued)

separation of model from presentation
- "Firehose" XML format: https://github.com/fedora-static-analysis/firehose
detect analyzers that fail or exceed 1 minute to run
store the result in a database
capture any sources mentioned in a report
can also capture arbitrary data e.g. code metrics

Graphical report

images/viewing-an-issue.png

Comparing builds

images/comparison-view.png

Managing failures

images/viewing-a-failure.png

Analyze all the things!

For PyCon:
- 370(ish) packages: all of the Python C code in Fedora
For this rehearsal:
- a subset: 113 packages: everything named "python-*"

What are the 20 most commonly used Py/_Py entrypoints?

[(u'PyErr_SetString', 1030),
 (u'PyArg_ParseTuple', 763),
 (u'PyErr_Format', 353),
 (u'Py_BuildValue', 338),
 (u'_PyArg_ParseTuple_SizeT', 333),
 (u'PyInt_FromLong', 326),
 (u'PyErr_Occurred', 313),
 (u'PyEval_RestoreThread', 297),
 (u'PyEval_SaveThread', 297),
 (u'PyString_FromString', 246),
 (u'PyArg_ParseTupleAndKeywords', 213),
 (u'PyErr_NoMemory', 194),
 (u'PyObject_GetAttrString', 174),
 (u'PyTuple_New', 170),
 (u'PyType_IsSubtype', 170),
 (u'PyList_New', 169),
 (u'_Py_NegativeRefcount', 168),
 (u'_Py_Dealloc', 168),
 (u'Py_InitModule4_64', 132),
 (u'PyErr_Clear', 122)]

FIXME: this is just a snapshot of the 2013-02-25 data (113 packages)

What are the 20 least commonly used Py/_Py entrypoints?

[(u'_PyLong_FromByteArray', 1),
 (u'_PylibMC_Deflate', 1),
 (u'PyObject_DelItem', 1),
 (u'PyNumber_InPlacePower', 1),
 (u'PyShm_semaphore', 1),
 (u'PySys_GetObject', 1),
 (u'PyGreenlet_GetCurrent', 1),
 (u'_PylibMC_Inflate', 1),
 (u'Py_SetProgramName', 1),
 (u'PyArray_Free', 1),
 (u'Py_Initialize', 1),
 (u'PyFile_WriteObject', 1),
 (u'_PyLong_Frexp', 1),
 (u'PyGreenlet_SetParent', 1),
 (u'PyNumber_AsSsize_t', 1),
 (u'PyCapsule_New', 1),
 (u'PyInt_FromSize_t', 1),
 (u'PyNumber_Add', 1),
 (u'PyThreadState_GetDict', 1),
 (u'PyArray_CanCastSafely', 1),
 (u'PyNumber_Power', 1)]

FIXME: this is just a snapshot of the 2013-02-25 data (113 packages)

What are the 10 most complicated functions?

i.e. query by cyclomatic complexity

python-4Suite-XML-1.0.2:Ft/Xml/XPath/XPathParser.c:parser_parse: 1601
python-4Suite-XML-1.0.2:Ft/Xml/XPointer/XPointerParser.c:parser_parse: 455
python-zmq-2.1.9:zmq/core/constants.c:PyInit_constants: 428
python-cvxopt-1.1.4:C/dsdp.c:solvesdp: 256
python-cvxopt-1.1.4:C/glpk.c:integer: 215
python-ldap-2.4.6:Modules/message.c:LDAPmessage_to_python: 183
python-cvxopt-1.1.4:C/glpk.c:simplex: 180
python-sybase-0.39:ctx.c:CS_CONTEXT_cs_config: 177
python-sqlite2-2.3.5:src/cursor.c:_pysqlite_query_execute: 170
python-ldap-2.4.6:Modules/constants.c:LDAPinit_constants: 167

To be fair, many of the above are autogenerated

What did the analyzers complain about?

Based on a rebuild of all Fedora 17 source RPMs named "python-*" (113 packages, including python itself):

TODO: redo this for all of the Python packages (370)

[(('cpychecker', 'refcount-too-high'), 851),
 (('cpychecker', 'mismatching-type-in-format-string'), 783),
 (('cpychecker', 'returns-NULL-without-setting-exception'), 610),
 (('clang-analyzer', None), 540),
 (('cppcheck', 'nullPointer'), 427),
 (('cppcheck', 'syntaxError'), 382),
 (('cpychecker', 'null-ptr-dereference'), 295),
 (('gcc', 'strict-aliasing'), 221),
 (('cpychecker', 'null-ptr-argument'), 194),
 (('cpychecker', 'refcount-too-low'), 193),
 (('cpychecker', 'flags-within-PyMethodDef'), 160),
 (('gcc', 'unused-but-set-variable'), 153),
 (('gcc', None), 85),
 (('cpychecker', 'usage-of-uninitialized-data'), 62),
 (('gcc', 'unused-variable'), 60),
 (('gcc', 'parentheses'), 55),
 (('gcc', 'deprecated-declarations'), 42),
 (('gcc', 'pointer-sign'), 40),
 (('gcc', 'maybe-uninitialized'), 37),
 (('cppcheck', 'uninitvar'), 26),
 (('gcc', 'format'), 26),
 (('cppcheck', 'memleakOnRealloc'), 23),
 (('cppcheck', 'memleak'), 19),
 (('gcc', 'unused-result'), 16),
 (('gcc', 'implicit-function-declaration'), 15),
 (('gcc', 'unused-label'), 12),
 (('gcc', 'int-to-pointer-cast'), 11),
 (('cppcheck', 'resourceLeak'), 8),
 (('gcc', 'switch'), 6),
 (('cpychecker', 'missing-keyword-argument'), 6),
 (('gcc', 'return-type'), 5),
 (('gcc', 'address'), 4),
 (('cpychecker', 'arg-was-not-PyObject-ptr'), 4),
 (('cppcheck', 'uninitstring'), 4),
 (('gcc', 'uninitialized'), 4),
 (('cpychecker', 'not-enough-vars-in-format-string'), 4),
 (('cppcheck', 'useClosedFile'), 4),
 (('cpychecker', 'read-from-deallocated-memory'), 3),
 (('gcc', 'pointer-to-int-cast'), 3),
 (('cppcheck', 'invalidScanfFormatWidth'), 2),
 (('cpychecker', 'too-many-vars-in-format-string'), 2),
 (('gcc', 'sequence-point'), 2),
 (('gcc', 'nonnull'), 2),
 (('cpychecker', 'returns-pointer-to-deallocated-memory'), 1),
 (('cppcheck', 'deallocDealloc'), 1),
 (('cppcheck', 'IOWithoutPositioning'), 1),
 (('cpychecker', 'call-of-null-function-ptr'), 1),
 (('gcc', 'enum-compare'), 1),
 (('gcc', 'unused-value'), 1),
 (('cppcheck', 'doubleFree'), 1)]

What did cpychecker complain about?

[('refcount-too-high', 851),
 ('mismatching-type-in-format-string', 783),
 ('returns-NULL-without-setting-exception', 610),
 ('null-ptr-dereference', 295),
 ('null-ptr-argument', 194),
 ('refcount-too-low', 193),
 ('flags-within-PyMethodDef', 160),
 ('usage-of-uninitialized-data', 62),
 ('missing-keyword-argument', 6),
 ('arg-was-not-PyObject-ptr', 4),
 ('not-enough-vars-in-format-string', 4),
 ('read-from-deallocated-memory', 3),
 ('too-many-vars-in-format-string', 2),
 ('returns-pointer-to-deallocated-memory', 1),
 ('call-of-null-function-ptr', 1)]

Reference-counting bugs

[('refcount-too-high', 851),
 ('refcount-too-low', 193)]

Leaving an ob_refcnt too high is thus the most common true bug the tool is reporting

TODO: where are the leaks happening?

Missing Py_INCREF() on Py_None

(Saw this specific instance of refcount-too-low 14 times on the 113 package run)

PyObject*
some_method(PyObject *self, PyObject *args)
{
    [...snip...]

    /* BUG: loses a reference to Py_None */
    return Py_None;
}

$ python script.py
Fatal error: deallocating None

Fixing Py_INCREF on Py_None

PyObject*
some_method(PyObject *self, PyObject *args)
{
    [...snip...]

    /* Fixed version of the above: */
    Py_RETURN_NONE;
}

Reference leak in Py_BuildValue with "O"

    /* BUG: reference leak: */
    return Py_BuildValue("Oi", some_object_we_own_a_ref_on, 42);

    /* Fixed version of the above: */
    return Py_BuildValue("Ni", some_object_we_own_a_ref_on, 42);

    /* If it's just one object, why use Py_BuildValue? */
    return some_object_we_own_a_ref_on;

What were those format string errors?

a.k.a. the 783 mismatching-type-in-format-string errors

By function:

[(u'PyArg_ParseTuple', 487),
 (u'PyArg_ParseTupleAndKeywords', 170),
 (u'Py_BuildValue', 135),
 (u'PyArg_Parse', 2)]

(in a way this is merely expressing what I’m checking for)

650 errors using PyArg_ParseTuple[AndKeywords]

9 errors where "i" was used to unpack a 32-bit int value, but was passed a 64-bit destination (partial initialization, leading to possible data corruption)
282 "errors" where "s", "s#", "z", "z#" was used to unpack a string object to a "char \*", rather than a "const char *"
numerous false alarms for the "O" and "O!" code due to the checker not knowing about the relevant PyTypeObject

So most of these are false alarms

489 places lacking error-checking

[('null-ptr-dereference', 295),
 ('null-ptr-argument', 194)]

Do more error-checking, please!

"goto" considered wonderful

{
    PyObject *local0 = NULL;
    PyObject *local1 = NULL;
    PyObject *local2 = NULL;
    /* etc */

    local0 = PyFoo_DoBar();
    if (!local0) goto error;

    /* etc */

    return result;

error:
    Py_XDECREF(local2);
    Py_XDECREF(local1);
    Py_XDECREF(local0);
    return NULL;
}

DO NOT DO THIS: Py_XDECREF(PyObject_CallObject()) (1)

    /*
      This is bogus code: Py_XDECREF expands its argument multiple times,
      so the function is actually called up to 4 times
      (assuming a non pydebug build of CPython).
    */
    Py_XDECREF(PyObject_CallObject(callable, args));

    /* Seen in the wild in:
       python-alsa-1.0.25-1.fc17:
         pyalsa/alsamixer.c:179
         pyalsa/alsahcontrol.c:190
         pyalsa/alsaseq.c:3277

       python-4Suite-XML-1.0.2-14.fc17:
         Ft/Xml/src/domlette/refcounts.c:80
    */

DO NOT DO THIS: Py_XDECREF(PyObject_CallObject()) (2)

This expands to:

  do {
    if ((PyObject_CallObject(callable, args)) == ((void *)0))
      ;
    else
      do {
        if (--(PyObject_CallObject(callable, args)->ob_refcnt) != 0)
          ;
        else
          (*(PyObject_CallObject(callable, args)->ob_type)->tp_dealloc)
            PyObject_CallObject(callable, args);
      } while (0);
  } while (0);

Filed as http://bugs.python.org/issue17206

DO NOT DO THIS: Py_XDECREF(PyObject_CallObject()) (3)

…which is effectively:

    /* Call it once */
    if ((PyObject_CallObject(callable, args)) != NULL) {
        /*
           If it doesn't raise an exception, leak the reference (BUG 1),
           and call it again (BUG 2).

           Assume that the second call doesn't raise an exception,
           otherwise segfault the interpreter (BUG 3),
           and DECREF the result, but don't deallocate if the refcount
           is zero (BUGS 4 and 5)
         */
        if (--(PyObject_CallObject(callable, args)->ob_refcnt) == 0) {
          /*
            If the refcount is zero, call it again! (BUG 6)
            Assume the result is non-NULL (otherwise segfaulting, BUG 7)
            and deallocate whatever you got back (even if the refcount
            is non-zero, BUG 8)
           */
          (*(PyObject_CallObject(callable, args)->ob_type)->tp_dealloc)
            /* and for good measure, call it agains (BUG 9)
               and leak a reference to the result (BUG 10) */
            PyObject_CallObject(callable, args);
        }
    }