The C code of Python extension modules
For this talk:
The C code of Python extension modules
git clone git://git.fedorahosted.org/gcc-python-plugin.git
See my PyCon US 2012 talk: Static analysis of Python extension modules using GCC https://us.pycon.org/2012/schedule/presentation/78/
For every object:
As a C extension module author you must manually keep these in sync using Py_INCREF and Py_DECREF.
The two kinds of bugs:
cpychecker warns about these issues
git clone git://git.fedorahosted.org/gcc-python-plugin.git
It will issues warnings for any that are incorrect.
git clone git://git.fedorahosted.org/gcc-python-plugin.git
It checks for the following along all of those code paths:
git clone git://git.fedorahosted.org/gcc-python-plugin.git
It also does some simpler checking:
git clone git://git.fedorahosted.org/gcc-python-plugin.git
git clone git://git.fedorahosted.org/gcc-python-plugin.git
git clone git://git.fedorahosted.org/gcc-python-plugin.git
sudo yum install \ gcc-plugin-devel \ python-devel \ python-six \ python-pygments \ graphviz
git clone git://git.fedorahosted.org/gcc-python-plugin.git
make plugin
make demo
demo.c: In function ‘make_a_list_of_random_ints_badly’: demo.c:90:26: warning: Mismatching type in call to PyArg_ParseTuple with format code "i" [enabled by default] argument 3 ("&count") had type "long int *" (pointing to 64 bits) but was expecting "int *" (pointing to 32 bits) for format code "i" demo.c:102:1: warning: ob_refcnt of '*item' is 1 too high [enabled by default] demo.c:102:1: note: was expecting final ob_refcnt to be N + 1 (for some unknown N) demo.c:102:1: note: due to object being referenced by: PyListObject.ob_item[0] demo.c:102:1: note: but final ob_refcnt is N + 2 demo.c:97:14: note: PyLongObject allocated at: item = PyLong_FromLong(random()); demo.c:90:26: note: when PyArg_ParseTuple() succeeds at: if (!PyArg_ParseTuple(args, "i", &count)) { demo.c:90:8: note: taking False path at: if (!PyArg_ParseTuple(args, "i", &count)) { demo.c:94:10: note: reaching: list = PyList_New(0); demo.c:94:10: note: when PyList_New() succeeds at: list = PyList_New(0); demo.c:96:5: note: when considering range: 1 <= count.0 <= 0x7fffffff at: for (i = 0; i < count; i++) { demo.c:96:5: note: taking True path at: for (i = 0; i < count; i++) { demo.c:97:31: note: reaching: item = PyLong_FromLong(random()); demo.c:97:14: note: when PyLong_FromLong() succeeds at: item = PyLong_FromLong(random()); demo.c:97:14: note: ob_refcnt is now refs: 1 + N where N >= 0 demo.c:98:22: note: when PyList_Append() succeeds at: PyList_Append(list, item); demo.c:98:22: note: ob_refcnt is now refs: 2 + N where N >= 0 demo.c:98:22: note: '*item' is now referenced by 1 non-stack value(s): PyListObject.ob_item[0] demo.c:96:5: note: when considering count.0 == (int)1 from demo.c:90 at: for (i = 0; i < count; i++) { demo.c:96:5: note: taking False path at: for (i = 0; i < count; i++) { demo.c:101:5: note: reaching: return list; demo.c:102:1: note: returning demo.c:102:1: note: found 1 similar trace(s) to this demo.c:98:22: warning: calling PyList_Append with NULL as argument 1 (list) at demo.c:98 [enabled by default] demo.c:90:26: note: when PyArg_ParseTuple() succeeds at: if (!PyArg_ParseTuple(args, "i", &count)) { demo.c:90:8: note: taking False path at: if (!PyArg_ParseTuple(args, "i", &count)) { demo.c:94:10: note: reaching: list = PyList_New(0); demo.c:94:10: note: when PyList_New() fails at: list = PyList_New(0); demo.c:96:5: note: when considering range: 1 <= count.0 <= 0x7fffffff at: for (i = 0; i < count; i++) { demo.c:96:5: note: taking True path at: for (i = 0; i < count; i++) { demo.c:97:31: note: reaching: item = PyLong_FromLong(random()); demo.c:97:14: note: when PyLong_FromLong() succeeds at: item = PyLong_FromLong(random()); demo.c:98:22: note: PyList_Append() invokes Py_TYPE() on the pointer via the PyList_Check() macro, thus accessing (NULL)->ob_type demo.c:98:22: note: found 1 similar trace(s) to this demo.c:86:1: note: graphical error report for function 'make_a_list_of_random_ints_badly' written out to 'demo.c.make_a_list_of_random_ints_badly-refcount-errors.html' demo.c: In function ‘buggy_converter’: demo.c:76:26: warning: Mismatching type in call to PyArg_ParseTuple with format code "O&" [enabled by default] argument 4 ("&i") had type "int *" (pointing to 32 bits) but was expecting "Py_ssize_t *" (pointing to 64 bits) (from second argument of "int (*fn) (struct PyObject *, Py_ssize_t *)") for format code "O&" demo.c: In function ‘kwargs_example’: demo.c:62:37: warning: Mismatching type in call to PyArg_ParseTupleAndKeywords with format code "(ff):kwargs_example" [enabled by default] argument 5 ("&x") had type "double *" (pointing to 64 bits) but was expecting "float *" (pointing to 32 bits) for format code "f" demo.c:62:37: warning: Mismatching type in call to PyArg_ParseTupleAndKeywords with format code "(ff):kwargs_example" [enabled by default] argument 6 ("&y") had type "double *" (pointing to 64 bits) but was expecting "float *" (pointing to 32 bits) for format code "f" demo.c: In function ‘too_many_varargs’: demo.c:50:26: warning: Too many arguments in call to PyArg_ParseTuple with format string "i" expected 1 extra arguments: "int *" (pointing to 32 bits) but got 2: "int *" (pointing to 32 bits) "int *" (pointing to 32 bits) [enabled by default] demo.c: In function ‘not_enough_varargs’: demo.c:40:25: warning: Not enough arguments in call to PyArg_ParseTuple with format string "i" expected 1 extra arguments: "int *" (pointing to 32 bits) but got none [enabled by default] demo.c: In function ‘socket_htons’: demo.c:30:26: warning: Mismatching type in call to PyArg_ParseTuple with format code "i:htons" [enabled by default] argument 3 ("&x1") had type "long unsigned int *" (pointing to 64 bits) but was expecting "int *" (pointing to 32 bits) for format code "i"
CC=/path/to/built/plugin/gcc-with-cpychecker \ python setup.py build
to set the environment variable
make CC=/path/to/built/plugin/gcc-with-cpychecker
to override the Makefile variable CC.
Scaling up to hundreds of projects:
mock-with-analysis: https://github.com/fedora-static-analysis/mock-with-analysis
[(u'PyErr_SetString', 1030), (u'PyArg_ParseTuple', 763), (u'PyErr_Format', 353), (u'Py_BuildValue', 338), (u'_PyArg_ParseTuple_SizeT', 333), (u'PyInt_FromLong', 326), (u'PyErr_Occurred', 313), (u'PyEval_RestoreThread', 297), (u'PyEval_SaveThread', 297), (u'PyString_FromString', 246), (u'PyArg_ParseTupleAndKeywords', 213), (u'PyErr_NoMemory', 194), (u'PyObject_GetAttrString', 174), (u'PyTuple_New', 170), (u'PyType_IsSubtype', 170), (u'PyList_New', 169), (u'_Py_NegativeRefcount', 168), (u'_Py_Dealloc', 168), (u'Py_InitModule4_64', 132), (u'PyErr_Clear', 122)]
FIXME: this is just a snapshot of the 2013-02-25 data (113 packages)
[(u'_PyLong_FromByteArray', 1), (u'_PylibMC_Deflate', 1), (u'PyObject_DelItem', 1), (u'PyNumber_InPlacePower', 1), (u'PyShm_semaphore', 1), (u'PySys_GetObject', 1), (u'PyGreenlet_GetCurrent', 1), (u'_PylibMC_Inflate', 1), (u'Py_SetProgramName', 1), (u'PyArray_Free', 1), (u'Py_Initialize', 1), (u'PyFile_WriteObject', 1), (u'_PyLong_Frexp', 1), (u'PyGreenlet_SetParent', 1), (u'PyNumber_AsSsize_t', 1), (u'PyCapsule_New', 1), (u'PyInt_FromSize_t', 1), (u'PyNumber_Add', 1), (u'PyThreadState_GetDict', 1), (u'PyArray_CanCastSafely', 1), (u'PyNumber_Power', 1)]
FIXME: this is just a snapshot of the 2013-02-25 data (113 packages)
i.e. query by cyclomatic complexity
python-4Suite-XML-1.0.2:Ft/Xml/XPath/XPathParser.c:parser_parse: 1601 python-4Suite-XML-1.0.2:Ft/Xml/XPointer/XPointerParser.c:parser_parse: 455 python-zmq-2.1.9:zmq/core/constants.c:PyInit_constants: 428 python-cvxopt-1.1.4:C/dsdp.c:solvesdp: 256 python-cvxopt-1.1.4:C/glpk.c:integer: 215 python-ldap-2.4.6:Modules/message.c:LDAPmessage_to_python: 183 python-cvxopt-1.1.4:C/glpk.c:simplex: 180 python-sybase-0.39:ctx.c:CS_CONTEXT_cs_config: 177 python-sqlite2-2.3.5:src/cursor.c:_pysqlite_query_execute: 170 python-ldap-2.4.6:Modules/constants.c:LDAPinit_constants: 167
To be fair, many of the above are autogenerated
Based on a rebuild of all Fedora 17 source RPMs named "python-*" (113 packages, including python itself):
TODO: redo this for all of the Python packages (370)
[(('cpychecker', 'refcount-too-high'), 851), (('cpychecker', 'mismatching-type-in-format-string'), 783), (('cpychecker', 'returns-NULL-without-setting-exception'), 610), (('clang-analyzer', None), 540), (('cppcheck', 'nullPointer'), 427), (('cppcheck', 'syntaxError'), 382), (('cpychecker', 'null-ptr-dereference'), 295), (('gcc', 'strict-aliasing'), 221), (('cpychecker', 'null-ptr-argument'), 194), (('cpychecker', 'refcount-too-low'), 193), (('cpychecker', 'flags-within-PyMethodDef'), 160), (('gcc', 'unused-but-set-variable'), 153), (('gcc', None), 85), (('cpychecker', 'usage-of-uninitialized-data'), 62), (('gcc', 'unused-variable'), 60), (('gcc', 'parentheses'), 55), (('gcc', 'deprecated-declarations'), 42), (('gcc', 'pointer-sign'), 40), (('gcc', 'maybe-uninitialized'), 37), (('cppcheck', 'uninitvar'), 26), (('gcc', 'format'), 26), (('cppcheck', 'memleakOnRealloc'), 23), (('cppcheck', 'memleak'), 19), (('gcc', 'unused-result'), 16), (('gcc', 'implicit-function-declaration'), 15), (('gcc', 'unused-label'), 12), (('gcc', 'int-to-pointer-cast'), 11), (('cppcheck', 'resourceLeak'), 8), (('gcc', 'switch'), 6), (('cpychecker', 'missing-keyword-argument'), 6), (('gcc', 'return-type'), 5), (('gcc', 'address'), 4), (('cpychecker', 'arg-was-not-PyObject-ptr'), 4), (('cppcheck', 'uninitstring'), 4), (('gcc', 'uninitialized'), 4), (('cpychecker', 'not-enough-vars-in-format-string'), 4), (('cppcheck', 'useClosedFile'), 4), (('cpychecker', 'read-from-deallocated-memory'), 3), (('gcc', 'pointer-to-int-cast'), 3), (('cppcheck', 'invalidScanfFormatWidth'), 2), (('cpychecker', 'too-many-vars-in-format-string'), 2), (('gcc', 'sequence-point'), 2), (('gcc', 'nonnull'), 2), (('cpychecker', 'returns-pointer-to-deallocated-memory'), 1), (('cppcheck', 'deallocDealloc'), 1), (('cppcheck', 'IOWithoutPositioning'), 1), (('cpychecker', 'call-of-null-function-ptr'), 1), (('gcc', 'enum-compare'), 1), (('gcc', 'unused-value'), 1), (('cppcheck', 'doubleFree'), 1)]
[('refcount-too-high', 851), ('mismatching-type-in-format-string', 783), ('returns-NULL-without-setting-exception', 610), ('null-ptr-dereference', 295), ('null-ptr-argument', 194), ('refcount-too-low', 193), ('flags-within-PyMethodDef', 160), ('usage-of-uninitialized-data', 62), ('missing-keyword-argument', 6), ('arg-was-not-PyObject-ptr', 4), ('not-enough-vars-in-format-string', 4), ('read-from-deallocated-memory', 3), ('too-many-vars-in-format-string', 2), ('returns-pointer-to-deallocated-memory', 1), ('call-of-null-function-ptr', 1)]
[('refcount-too-high', 851), ('refcount-too-low', 193)]
Leaving an ob_refcnt too high is thus the most common true bug the tool is reporting
TODO: where are the leaks happening?
(Saw this specific instance of refcount-too-low 14 times on the 113 package run)
PyObject* some_method(PyObject *self, PyObject *args) { [...snip...] /* BUG: loses a reference to Py_None */ return Py_None; }
$ python script.py Fatal error: deallocating None
PyObject* some_method(PyObject *self, PyObject *args) { [...snip...] /* Fixed version of the above: */ Py_RETURN_NONE; }
/* BUG: reference leak: */ return Py_BuildValue("Oi", some_object_we_own_a_ref_on, 42);
/* Fixed version of the above: */ return Py_BuildValue("Ni", some_object_we_own_a_ref_on, 42);
/* If it's just one object, why use Py_BuildValue? */ return some_object_we_own_a_ref_on;
a.k.a. the 783 mismatching-type-in-format-string errors
By function:
[(u'PyArg_ParseTuple', 487), (u'PyArg_ParseTupleAndKeywords', 170), (u'Py_BuildValue', 135), (u'PyArg_Parse', 2)]
(in a way this is merely expressing what I’m checking for)
So most of these are false alarms
[('null-ptr-dereference', 295), ('null-ptr-argument', 194)]
Do more error-checking, please!
{ PyObject *local0 = NULL; PyObject *local1 = NULL; PyObject *local2 = NULL; /* etc */ local0 = PyFoo_DoBar(); if (!local0) goto error; /* etc */ return result; error: Py_XDECREF(local2); Py_XDECREF(local1); Py_XDECREF(local0); return NULL; }
/* This is bogus code: Py_XDECREF expands its argument multiple times, so the function is actually called up to 4 times (assuming a non pydebug build of CPython). */ Py_XDECREF(PyObject_CallObject(callable, args)); /* Seen in the wild in: python-alsa-1.0.25-1.fc17: pyalsa/alsamixer.c:179 pyalsa/alsahcontrol.c:190 pyalsa/alsaseq.c:3277 python-4Suite-XML-1.0.2-14.fc17: Ft/Xml/src/domlette/refcounts.c:80 */
This expands to:
do { if ((PyObject_CallObject(callable, args)) == ((void *)0)) ; else do { if (--(PyObject_CallObject(callable, args)->ob_refcnt) != 0) ; else (*(PyObject_CallObject(callable, args)->ob_type)->tp_dealloc) PyObject_CallObject(callable, args); } while (0); } while (0);
Filed as http://bugs.python.org/issue17206
…which is effectively:
/* Call it once */ if ((PyObject_CallObject(callable, args)) != NULL) { /* If it doesn't raise an exception, leak the reference (BUG 1), and call it again (BUG 2). Assume that the second call doesn't raise an exception, otherwise segfault the interpreter (BUG 3), and DECREF the result, but don't deallocate if the refcount is zero (BUGS 4 and 5) */ if (--(PyObject_CallObject(callable, args)->ob_refcnt) == 0) { /* If the refcount is zero, call it again! (BUG 6) Assume the result is non-NULL (otherwise segfaulting, BUG 7) and deallocate whatever you got back (even if the refcount is non-zero, BUG 8) */ (*(PyObject_CallObject(callable, args)->ob_type)->tp_dealloc) /* and for good measure, call it agains (BUG 9) and leak a reference to the result (BUG 10) */ PyObject_CallObject(callable, args); } }
Filed as http://bugs.python.org/issue17206
PyObject *result; result = PyObject_CallObject(callable, args); Py_XDECREF(result); /* Presumably the caller will do something about any exception: */ return (result != NULL) ? 0 : -1;
Thanks for listening!
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License
Source code for talk: https://github.com/davidmalcolm/PyCon-US-2013-Talk