Tutorial part 7: execution paths

A diagnostic can optionally have a diagnostic_execution_path describing a path of execution through code.

For example, let’s pretend we’re writing a static analysis tool for finding bugs in CPython extension code.

Let’s say we’re analyzing this code:

PyObject *
make_a_list_of_random_ints_badly(PyObject *self,
				 PyObject *args)
{
  PyObject *list, *item;
  long count, i;

  if (!PyArg_ParseTuple(args, "i", &count)) {
    return NULL;
  }

  list = PyList_New(0);
	
  for (i = 0; i < count; i++) {
    item = PyLong_FromLong(random());
    PyList_Append(list, item);
  }
  
  return list;
}

This code attempts to take a Python integer parameter and then build a list of that length, containing random integers. However, there are numerous bugs in this code: a type mismatch (the "i" format code tells PyArg_ParseTuple to store a C int, but count is a long), mistakes in reference-counting, and an almost total lack of error-handling.

For example, PyList_Append requires a non-NULL first parameter (list), but PyList_New can fail, returning NULL. The code doesn’t check for this, so a failed PyList_New would lead to a segfault inside PyList_Append.

We can add a diagnostic_execution_path to the diagnostic via diagnostic_add_execution_path(), and then add events to it using diagnostic_execution_path_add_event().

For example, with:

diagnostic_event_id alloc_event_id
  = diagnostic_execution_path_add_event (path,
                                         loc_call_to_PyList_New,
                                         logical_loc, 0, /* nesting level */
                                         "when %qs fails, returning NULL",
                                         "PyList_New");

we create an event that will be worded as:

(1) when 'PyList_New' fails, returning NULL

Note that diagnostic_execution_path_add_event() returns a diagnostic_event_id. We can use this to refer to this event in another event using the %@ format code in its message, which takes the address of a diagnostic_event_id:

diagnostic_execution_path_add_event (path,
                                     loc_call_to_PyList_Append,
                                     logical_loc, 0,
                                     "when calling %qs, passing NULL from %@ as argument %i",
                                     "PyList_Append", &alloc_event_id, 1);

where the latter event will be worded as:

(2) when calling 'PyList_Append', passing NULL from (1) as argument 1

Here the %@ reference to the other event has been printed as (1). In SARIF output the text “(1)” will have an embedded link referring within the SARIF log to the threadFlowLocation object for the other event, via a JSON pointer (see SARIF 2.1.0 §3.10.3 “URIs that use the sarif scheme”).
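As a rough illustration of what such an embedded link could look like, the sketch below formats one by hand. The property names (runs, results, codeFlows, threadFlows, locations) and the [text](destination) link syntax come from the SARIF 2.1.0 specification; the function name format_event_link, the choice of indices, and the assumption that the path is the first codeFlow and threadFlow of its result are hypothetical, so the exact text that libgdiagnostics emits may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write an embedded link for the event at
   0-based index EVENT_IDX into BUF.  The link text is the 1-based
   event number, as in "(1)".  Returns the length written, or -1 if
   BUF was too small.  */
static int
format_event_link (char *buf, size_t buf_size,
                   unsigned run_idx, unsigned result_idx,
                   unsigned event_idx)
{
  int n;

  assert (buf != NULL && buf_size > 0);
  n = snprintf (buf, buf_size,
                "[(%u)](sarif:/runs/%u/results/%u"
                "/codeFlows/0/threadFlows/0/locations/%u)",
                event_idx + 1, run_idx, result_idx, event_idx);
  if (n < 0 || (size_t) n >= buf_size)
    return -1;
  return n;
}
```

Under these assumptions, the first event of the first result in the first run would be rendered as [(1)](sarif:/runs/0/results/0/codeFlows/0/threadFlows/0/locations/0).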

Let’s add an event between these describing control flow, creating three events in all:

  diagnostic_execution_path *path = diagnostic_add_execution_path (d);
  
  diagnostic_event_id alloc_event_id
    = diagnostic_execution_path_add_event (path,
					   loc_call_to_PyList_New,
					   logical_loc, 0,
					   "when %qs fails, returning NULL",
					   "PyList_New");
  diagnostic_execution_path_add_event (path,
				       loc_for_cond,
				       logical_loc, 0,
				       "when %qs", "i < count");
  diagnostic_execution_path_add_event (path,
				       loc_call_to_PyList_Append,
				       logical_loc, 0,
				       "when calling %qs, passing NULL from %@ as argument %i",
				       "PyList_Append", &alloc_event_id, 1);

Assuming we also gave it a diagnostic_logical_location with:

  const char *funcname = "make_a_list_of_random_ints_badly";
  const diagnostic_logical_location *logical_loc
    = diagnostic_manager_new_logical_location (diag_mgr,
					       DIAGNOSTIC_LOGICAL_LOCATION_KIND_FUNCTION,
					       NULL, /* parent */
					       funcname, /* short name */
					       funcname, /* fully qualified name */
					       funcname); /* decorated name */

and finished the diagnostic with diagnostic_finish() like this:

diagnostic_finish (d,
                   "passing NULL as argument %i to %qs"
                   " which requires a non-NULL parameter",
                   1, "PyList_Append");

then we should get output to text sinks similar to the following:

In function 'make_a_list_of_random_ints_badly':
test-warning-with-path.c:30:5: warning: passing NULL as argument 1 to 'PyList_Append' which requires a non-NULL parameter
   30 |     PyList_Append(list, item);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~
'make_a_list_of_random_ints_badly': events 1-3
   26 |   list = PyList_New(0);
      |          ^~~~~~~~~~~~~
      |          |
      |          (1) when 'PyList_New' fails, returning NULL
   27 |
   28 |   for (i = 0; i < count; i++) {
      |               ~~~~~~~~~
      |               |
      |               (2) when 'i < count'
   29 |     item = PyLong_FromLong(random());
   30 |     PyList_Append(list, item);
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~
      |     |
      |     (3) when calling 'PyList_Append', passing NULL from (1) as argument 1

and for SARIF sinks the path will be added as a codeFlow object (see SARIF 2.1.0 §3.36 “codeFlow object”).
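The underlined spans in the text output come from the column ranges given when the physical locations are created; the full listing below passes (10, 22), (15, 23) and (5, 29) to make_range. These appear to be 1-based, inclusive column numbers. As a cross-check, the hypothetical helper below (not part of libgdiagnostics) computes such a range for a substring of a source line, assuming one column per byte, i.e. no tabs or multibyte characters:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first occurrence of NEEDLE in LINE and
   store its 1-based, inclusive column range.  Returns 0 on success,
   or -1 if NEEDLE is empty or does not occur in LINE.  */
static int
find_column_range (const char *line, const char *needle,
                   int *start_column, int *end_column)
{
  const char *match;

  assert (line != NULL && needle != NULL);
  assert (start_column != NULL && end_column != NULL);

  if (needle[0] == '\0')
    return -1;
  match = strstr (line, needle);
  if (!match)
    return -1;

  *start_column = (int) (match - line) + 1;
  *end_column = *start_column + (int) strlen (needle) - 1;
  return 0;
}
```

For the line "  list = PyList_New(0);" and the substring "PyList_New(0)" this gives columns 10 to 22, matching the first range used in the full listing.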

Here’s the above example in full:

  /* begin create phys locs */
  const diagnostic_physical_location *loc_call_to_PyList_New
    = make_range (diag_mgr, main_file, line_num_call_to_PyList_New, 10, 22);
  const diagnostic_physical_location *loc_for_cond
    = make_range (diag_mgr, main_file, line_num_for_loop, 15, 23);
  const diagnostic_physical_location *loc_call_to_PyList_Append
    = make_range (diag_mgr, main_file, line_num_call_to_PyList_Append, 5, 29);
  /* end create phys locs */

  /* begin create logical locs */
  const char *funcname = "make_a_list_of_random_ints_badly";
  const diagnostic_logical_location *logical_loc
    = diagnostic_manager_new_logical_location (diag_mgr,
					       DIAGNOSTIC_LOGICAL_LOCATION_KIND_FUNCTION,
					       NULL, /* parent */
					       funcname,
					       funcname,
					       funcname);
  /* end create logical locs */

  diagnostic *d = diagnostic_begin (diag_mgr,
				    DIAGNOSTIC_LEVEL_WARNING);
  diagnostic_set_location (d, loc_call_to_PyList_Append);
  diagnostic_set_logical_location (d, logical_loc);

  /* begin path creation */
  diagnostic_execution_path *path = diagnostic_add_execution_path (d);
  
  diagnostic_event_id alloc_event_id
    = diagnostic_execution_path_add_event (path,
					   loc_call_to_PyList_New,
					   logical_loc, 0,
					   "when %qs fails, returning NULL",
					   "PyList_New");
  diagnostic_execution_path_add_event (path,
				       loc_for_cond,
				       logical_loc, 0,
				       "when %qs", "i < count");
  diagnostic_execution_path_add_event (path,
				       loc_call_to_PyList_Append,
				       logical_loc, 0,
				       "when calling %qs, passing NULL from %@ as argument %i",
				       "PyList_Append", &alloc_event_id, 1);
  /* end path creation */

  diagnostic_finish (d,
		     "passing NULL as argument %i to %qs"
		     " which requires a non-NULL parameter",
		     1, "PyList_Append");

Moving on

That’s the end of the tutorial. For more information on libgdiagnostics, see the topic guide.