Tutorial part 2: physical locations

libgdiagnostics has two kinds of location:

  • physical locations expressed in terms of a specific file, and line(s) and perhaps column(s), such as some-file.c:3:1, or a range of columns, such as in:

    test-typo.c:19:13: error: unknown field 'colour'
       19 |   return p->colour;
          |             ^~~~~~
    

    or even a range spanning multiple lines of a file.

    All of these are instances of diagnostic_physical_location.

  • logical locations, which refer to semantic constructs in the input, such as within function 'foo', or within namespace foo’s class bar’s member function get_color.

    These are instances of diagnostic_logical_location.

A diagnostic can have zero or more physical locations, and optionally have a logical location.

Let’s extend the previous example to add a physical location to the diagnostic; we’ll cover logical locations in the next section.

Source files

Given these declarations:

static diagnostic_manager *diag_mgr;
static const diagnostic_file *main_file;

we can create a diagnostic_file describing an input file foo.c via diagnostic_manager_new_file():

main_file = diagnostic_manager_new_file (diag_mgr,
                                         "foo.c",
                                         "c" /* source_language */);

You can use diagnostic_manager_debug_dump_file() to print a representation of a diagnostic_file for debugging. For example:

diagnostic_manager_debug_dump_file (diag_mgr, main_file, stderr);

might lead to this output on stderr:

file(name="foo.c", sarif_source_language="c")

Once we have a diagnostic_file we can use it to create instances of diagnostic_physical_location within the diagnostic_manager. These are owned by the diagnostic_manager and cleaned up automatically when diagnostic_manager_release() is called.

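Instances of diagnostic_physical_location can refer to a source line as a whole, to a specific line and column, or to a range of characters, as shown in the sections below.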

Diagnostics affecting a whole source line

If we want a diagnostic to refer to an entire source line, we can use diagnostic_manager_new_location_from_file_and_line().

For example, given this example input where the tool can’t find the header:

#include <foo.h>

we could complain about it using libgdiagnostics via:

  const diagnostic_physical_location *loc
    = diagnostic_manager_new_location_from_file_and_line (diag_mgr,
							  main_file,
							  line_num);

  diagnostic *d = diagnostic_begin (diag_mgr,
				    DIAGNOSTIC_LEVEL_ERROR);
  diagnostic_set_location (d, loc);

  diagnostic_finish (d, "can't find %qs", "foo.h");

leading to output like this:

foo.c:17: error: can't find 'foo.h'
   17 | #include <foo.h>

where libgdiagnostics will attempt to load the source file and quote the pertinent line.

If libgdiagnostics cannot open the file, it will merely print:

foo.c:17: error: can't find 'foo.h'

You can use diagnostic_manager_debug_dump_location() to dump a diagnostic_physical_location. For the above example:

diagnostic_manager_debug_dump_location (diag_mgr, loc, stderr);

might print:

foo.c:17

to stderr.

Columns and ranges

If we want to generate output like this:

foo.c:17:11: error: can't find 'foo.h'
   17 | #include <foo.h>
      |           ^~~~~

where the diagnostic is marked as relating to the above range of characters in line 17, we need to express the range of characters within the line of interest.

We can do this by creating a diagnostic_physical_location for the start of the range, another one for the end of the range, and then using these two to create a diagnostic_physical_location for the range as a whole:

  const diagnostic_physical_location *loc_start
    = diagnostic_manager_new_location_from_file_line_column (diag_mgr, main_file, line_num, 11);
  const diagnostic_physical_location *loc_end
    = diagnostic_manager_new_location_from_file_line_column (diag_mgr, main_file, line_num, 15);
  const diagnostic_physical_location *loc_range
    = diagnostic_manager_new_location_from_range (diag_mgr,
						  loc_start,
						  loc_start,
						  loc_end);

  diagnostic *d = diagnostic_begin (diag_mgr,
				    DIAGNOSTIC_LEVEL_ERROR);
  diagnostic_set_location (d, loc_range);
  
  diagnostic_finish (d, "can't find %qs", "foo.h");

On compiling and running the program, we should get this output:

foo.c:17:11: error: can't find 'foo.h'
   17 | #include <foo.h>
      |           ^~~~~

where libgdiagnostics will attempt to load the source file and underline the pertinent part of the given line.

If libgdiagnostics cannot open the file, it will merely print:

foo.c:17:11: error: can't find 'foo.h'

A range can span multiple lines within the same file.

As before, you can use diagnostic_manager_debug_dump_location() to dump the locations. For the above example:

diagnostic_manager_debug_dump_location (diag_mgr, loc_start, stderr);

and:

diagnostic_manager_debug_dump_location (diag_mgr, loc_range, stderr);

might each print:

foo.c:17:11

to stderr, whereas:

diagnostic_manager_debug_dump_location (diag_mgr, loc_end, stderr);

might print:

foo.c:17:15
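
The column numbers 11 and 15 above were written by hand. As a minimal sketch of how a tool might derive them, the following helper finds a token within a line of text and reports its start and end columns. The function token_columns is hypothetical and not part of libgdiagnostics; it assumes that columns are 1-based and inclusive, with one column per byte, as the example above suggests (the 'f' of foo.h is at column 11). It makes no attempt to handle tabs or multibyte characters.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of libgdiagnostics: find the first
   occurrence of TOKEN in LINE and report its 1-based start and end
   columns (both inclusive), counting one column per byte.
   Returns 1 on success, 0 if TOKEN is empty or not found.  */
int
token_columns (const char *line, const char *token,
               int *start_col, int *end_col)
{
  assert (line && token && start_col && end_col);

  size_t len = strlen (token);
  if (len == 0)
    return 0;

  const char *p = strstr (line, token);
  if (!p)
    return 0;

  /* Convert the 0-based byte offset into a 1-based column.  */
  *start_col = (int) (p - line) + 1;
  /* The end column is inclusive, so it names the token's last byte.  */
  *end_col = *start_col + (int) len - 1;
  return 1;
}
```

For the line #include <foo.h> and the token foo.h, this yields a start column of 11 and an end column of 15, which could then be passed to diagnostic_manager_new_location_from_file_line_column() in place of the literal values.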

Multiple locations

As well as the primary physical location seen above, a diagnostic can have additional physical locations. You can add these secondary locations via diagnostic_add_location().

For example, for this valid but suspicious-looking C code:

const char *strs[3] = {"foo",
                       "bar"
                       "baz"};

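the following diagnostic has its primary location where the missing comma should be, and secondary locations for each of the string literals "foo", "bar", and "baz", added via diagnostic_add_location(). Here make_range is a helper that is not shown; it is assumed to build a single-line range from a start column and an end column, in the same way as loc_range in the previous section: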

  const diagnostic_physical_location *loc_comma
    = diagnostic_manager_new_location_from_file_line_column (diag_mgr,
							     main_file,
							     foo_line_num + 1,
							     29);
  const diagnostic_physical_location *loc_foo
    = make_range (diag_mgr, main_file, foo_line_num, 24, 28);
  const diagnostic_physical_location *loc_bar
    = make_range (diag_mgr, main_file, foo_line_num + 1, 24, 28);
  const diagnostic_physical_location *loc_baz
    = make_range (diag_mgr, main_file, foo_line_num + 2, 24, 28);

  diagnostic *d = diagnostic_begin (diag_mgr,
				    DIAGNOSTIC_LEVEL_WARNING);
  diagnostic_set_location (d, loc_comma);
  diagnostic_add_location (d, loc_foo);
  diagnostic_add_location (d, loc_bar);
  diagnostic_add_location (d, loc_baz);

  diagnostic_add_fix_it_hint_insert_after (d, loc_bar, ",");

  diagnostic_finish (d, "missing comma");

where the text output might be:

test-multiple-lines.c:23:29: warning: missing comma
   22 | const char *strs[3] = {"foo",
      |                        ~~~~~
   23 |                        "bar"
      |                        ~~~~~^
   24 |                        "baz"};
      |                        ~~~~~
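
When working out column numbers for a diagnostic like this one, it can help to preview where the caret and underline would fall before calling into the library. The following is a rough sketch of such a check, not a description of how libgdiagnostics renders its output. The function render_annotation is hypothetical; it assumes 1-based, inclusive columns with one column per byte.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of libgdiagnostics: fill BUF with an
   annotation line that has '~' under columns RANGE_START..RANGE_END
   (1-based, inclusive) and '^' at column CARET.  The caret may lie
   inside or outside the range.
   Returns 1 on success, 0 if BUF is too small.  */
int
render_annotation (char *buf, size_t buf_size,
                   int range_start, int range_end, int caret)
{
  assert (buf);
  assert (range_start >= 1 && range_end >= range_start && caret >= 1);

  /* The rightmost column that needs a character.  */
  int last = caret > range_end ? caret : range_end;
  if ((size_t) last + 1 > buf_size)
    return 0;

  /* Start with blanks, then draw the range, then the caret on top.  */
  memset (buf, ' ', (size_t) last);
  memset (buf + range_start - 1, '~',
          (size_t) (range_end - range_start + 1));
  buf[caret - 1] = '^';
  buf[last] = '\0';
  return 1;
}
```

Calling render_annotation with a range of 24 to 28 and a caret at column 29 gives 23 spaces followed by ~~~~~^, which matches the annotation under line 23 in the output above once the line-number gutter is added.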

Labelling locations

You can give the locations labels using diagnostic_set_location_with_label() and diagnostic_add_location_with_label().

Consider emitting a “type mismatch” diagnostic for:

42 + "foo"

where the primary location is on the +, with secondary locations on the 42 and the "foo":

  diagnostic *d = diagnostic_begin (diag_mgr,
				    DIAGNOSTIC_LEVEL_ERROR);
  diagnostic_set_location (d, loc_operator);
  diagnostic_add_location_with_label (d,
				      make_range (diag_mgr,
						  main_file,
						  line_num, 3, 4),
				      "int");
  diagnostic_add_location_with_label (d,
				      make_range (diag_mgr,
						  main_file,
						  line_num, 8, 12),
				      "const char *");
  
  diagnostic_finish (d, "mismatching types: %qs and %qs", "int", "const char *");

giving this text output:

test-labelled-ranges.c:19:6: error: mismatching types: 'int' and 'const char *'
   19 |   42 + "foo"
      |   ~~ ^ ~~~~~
      |   |    |
      |   int  const char *

More on locations

For more details on the above, see Physical locations. Otherwise the next part of the tutorial covers logical locations.