Just-In-Time compilation using GCC (libgccjit.so)

GNU Tools Cauldron 2014

David Malcolm <dmalcolm@redhat.com>

What is libgccjit.so?

See http://gcc.gnu.org/wiki/JIT

Prebuilt RPM packages available for Fedora and RHEL:

https://gcc.gnu.org/ml/jit/2014-q2/msg00009.html

Why?

JIT compilation use cases:

Why a dedicated API for JIT?

Design decisions

What the API looks like

A C header file; currently with:

  • 72 function prototypes
  • 12 opaque types
  • 8 enums

State and lifetime management

All state is hung off of context objects:

typedef struct gcc_jit_context gcc_jit_context;

extern gcc_jit_context *
gcc_jit_context_acquire (void);

extern void
gcc_jit_context_release (gcc_jit_context *ctxt);

Simple memory management for client code

Everything that's created from a context is cleaned up when the context is released.
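
A minimal sketch of the lifecycle (assuming the usual #include <libgccjit.h>; error checking omitted):

gcc_jit_context *ctxt = gcc_jit_context_acquire ();

/* ... populate the context with types, functions and blocks ... */

gcc_jit_result *result = gcc_jit_context_compile (ctxt);

/* Everything hung off the context can now go away: */
gcc_jit_context_release (ctxt);

/* ... look up and call code via gcc_jit_result_get_code () ... */

gcc_jit_result_release (result);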

Entities within a context

extern const char *
gcc_jit_object_get_debug_string (gcc_jit_object *obj);
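
Every entity can be upcast to a gcc_jit_object; a small sketch of printing one (assuming the int_type from the "Types" slide below):

gcc_jit_object *obj = gcc_jit_type_as_object (int_type);
printf ("obj: %s\n", gcc_jit_object_get_debug_string (obj));
/* The string is owned by the context; no need to free it.  */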

Entities within a context (2)

[diagram: entities within a context]

Source-Code Locations

Optional, but useful to end-users

/* Use this to create locations: */
extern gcc_jit_location *
gcc_jit_context_new_location (gcc_jit_context *ctxt,
                              const char *filename,
                              int line,
                              int column);

/* Need to turn on generation of debuginfo: */
gcc_jit_context_set_bool_option (
  ctxt, GCC_JIT_BOOL_OPTION_DEBUGINFO, 1);
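
The location can then be passed wherever the API takes a gcc_jit_location * (the NULL arguments in the later snippets); a sketch, where "block" and "rvalue" stand for entities built as in the later examples:

gcc_jit_location *loc =
  gcc_jit_context_new_location (ctxt, "main.cc", 43, 0);

/* Pass it where the later snippets pass NULL, e.g.: */
gcc_jit_block_end_with_return (block, loc, rvalue);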

Source-Code Locations (2)

We can use this to single-step through the generated machine code, e.g. machine code generated from bytecode:

(gdb) break fibonacci
(gdb) run
Breakpoint 1, fibonacci (input=8) at main.cc:43
43      DUP,
(gdb) next
47      PUSH_INT_CONST, 2,
(gdb) next
51      BINARY_INT_COMPARE_LT,
(gdb) next
55      JUMP_ABS_IF_TRUE, 17,
(gdb) next
59      DUP,
(gdb) next
63      PUSH_INT_CONST,  1,
(gdb) next
67      BINARY_INT_SUBTRACT,

Types

Access to simple C types:

gcc_jit_type *int_type =
   gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_INT);

gcc_jit_type *double_type =
   gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_DOUBLE);

/* etc */
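
From these, derived types can be built up; a sketch (the "point" struct here is purely illustrative):

gcc_jit_type *int_ptr_type =
  gcc_jit_type_get_pointer (int_type);

/* An illustrative struct with two int fields: */
gcc_jit_field *field_x =
  gcc_jit_context_new_field (ctxt, NULL, int_type, "x");
gcc_jit_field *field_y =
  gcc_jit_context_new_field (ctxt, NULL, int_type, "y");
gcc_jit_field *fields[2] = {field_x, field_y};
gcc_jit_struct *point_type =
  gcc_jit_context_new_struct_type (ctxt, NULL, "point", 2, fields);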

Types (2)

One-time setup vs per-compile state

A common pattern:

  1. one-time setup:

    The client code maps its own API into the JIT world:

    • create gcc_jit_type instances representing the structs and other types of interest
    • similar for globals, functions, etc
  2. repeatedly, as each method becomes "hot", reuse the state from (1) to compile that method to machine code

Seen e.g. in GNU Octave's JIT compiler.

One-time setup vs per-compile state (2)

How to handle this?

If we do it all in one context, we'll have a slow leak due to all of the per-method state never going away.

One-time setup vs per-compile state (3)

Solution: nested contexts:

extern gcc_jit_context *
gcc_jit_context_new_child_context (gcc_jit_context *parent_ctxt);
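
A sketch of the intended pattern (the names here are illustrative, not from the slides):

/* One-time setup, in a long-lived parent context: */
gcc_jit_context *parent = gcc_jit_context_acquire ();
/* ... create the shared types, declarations of runtime helpers, etc ... */

/* Each time a method becomes hot: */
gcc_jit_context *child = gcc_jit_context_new_child_context (parent);
/* ... build the method's code in "child", reusing entities from "parent" ... */
gcc_jit_result *result = gcc_jit_context_compile (child);
gcc_jit_context_release (child);  /* the per-method state goes away */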

One-time setup vs per-compile state (4)

Functions

How to generate the equivalent of:

const char *
test_string_literal (void)
{
   return "hello world";
}

Functions (2)

gcc_jit_type *const_char_ptr_type =
  gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_CONST_CHAR_PTR);

/* Build the test_fn.  */
gcc_jit_function *test_fn =
  gcc_jit_context_new_function (ctxt, NULL,
                                GCC_JIT_FUNCTION_EXPORTED,
                                const_char_ptr_type,
                                "test_string_literal",
                                0, NULL,
                                0);
gcc_jit_block *block = gcc_jit_function_new_block (test_fn, NULL);

gcc_jit_block_end_with_return (
  block, NULL,
  gcc_jit_context_new_string_literal (ctxt, "hello world"));
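
A sketch of compiling the context and calling the generated function (error checking omitted):

gcc_jit_result *result = gcc_jit_context_compile (ctxt);

typedef const char *(*fn_type) (void);
fn_type fn =
  (fn_type) gcc_jit_result_get_code (result, "test_string_literal");
printf ("%s\n", fn ());  /* prints "hello world" */

gcc_jit_result_release (result);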

Functions (3)

Example of a conditional:

/* if (i >= n) */
gcc_jit_block_end_with_conditional (
  loop_cond, NULL,
  gcc_jit_context_new_comparison (
     ctxt, NULL,
     GCC_JIT_COMPARISON_GE,
     gcc_jit_lvalue_as_rvalue (i),
     gcc_jit_param_as_rvalue (n)),
  after_loop,
  loop_body);
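
The snippet assumes the usual loop skeleton of three blocks; a sketch of creating them, where "func" stands for the function being built:

gcc_jit_block *loop_cond = gcc_jit_function_new_block (func, "loop_cond");
gcc_jit_block *loop_body = gcc_jit_function_new_block (func, "loop_body");
gcc_jit_block *after_loop = gcc_jit_function_new_block (func, "after_loop");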

Functions (4)

/* sum += i * i */
gcc_jit_block_add_assignment_op (
  loop_body, NULL,
  sum, /* lvalue */
  GCC_JIT_BINARY_OP_PLUS,
  gcc_jit_context_new_binary_op ( /* rvalue */
     ctxt, NULL,
     GCC_JIT_BINARY_OP_MULT, the_type,
     gcc_jit_lvalue_as_rvalue (i),
     gcc_jit_lvalue_as_rvalue (i)));
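
A sketch of finishing the loop body: increment i and jump back to the condition:

/* i++ */
gcc_jit_block_add_assignment_op (
  loop_body, NULL,
  i,
  GCC_JIT_BINARY_OP_PLUS,
  gcc_jit_context_one (ctxt, the_type));

/* goto loop_cond */
gcc_jit_block_end_with_jump (loop_body, NULL, loop_cond);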

Comments as a first-class entity

extern void
gcc_jit_block_add_comment (gcc_jit_block *block,
                           gcc_jit_location *loc,
                           const char *text);

Very useful for debugging

e.g.

gcc_jit_block_add_comment (b_entry, NULL,
                           "for i in 0 to (ARRAY_SIZE - 1):");

Internally they are implemented as dummy labels.

Shouldn't affect optimization.

Visible in dumps of initial tree and of gimple.
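
A sketch of enabling those dumps via the API's boolean options:

gcc_jit_context_set_bool_option (
  ctxt, GCC_JIT_BOOL_OPTION_DUMP_INITIAL_TREE, 1);
gcc_jit_context_set_bool_option (
  ctxt, GCC_JIT_BOOL_OPTION_DUMP_INITIAL_GIMPLE, 1);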

Error-handling

Inspired by OpenGL:

  • record errors
  • fail if an error has occurred
  • fail gracefully when called after an error

Client code only has to check for errors once.

extern const char *
gcc_jit_context_get_first_error (gcc_jit_context *ctxt);
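
A sketch of the resulting usage pattern (check once, after compiling):

gcc_jit_result *result = gcc_jit_context_compile (ctxt);

const char *first_error = gcc_jit_context_get_first_error (ctxt);
if (first_error)
  {
    fprintf (stderr, "libgccjit error: %s\n", first_error);
    /* API calls made after an error become no-ops, so there is no
       need to check every call.  */
  }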

What the API doesn't do

etc

The C++ API

Methods, and (optionally) operator overloading:

struct quadratic
{
  double a;
  double b;
  double c;
  double discriminant;
};

gccjit::rvalue q_a = param_q.dereference_field (field_a);
gccjit::rvalue q_b = param_q.dereference_field (field_b);
gccjit::rvalue q_c = param_q.dereference_field (field_c);

gccjit::rvalue four =
  ctxt.new_rvalue (double_type, 4);

The C++ API (2)

gccjit::block block = calc_discriminant.new_block ();
block.add_comment ("(b^2 - 4ac)");

block.add_assignment (
  /* q->discriminant =...  */
  param_q.dereference_field (field_discriminant),
  /* (q->b * q->b) - (4 * q->a * q->c) */
  (q_b * q_b) - (four * q_a * q_c));
block.end_with_return ();

Python bindings

See https://github.com/davidmalcolm/pygccjit:

# Create parameter "i":
param_i = ctxt.new_param(int_type, b'i')
# Create the function:
fn = ctxt.new_function(gccjit.FunctionKind.EXPORTED,
                       int_type,
                       b"square",
                       [param_i])

Python bindings (2)

# Create a basic block within the function:
block = fn.new_block(b'entry')

# This basic block is relatively simple:
block.end_with_return(
    ctxt.new_binary_op(gccjit.BinaryOp.MULT,
                       int_type,
                       param_i, param_i))

# Having populated the context, compile it.
jit_result = ctxt.compile()

# This is what you get back from ctxt.compile():
assert isinstance(jit_result, gccjit.Result)

"Coconut": a JIT compiler for Python

https://github.com/davidmalcolm/coconut

(not to be confused with "Unladen Swallow")

Compiles CPython bytecode to machine code

Uses the Python bindings to libgccjit

"Coconut": a JIT compiler for Python (2)

def f(a, b):
  return a * b

"Coconut": a JIT compiler for Python (3)

One basic block:

[image: CFG of the bytecode]

"Coconut": a JIT compiler for Python (4)

31 basic blocks:

[image: CFG of the generated IR]

"Coconut": a JIT compiler for Python (5)

Status: an experiment:

Bindings for other languages?

Yes please!

Implementation Details

How it originally worked

The original way it worked:

[diagram: how it originally worked]

How it now works

[diagram: how it now works]

State removal: the clean way vs the hack

gcc::context::context ()
{
  m_dumps = new gcc::dump_manager ();
  m_passes = new gcc::pass_manager (this);
}

State removal: the clean way vs the hack (2)

Add a big mutex and...

/* For those that want to, this function aims to clean up enough
   state that you can call toplev::main again. */
void
toplev::finalize (void)
{
  cgraph_c_finalize ();
  cgraphbuild_c_finalize ();
  cgraphunit_c_finalize ();
  dwarf2out_c_finalize ();
  /* etc */
}

State removal: the clean way vs the hack (3)

void cgraph_c_finalize (void)
{
  x_cgraph_nodes_queue = NULL;
  cgraph_n_nodes = 0;
  cgraph_max_uid = 0;
  cgraph_edge_max_uid = 0;
  cgraph_global_info_ready = false;
  cgraph_state = CGRAPH_STATE_PARSING;
  cgraph_function_flags_ready = false;
  /* etc */
}

State removal: the clean way vs the hack (4)

The testsuite for JIT now runs at the equivalent of -O3

(with each test running in-process 5 times, to shake out state issues)
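
In API terms, the equivalent of -O3 is (a sketch):

gcc_jit_context_set_int_option (
  ctxt, GCC_JIT_INT_OPTION_OPTIMIZATION_LEVEL, 3);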

Assembler as a shared library?

Currently the library writes out assembler and invokes the driver as a subprocess to assemble and link it into a shared object, which is then dlopen'd.

Those subprocess invocations show up as a significant part of the profile.

I would prefer to do this all in-process.

Are there shared libraries for these stages in our toolchain?

OK if I factor out the spec language from the gcc harness?

Summary

Next steps

Questions and Discussion

Thanks for listening!