halting problem : Writing Bindable API, 2023 Edition

First of all, you should go on the gobject-introspection website and read the page on how to write bindable API. What I’m going to write here is going to build upon what’s already documented, or will update the best practices, so if you maintain a GObject/C library, or you’re writing one, you must be familiar with the basics of gobject-introspection. It’s 2023: it’s already too bad we’re still writing C libraries, we should at the very least be responsible about it.

A specific note for people maintaining an existing GObject/C library with an API designed before the mainstream establishment of gobject-introspection (basically, anything written prior to 2011): you should really consider writing all new types and entry points with gobject-introspection in mind, and you should also consider phasing out older API and replacing it piecemeal with a bindable one. You should have done this 10 years ago, and I can already hear the objections, but: too bad. Just because you made an effort 10 years ago it doesn’t mean things are frozen in time, and you don’t get to fix things. Maintenance means constantly tending to your code, and that doubly applies if you’re exposing an API to other people.

Let’s take the “how to write bindable API” recommendations, and elaborate them a bit.

Structures with custom memory management

The recommendation is to use GBoxed as a way to specify a copy and a free function, in order to clearly define the memory management semantics of a type.

The important caveat is that boxed types are necessary for:

opaque types that can only be heap allocated
using a type as a GObject property
using a type as an argument or return value for a GObject signal

You don’t need a boxed type for the following cases:

your type is an argument or return value for a method, function, or virtual function
your type can be placed on the stack, or can be allocated with malloc()/free()

Additionally, starting with gobject-introspection 1.76, you can specify the copy and free function of a type without necessarily registering a boxed type, which leaves boxed types for the thing they were created: signals and properties.

Addendum: object types

Boxed types should only ever be used for plain old data types; if you need inheritance, then the strong recommendation is to use GObject. You can use GTypeInstance, but only if you know what you’re doing; for more information on that, see my old blog post about typed instances.

Functionality only accessible through a C macro

This ought to be fairly uncontroversial. C pre-processor symbols don’t exist at the ABI level, and gobject-introspection is a mechanism to describe a C ABI. Never, ever expose API only through C macros; those are for C developers. C macros can be used to create convenience wrappers, but remember that anything they call must be public API, and that other people will need to re-implement the convenience wrappers themselves, so don’t overdo it. C developers deserve some convenience, but not at the expense of everyone else.

Addendum: inline functions

Static inline functions are also not part of the introspectable ABI of a library, because they cannot be used with dlsym(); you can provide inlined functions for performance reasons, but remember to always provide their non-inlined equivalent.

Direct C structure access for objects

Again, another fairly uncontroversial rule. You shouldn’t be putting anything into an instance structure, as it makes your API harder to future-proof, and direct access cannot do things like change notification, or memoization.

Always provide accessor functions.

`va_list`

Variadic argument functions are mainly C convenience. Yes, some languages can support them, but it’s a bad idea to have this kind of API exposed as the only way to do things.

Any variadic argument function should have two additional variants:

a vector based version, using C arrays (zero terminated, or with an explicit length)
a va_list version, to be used when creating wrappers with variadic arguments themselves

The va_list variant is kind of optional, since not many people go around writing variadic argument C wrappers, these days, but at the end of the day you might be going to write an internal function that takes a va_list anyway, so it’s not particularly strange to expose it as part of your public API.

The vector-based variant, on the other hand, is fundamental.

Incidentally, if you’re using variadic arguments as a way to collect similarly typed values, e.g.:

// void
// some_object_method (SomeObject *self,
//                     ...) G_GNUC_NULL_TERMINATED

some_object_method (obj, "foo", "bar", "baz", NULL);

there’s very little difference to using a vector and C99’s compound literals:

// void
// some_object_method (SomeObject *self,
//                     const char *args[])

some_object_method (obj, (const char *[]) {
                      "foo",
                      "bar",
                      "baz",
                      NULL,
                    });

Except that now the compiler will be able to do some basic type check and scream at you if you’re doing something egregiously bad.

Compound literals and designated initialisers also help when dealing with key/value pairs:

typedef struct {
  int column;
  union {
    const char *v_str;
    int v_int;
  } value;
} ColumnValue;

enum {
  COLUMN_NAME,
  COLUMN_AGE,
  N_COLUMNS
};

// void
// some_object_method (SomeObject *self,
//                     size_t n_columns,
//                     const ColumnValue values[])

some_object_method (obj, 2,
  (ColumnValue []) {
    { .column = COLUMN_NAME, .data = { .v_str = "Emmanuele" } },
    { .column = COLUMN_AGE, .data = { .v_int = 42 } },
  });

So you should seriously reconsider the amount of variadic arguments convenience functions you expose.

Multiple out parameters

Using a structured type with a out direction is a good recommendation as a way to both limit the amount of out arguments and provide some future-proofing for your API. It’s easy to expand an opaque pointer type with accessors, whereas adding more out arguments requires an ABI break.

Addendum: `inout` arguments

Don’t use in-out arguments. Just don’t.

Pass an in argument to the callable for its input, and take an out argument or a return value for the output.

Memory management and ownership of inout arguments is incredibly hard to capture with static annotations; it mainly works for scalar values, so:

void
some_object_update_matrix (SomeObject *self,
                           double *xx,
                           double *yy,
                           double *xy,
                           double *yx)

can work with xx, yy, xy, yx as inout arguments, because there’s no ownership transfer; but as soon as you start throwing things in like pointers to structures, or vectors of string, you open yourself to questions like:

who allocates the argument when it goes in?
who is responsible for freeing the argument when it comes out?
what happens if the function frees the argument in the in direction and then re-allocates the out?
what happens if the function uses a different allocator than the one used by the caller?
what happens if the function has to allocate more memory?
what happens if the function modifies the argument and frees memory?

Even if gobject-introspection nailed down the rules, they could not be enforced, or validated, and could lead to leaks or, worse, crashes.

So, once again: don’t use inout arguments. If your API already exposes inout arguments, especially for non-scalar types, consider deprecations and adding new entry points.

Addendum: `GValue`

Sadly, GValue is one of the most notable cases of inout abuse. The oldest parts of the GNOME stack use GValue in a way that requires inout annotations because they expect the caller to:

initialise a GValue with the desired type
pass the address of the value
let the function fill the value

The caller is then left with calling g_value_unset() in order to free the resources associated with a GValue. This means that you’re passing an initialised value to a callable, the callable will do something to it (which may or may not even entail re-allocating the value) and then you’re going to get it back at the same address.

It would be a lot easier if the API left the job of initialising the GValue to the callee; then functions could annotate the GValue argument with out and caller-allocates=1. This would leave the ownership to the caller, and remove a whole lot of uncertainty.

Various new (comparatively speaking) API allow the caller to pass an unitialised GValue, and will leave initialisation to the callee, which is how it should be, but this kind of change isn’t always possible in a backward compatible way.

Arrays

You can use three types of C arrays in your API:

zero-terminated arrays, which are the easiest to use, especially for pointers and strings
fixed-size arrays
arrays with length arguments

Addendum: strings and byte arrays

A const char* argument for C strings with a length argument is not an array:

/**
 * some_object_load_data:
 * @self: ...
 * @str: the data to load
 * @len: length of @str in bytes, or -1
 *
 * ...
 */
void
some_object_load_data (SomeObject *self,
                       const char *str,
                       ssize_t len)

Never annotate the str argument with array length=len. Ideally, this kind of function should not exist in the first place. You should always use const char* for NUL-terminated strings, possibly UTF-8 encoded; if you allow embedded NUL characters then use a bytes array:

/**
 * some_object_load_data:
 * @self: ...
 * @data: (array length=len) (element-type uint8): the data to load
 * @len: the length of the data in bytes
 *
 * ...
 */
void
some_object_load_data (SomeObject *self,
                       const unsigned char *data,
                       size_t len)

Instead of unsigned char you can also use uint8_t, just to drive the point home.

Yes, it’s slightly nicer to have a single entry point for strings and byte arrays, but that’s just a C convenience: decent languages will have a proper string type, which always comes with a length; and string types are not binary data.

Addendum: `GArray`, `GPtrArray`, `GByteArray`

Whatever you do, however low you feel on the day, whatever particular tragedy befell your family at some point, please: never use GLib array types in your API. Nothing good will ever come of it, and you’ll just spend your days regretting this choice.

Yes: gobject-introspection transparently converts between GLib array types and C types, to the point of allowing you to annotate the contents of the array. The problem is that that information is static, and only exists at the introspection level. There’s nothing that prevents you from putting other random data into a GPtrArray, as long as it’s pointer-sized. There’s nothing that prevents a version of a library from saying that you own the data inside a GArray, and have the next version assign a clear function to the array to avoid leaking it all over the place on error conditions, or when using g_autoptr.

Adding support for GLib array types in the introspection was a well-intentioned mistake that worked in very specific cases—for instance, in a library that is private to an application. Any well-behaved, well-designed general purpose library should not expose this kind of API to its consumers.

You should use GArray, GPtrArray, and GByteArray internally; they are good types, and remove a lot of the pain of dealing with C arrays. Those types should never be exposed at the API boundary: always convert them to C arrays, or wrap them into your own data types, with proper argument validation and ownership rules.

Addendum: `GHashTable`

What’s worse than a type that contains data with unclear ownership rules decided at run time? A type that contains twice the amount of data with unclear ownership rules decided at run time.

Just like the GLib array types, hash tables should be used but never directly exposed to consumers of an API.

Addendum: `GList`, `GSList`, `GQueue`

See above, re: pain and misery. On top of that, linked lists are a terrible data type that people should rarely, if ever, use in the first place.

Callbacks

Your callbacks should always be in the form of a simple callable with a data argument:

typedef void (* SomeCallback) (SomeObject *obj,
                               gpointer data);

Any function that takes a callback should also take a “user data” argument that will be passed as is to the callback:

// scope: call; the callback data is valid until the
// function returns
void
some_object_do_stuff_immediately (SomeObject *self,
                                  SomeCallback callback,
                                  gpointer data);

// scope: notify; the callback data is valid until the
// notify function gets called
void
some_object_do_stuff_with_a_delay (SomeObject *self,
                                   SomeCallback callback,
                                   gpointer data,
                                   GDestroyNotify notify);

// scope: async; the callback data is valid until the async
// callback is called
void
some_object_do_stuff_but_async (SomeObject *self,
                                GCancellable *cancellable,
                                GAsyncReadyCallback callback,
                                gpointer data);

// not pictured here: scope forever; the data is valid fori
// the entirety of the process lifetime

If your function takes more than one callback argument, you should make sure that it also takes a different user data for each callback, and that the lifetime of the callbacks are well defined. The alternative is to use GClosure instead of a simple C function pointer—but that comes at a cost of GValue marshalling, so the recommendation is to stick with one callback per function.

Addendum: the `closure` annotation

It seems that many people are unclear about the closure annotation.

Whenever you’re describing a function that takes a callback, you should always annotate the callback argument with the argument that contains the user data using the (closure argument) annotation, e.g.

/**
 * some_object_do_stuff_immediately:
 * @self: ...
 * @callback: (scope call) (closure data): the callback
 * @data: the data to be passed to the @callback
 *
 * ...
 */

You should not annotate the data argument with a unary (closure).

The unary (closure) is meant to be used when annotating the callback type:

/**
 * SomeCallback:
 * @self: ...
 * @data: (closure): ...
 *
 * ...
 */
typedef void (* SomeCallback) (SomeObject *self,
                               gpointer data);

Yes, it’s confusing, I know.

Sadly, the introspection parser isn’t very clear about this, but in the future it will emit a warning if it finds a unary closure on anything that isn’t a callback type.

Ideally, you don’t really need to annotate anything when you call your argument user_data, but it does not hurt to be explicit.

A cleaned up version of this blog post will go up on the gobject-introspection website, and we should really have a proper set of best API design practices on the Developer Documentation website by now; nevertheless, I do hope people will actually follow these recommendations at some point, and that they will be prepared for new recommendations in the future. Only dead and unmaintained projects don’t change, after all, and I expect the GNOME stack to last a bit longer than the 25 years it already spans today.

gnome bindings api design

Structures with custom memory management

Addendum: object types

Functionality only accessible through a C macro

Addendum: inline functions

Direct C structure access for objects

va_list

Multiple out parameters

Addendum: inout arguments

Addendum: GValue

Arrays

Addendum: strings and byte arrays

Addendum: GArray, GPtrArray, GByteArray

Addendum: GHashTable

Addendum: GList, GSList, GQueue