First of all, you should go to the gobject-introspection website and read the page on how to write bindable API. What I’m going to write here builds upon what’s already documented, or updates the best practices, so if you maintain a GObject/C library, or you’re writing one, you must be familiar with the basics of gobject-introspection. It’s 2023: it’s already too bad that we’re still writing C libraries, so we should at the very least be responsible about it.
A specific note for people maintaining an existing GObject/C library with an API designed before the mainstream establishment of gobject-introspection (basically, anything written prior to 2011): you should really consider writing all new types and entry points with gobject-introspection in mind, and you should also consider phasing out older API and replacing it piecemeal with a bindable one. You should have done this 10 years ago, and I can already hear the objections, but: too bad. Just because you made an effort 10 years ago doesn’t mean things are frozen in time and you don’t get to fix them. Maintenance means constantly tending to your code, and that doubly applies if you’re exposing an API to other people.
Let’s take the “how to write bindable API” recommendations, and elaborate on them a bit.
Structures with custom memory management
The recommendation is to use GBoxed as a way to specify a copy and a free function, in order to clearly define the memory management semantics of a type.
The important caveat is that boxed types are necessary for:
- opaque types that can only be heap allocated
- using a type as a GObject property
- using a type as an argument or return value for a GObject signal
You don’t need a boxed type for the following cases:
- your type is an argument or return value for a method, function, or virtual function
- your type can be placed on the stack, or can be allocated with malloc()/free()
Additionally, starting with gobject-introspection 1.76, you can specify the copy and free functions of a type without necessarily registering a boxed type, which leaves boxed types for the thing they were created for: signals and properties.
Addendum: object types
Boxed types should only ever be used for plain old data types; if you need inheritance, then the strong recommendation is to use GObject. You can use GTypeInstance, but only if you know what you’re doing; for more information on that, see my old blog post about typed instances.
Functionality only accessible through a C macro
This ought to be fairly uncontroversial. C pre-processor symbols don’t exist at the ABI level, and gobject-introspection is a mechanism to describe a C ABI. Never, ever expose API only through C macros; those are for C developers. C macros can be used to create convenience wrappers, but remember that anything they call must be public API, and that other people will need to re-implement the convenience wrappers themselves, so don’t overdo it. C developers deserve some convenience, but not at the expense of everyone else.
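A minimal sketch of the acceptable pattern, using hypothetical names (SomeRect, some_rect_init(), some_rect_init_square()): the macro is pure convenience, and expands into a call to a real, exported function that bindings can reach on their own.

```c
/* Hypothetical plain old data type */
typedef struct {
  int x, y;
  int width, height;
} SomeRect;

/* Public API: an exported symbol, visible to introspection and bindings */
void
some_rect_init (SomeRect *rect,
                int       x,
                int       y,
                int       width,
                int       height)
{
  rect->x = x;
  rect->y = y;
  rect->width = width;
  rect->height = height;
}

/* C-only convenience: it adds nothing that isn't already reachable
 * through some_rect_init(), so bindings can trivially re-implement it.
 * Note that @side is evaluated twice. */
#define some_rect_init_square(rect, x, y, side) \
  some_rect_init ((rect), (x), (y), (side), (side))
```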
Addendum: inline functions
Static inline functions are also not part of the introspectable ABI of a library, because they cannot be used with dlsym(); you can provide inlined functions for performance reasons, but remember to always provide their non-inlined equivalent.
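One way to do that, sketched here with a hypothetical some_point_dot() function: the header carries both the exported prototype and an inlined fast path, and the exported symbol simply reuses the inline implementation, so the two can never drift apart.

```c
/* Hypothetical plain old data type */
typedef struct {
  double x, y;
} SomePoint;

/* Public header: the exported, non-inlined function that bindings
 * can resolve with dlsym() */
double some_point_dot (const SomePoint *a, const SomePoint *b);

/* Public header: an optional inlined fast path, for C callers only */
static inline double
some_point_dot_inline (const SomePoint *a, const SomePoint *b)
{
  return a->x * b->x + a->y * b->y;
}

/* Source file: the real symbol, sharing the same implementation */
double
some_point_dot (const SomePoint *a, const SomePoint *b)
{
  return some_point_dot_inline (a, b);
}
```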
Direct C structure access for objects
Again, another fairly uncontroversial rule. You shouldn’t be putting anything into an instance structure, as it makes your API harder to future-proof, and direct access cannot do things like change notification, or memoization.
Always provide accessor functions.
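A minimal sketch with a hypothetical, plain C SomeObject: the public header only sees an opaque declaration, and the setter can skip redundant updates and track real changes. The counter is only a stand-in for change notification; in an actual GObject the setter would notify the property instead, for instance with g_object_notify_by_pspec().

```c
#include <stdlib.h>
#include <string.h>

/* Public header: only an opaque declaration is visible */
typedef struct _SomeObject SomeObject;

/* Source file: the private definition */
struct _SomeObject {
  char *name;
  unsigned int n_name_changes;
};

static char *
copy_string (const char *str)
{
  if (str == NULL)
    return NULL;

  size_t len = strlen (str) + 1;
  char *res = malloc (len);
  if (res != NULL)
    memcpy (res, str, len);

  return res;
}

SomeObject *
some_object_new (void)
{
  return calloc (1, sizeof (SomeObject));
}

void
some_object_free (SomeObject *self)
{
  if (self == NULL)
    return;

  free (self->name);
  free (self);
}

const char *
some_object_get_name (const SomeObject *self)
{
  return self->name;
}

void
some_object_set_name (SomeObject *self,
                      const char *name)
{
  /* An accessor can skip redundant updates... */
  if (self->name == name ||
      (self->name != NULL && name != NULL && strcmp (self->name, name) == 0))
    return;

  free (self->name);
  self->name = copy_string (name);

  /* ...and track (or notify) actual changes, which direct
   * field access could never do */
  self->n_name_changes += 1;
}

unsigned int
some_object_get_n_name_changes (const SomeObject *self)
{
  return self->n_name_changes;
}
```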
va_list
Variadic argument functions are mainly C convenience. Yes, some languages can support them, but it’s a bad idea to have this kind of API exposed as the only way to do things.
Any variadic argument function should have two additional variants:
- a vector based version, using C arrays (zero terminated, or with an explicit length)
- a va_list version, to be used when creating wrappers with variadic arguments themselves
The va_list variant is kind of optional, since not many people go around writing variadic argument C wrappers these days; but at the end of the day you are probably going to write an internal function that takes a va_list anyway, so it’s not particularly strange to expose it as part of your public API.
The vector-based variant, on the other hand, is fundamental.
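To make the three variants concrete, here is a minimal sketch built around a hypothetical some_total_length() function, which sums the lengths of a list of strings; the variadic entry point is only a thin C convenience over the va_list one, while the vector one stands on its own.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Vector variant: the fundamental, bindable entry point.
 * @strv: (array zero-terminated=1) */
size_t
some_total_length_strv (const char * const *strv)
{
  size_t total = 0;

  for (size_t i = 0; strv[i] != NULL; i++)
    total += strlen (strv[i]);

  return total;
}

/* va_list variant: for people writing their own variadic wrappers */
size_t
some_total_length_valist (const char *first,
                          va_list     args)
{
  size_t total = 0;

  for (const char *str = first; str != NULL; str = va_arg (args, char *))
    total += strlen (str);

  return total;
}

/* Variadic variant: a C convenience on top of the va_list one;
 * the argument list must be terminated by NULL */
size_t
some_total_length (const char *first,
                   ...)
{
  va_list args;

  va_start (args, first);
  size_t res = some_total_length_valist (first, args);
  va_end (args);

  return res;
}
```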
Incidentally, if you’re using variadic arguments as a way to collect similarly typed values, e.g.:
// void
// some_object_method (SomeObject *self,
// ...) G_GNUC_NULL_TERMINATED
some_object_method (obj, "foo", "bar", "baz", NULL);
there’s very little difference from using a vector with C99’s compound literals:
// void
// some_object_method (SomeObject *self,
// const char *args[])
some_object_method (obj, (const char *[]) {
"foo",
"bar",
"baz",
NULL,
});
Except that now the compiler will be able to do some basic type checking, and scream at you if you’re doing something egregiously bad.
Compound literals and designated initialisers also help when dealing with key/value pairs:
typedef struct {
int column;
union {
const char *v_str;
int v_int;
} value;
} ColumnValue;
enum {
COLUMN_NAME,
COLUMN_AGE,
N_COLUMNS
};
// void
// some_object_method (SomeObject *self,
// size_t n_columns,
// const ColumnValue values[])
some_object_method (obj, 2,
(ColumnValue []) {
{ .column = COLUMN_NAME, .value = { .v_str = "Emmanuele" } },
{ .column = COLUMN_AGE, .value = { .v_int = 42 } },
});
So you should seriously reconsider the number of variadic convenience functions you expose.
Multiple out parameters
Using a structured type with an out direction is a good recommendation as a way to both limit the number of out arguments and provide some future-proofing for your API. It’s easy to expand an opaque pointer type with accessors, whereas adding more out arguments requires an ABI break.
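A minimal sketch of the idea, with hypothetical types and functions (SomeImage, SomeImageInfo, some_image_get_info()): instead of separate width, height, and stride out arguments, a single opaque structure comes out, and new fields can be added later along with new accessors.

```c
#include <stdlib.h>

/* Hypothetical image type, defined here only to make the sketch compile */
typedef struct {
  int width;
  int height;
  int bytes_per_pixel;
} SomeImage;

/* Public header: an opaque type */
typedef struct _SomeImageInfo SomeImageInfo;

/* Source file: private definition; adding a field is not an ABI break */
struct _SomeImageInfo {
  int width;
  int height;
  int stride;
};

/**
 * some_image_get_info:
 * @image: an image
 * @info: (out) (transfer full): return location for the image information
 *
 * Replaces something like: some_image_get_size (image, &width, &height, &stride)
 */
void
some_image_get_info (const SomeImage  *image,
                     SomeImageInfo   **info)
{
  SomeImageInfo *res = malloc (sizeof (SomeImageInfo));

  if (res != NULL)
    {
      res->width = image->width;
      res->height = image->height;
      res->stride = image->width * image->bytes_per_pixel;
    }

  *info = res;
}

int some_image_info_get_width (const SomeImageInfo *info) { return info->width; }
int some_image_info_get_height (const SomeImageInfo *info) { return info->height; }
int some_image_info_get_stride (const SomeImageInfo *info) { return info->stride; }

void
some_image_info_free (SomeImageInfo *info)
{
  free (info);
}
```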
Addendum: inout arguments
Don’t use in-out arguments. Just don’t.
Pass an in argument to the callable for its input, and take an out argument or a return value for the output.
Memory management and ownership of inout arguments is incredibly hard to capture with static annotations; it mainly works for scalar values, so:
void
some_object_update_matrix (SomeObject *self,
double *xx,
double *yy,
double *xy,
double *yx)
can work with xx, yy, xy, yx as inout arguments, because there’s no ownership transfer; but as soon as you start throwing things in like pointers to structures, or vectors of strings, you open yourself to questions like:
- who allocates the argument when it goes in?
- who is responsible for freeing the argument when it comes out?
- what happens if the function frees the argument in the in direction and then re-allocates it in the out direction?
- what happens if the function uses a different allocator than the one used by the caller?
- what happens if the function has to allocate more memory?
- what happens if the function modifies the argument and frees memory?
Even if gobject-introspection nailed down the rules, they could not be enforced, or validated, and could lead to leaks or, worse, crashes.
So, once again: don’t use inout arguments. If your API already exposes inout arguments, especially for non-scalar types, consider deprecating them and adding new entry points.
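As a sketch of such a replacement entry point, assume a hypothetical some_normalize_names() that would otherwise have taken a char *** (inout) argument: the input comes in as a read-only array, and the output is a brand new array owned by the caller, so there is no question about who allocates or frees what.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Instead of an (inout) argument, e.g.:
 *   void some_normalize_names (char ***names);
 * take the input as an "in" argument and return a new output. */

/**
 * some_strv_free:
 * @strv: (transfer full) (nullable): a NULL-terminated array of strings
 */
void
some_strv_free (char **strv)
{
  if (strv == NULL)
    return;

  for (size_t i = 0; strv[i] != NULL; i++)
    free (strv[i]);
  free (strv);
}

/**
 * some_normalize_names:
 * @names: (array zero-terminated=1): the names to normalize
 *
 * Returns: (array zero-terminated=1) (transfer full): a lower-cased
 *   copy of @names; free it with some_strv_free()
 */
char **
some_normalize_names (const char * const *names)
{
  size_t n_names = 0;
  while (names[n_names] != NULL)
    n_names++;

  /* calloc() keeps the array NULL-terminated even on partial failure */
  char **res = calloc (n_names + 1, sizeof (char *));
  if (res == NULL)
    return NULL;

  for (size_t i = 0; i < n_names; i++)
    {
      size_t len = strlen (names[i]);

      res[i] = malloc (len + 1);
      if (res[i] == NULL)
        {
          some_strv_free (res);
          return NULL;
        }

      /* j == len also copies the trailing NUL */
      for (size_t j = 0; j <= len; j++)
        res[i][j] = (char) tolower ((unsigned char) names[i][j]);
    }

  return res;
}
```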
Addendum: GValue
Sadly, GValue is one of the most notable cases of inout abuse. The oldest parts of the GNOME stack use GValue in a way that requires inout annotations because they expect the caller to:
- initialise a GValue with the desired type
- pass the address of the value
- let the function fill the value
The caller is then left with calling g_value_unset() in order to free the resources associated with a GValue. This means that you’re passing an initialised value to a callable, the callable will do something to it (which may or may not even entail re-allocating the value) and then you’re going to get it back at the same address.
It would be a lot easier if the API left the job of initialising the GValue to the callee; then functions could annotate the GValue argument with out and caller-allocates=1. This would leave the ownership to the caller, and remove a whole lot of uncertainty.
Various new (comparatively speaking) APIs allow the caller to pass an uninitialised GValue, and will leave initialisation to the callee, which is how it should be, but this kind of change isn’t always possible in a backward compatible way.
Arrays
You can use three types of C arrays in your API:
- zero-terminated arrays, which are the easiest to use, especially for pointers and strings
- fixed-size arrays
- arrays with length arguments
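Here is a short sketch of the three kinds, each with its introspection annotation; the functions themselves are hypothetical.

```c
#include <stddef.h>

/**
 * some_count_strings:
 * @strv: (array zero-terminated=1): a NULL-terminated array of strings
 *
 * Returns: the number of strings in @strv
 */
size_t
some_count_strings (const char * const *strv)
{
  size_t n = 0;

  while (strv[n] != NULL)
    n++;

  return n;
}

/**
 * some_matrix_get_trace:
 * @matrix: (array fixed-size=4): a 2x2 matrix, in row-major order
 */
double
some_matrix_get_trace (const double matrix[4])
{
  return matrix[0] + matrix[3];
}

/**
 * some_sum_values:
 * @values: (array length=n_values): the values to sum
 * @n_values: the number of elements in @values
 */
double
some_sum_values (const double *values,
                 size_t        n_values)
{
  double res = 0.0;

  for (size_t i = 0; i < n_values; i++)
    res += values[i];

  return res;
}
```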
Addendum: strings and byte arrays
A const char* argument for C strings with a length argument is not an array:
/**
* some_object_load_data:
* @self: ...
* @str: the data to load
* @len: length of @str in bytes, or -1
*
* ...
*/
void
some_object_load_data (SomeObject *self,
const char *str,
ssize_t len)
Never annotate the str argument with array length=len. Ideally, this kind of function should not exist in the first place. You should always use const char* for NUL-terminated strings, possibly UTF-8 encoded; if you allow embedded NUL characters then use a byte array:
/**
* some_object_load_data:
* @self: ...
* @data: (array length=len) (element-type uint8): the data to load
* @len: the length of the data in bytes
*
* ...
*/
void
some_object_load_data (SomeObject *self,
const unsigned char *data,
size_t len)
Instead of unsigned char you can also use uint8_t, just to drive the point home.
Yes, it’s slightly nicer to have a single entry point for strings and byte arrays, but that’s just a C convenience: decent languages will have a proper string type, which always comes with a length; and string types are not binary data.
Addendum: GArray, GPtrArray, GByteArray
Whatever you do, however low you feel on the day, whatever particular tragedy befell your family at some point, please: never use GLib array types in your API. Nothing good will ever come of it, and you’ll just spend your days regretting this choice.
Yes: gobject-introspection transparently converts between GLib array types and C types, to the point of allowing you to annotate the contents of the array. The problem is that that information is static, and only exists at the introspection level. There’s nothing that prevents you from putting other random data into a GPtrArray, as long as it’s pointer-sized. There’s nothing that prevents a version of a library from saying that you own the data inside a GArray, and have the next version assign a clear function to the array to avoid leaking it all over the place on error conditions, or when using g_autoptr.
Adding support for GLib array types in the introspection was a well-intentioned mistake that worked in very specific cases—for instance, in a library that is private to an application. Any well-behaved, well-designed general purpose library should not expose this kind of API to its consumers.
You should use GArray, GPtrArray, and GByteArray internally; they are good types, and remove a lot of the pain of dealing with C arrays. Those types should never be exposed at the API boundary: always convert them to C arrays, or wrap them into your own data types, with proper argument validation and ownership rules.
Addendum: GHashTable
What’s worse than a type that contains data with unclear ownership rules decided at run time? A type that contains twice the amount of data with unclear ownership rules decided at run time.
Just like the GLib array types, hash tables should be used but never directly exposed to consumers of an API.
Addendum: GList, GSList, GQueue
See above, re: pain and misery. On top of that, linked lists are a terrible data type that people should rarely, if ever, use in the first place.
Callbacks
Your callbacks should always be in the form of a simple callable with a data argument:
typedef void (* SomeCallback) (SomeObject *obj,
gpointer data);
Any function that takes a callback should also take a “user data” argument that will be passed as is to the callback:
// scope: call; the callback data is valid until the
// function returns
void
some_object_do_stuff_immediately (SomeObject *self,
SomeCallback callback,
gpointer data);
// scope: notified; the callback data is valid until the
// notify function gets called
void
some_object_do_stuff_with_a_delay (SomeObject *self,
SomeCallback callback,
gpointer data,
GDestroyNotify notify);
// scope: async; the callback data is valid until the async
// callback is called
void
some_object_do_stuff_but_async (SomeObject *self,
GCancellable *cancellable,
GAsyncReadyCallback callback,
gpointer data);
// not pictured here: scope forever; the data is valid for
// the entirety of the process lifetime
If your function takes more than one callback argument, you should make sure that it also takes a different user data for each callback, and that the lifetimes of the callbacks are well defined. The alternative is to use GClosure instead of a simple C function pointer, but that comes at the cost of GValue marshalling, so the recommendation is to stick with one callback per function.
Addendum: the closure annotation
It seems that many people are unclear about the closure annotation.
Whenever you’re describing a function that takes a callback, you should always annotate the callback argument with the argument that contains the user data using the (closure argument) annotation, e.g.
/**
* some_object_do_stuff_immediately:
* @self: ...
* @callback: (scope call) (closure data): the callback
* @data: the data to be passed to the @callback
*
* ...
*/
You should not annotate the data argument with a unary (closure).
The unary (closure) is meant to be used when annotating the callback type:
/**
* SomeCallback:
* @self: ...
* @data: (closure): ...
*
* ...
*/
typedef void (* SomeCallback) (SomeObject *self,
gpointer data);
Yes, it’s confusing, I know.
Sadly, the introspection parser isn’t very clear about this, but in the future it will emit a warning if it finds a unary closure on anything that isn’t a callback type.
Ideally, you don’t really need to annotate anything when you call your argument user_data, but it does not hurt to be explicit.
A cleaned up version of this blog post will go up on the gobject-introspection website, and we should really have a proper set of best API design practices on the Developer Documentation website by now; nevertheless, I do hope people will actually follow these recommendations at some point, and that they will be prepared for new recommendations in the future. Only dead and unmaintained projects don’t change, after all, and I expect the GNOME stack to last a bit longer than the 25 years it already spans today.