halting problem :: Reference counting in modern GLib

:: ~4 min read

GLib 2.58 introduced new API to facilitate the implementation of reference counted types; as many C developers have had to deal with this kind of code at one point or another, it's a good thing to provide some insight and best practices on how to do it properly

Reference counting is a fairly common garbage collection technique used in many projects; the core GNOME platform uses pretty much all the time, from container data types, to GObject.

Implementing reference counting in C is typically fairly easy. Add a field to your data type:

typedef struct {
  int ref_count;

  char *name;
  char *address;
  char *city;

  int age;
} Person;

Then initialise it when creating a new instance:

Person *
person_new (void)
{
  Person *res = g_new0 (Person, 1);

  res->ref_count = 1;

  return res;
}

Instead of a person_copy() and a person_free() pair of functions, you need to write a person_ref() and a person_unref() pair:

Person *
person_ref (Person *p)
{
  // Acquire a reference
  p->ref_count++;

  return p;
}

static void
person_free (Person *p)
{
  // Free the data
  g_free (p->name);
  g_free (p->address);
  g_free (p->city);

  // Free the instance
  g_free (p);
}

void
person_unref (Person *p)
{
  // Release a reference
  p->ref_count--;

  // If this was the last reference, free the data
  // associated to the instance and then the instance
  // itself
  if (p->ref_count == 0)
    {
      person_free (p);
    }
}

Of course, trivial doesn’t mean correct. For instance, the code above assumes that all reference acquisition and release operations will happen from the same thread; if that’s not the case, you’ll have to use atomic integer operations to increase and decrease the reference count. Additionally, the code above does not check for overflows in the reference counting, which means that the value could saturate and lead to leaks.

Reimplementing all the checks and behaviours is not only boring, but it inevitably leads to mistakes along the way. For this reason, GLib 2.58 introduced two new types:

  • grefcount, to implement simple reference counting
  • gatomicrefcount, to implement atomic reference counting

Both come with their own API, which leads to this code:

typedef struct {
  grefcount ref_count;
  // or: gatomicrefcount ref_count;

  // same as above
} Person;

Person *
person_new (void)
{
  Person *res = g_new0 (Person, 1);

  g_ref_count_init (&res->ref_count);
  // or: g_atomic_ref_count_init (&res->ref_count);

  return res;
}

Person *
person_ref (Person *p)
{
  g_ref_count_inc (&p->ref_count);
  // or: g_atomic_ref_count_inc (&p->ref_count);

  return p;
}

void
person_unref (Person *p)
{
  if (g_ref_count_dec (&p->ref_count))
  // or: if (g_atomic_ref_count_dec (&p->ref_count))
    {
      person_free (p);
    }
}

The grefcount and gatomicrefcount make it immediately obvious that the field is used to implement reference counting semantics; the API checks for saturation of the counters, and will emit a warning; the atomic operations are verified. Additionally, the API checks if you’re trying to pass an atomic reference count to the grefcount API, and vice versa, so you have a layer of protection there during eventual refactoring, even if the types are both integer aliases.

We did not stop there, though.

Adding a reference count field to a structure is not always possible; for instance, if you have ABI compatibility restrictions, or if the type defition is public, adding a field may just not be something you can do within the same API version. By way of an example: you may have a type meant to be typically placed on the stack, so it needs a public, complete declaration, in order to have a well-defined size at compile time. You may also want to pass around an instance of the type as the argument for a GObject signal, or as a property — but it may be expensive to copy the data around, so you really want to have an optional reference counting mechanism that is invisible to the vast majority of the use cases.

This is why we also added an allocator function that adds reference counting semantics to the memory areas it returns, called GRcBox:

typedef struct {
  char *name;
  char *address;
  char *city;

  int age;
} Person;

Person *
person_new (void)
{
  return g_rc_box_new0 (Person);
}

Person *
person_ref (Person *p)
{
  return g_rc_box_acquire (p);
}

void
person_unref (Person *p)
{
  // person_free() is copied from the code above; we use
  // g_rc_box_release_full() because we have data to free
  // as well, but there's a variant for structures that
  // do not have internal allocations
  g_rc_box_release_full (p, person_free);
}

As you can see, this cuts down the boilerplate and repetition considerably.

The data returned to you by g_rc_box_new() and friends is exactly the same as you’d get from g_new(), so you can pass it around to your API exactly the same — but it is transparently augmented with reference counting semantics. You acquire and release references without having an explicit reference counter. The only restriction is that you cannot reallocate the data, so calling realloc() is not allowed; and, of course, you cannot free the memory directly with free() — you need to release the last reference.

Similar to GRcBox, there’s a GAtomicRcBox, which provides atomic reference counting semantics, with a similar API.

Both GRcBox and GAtomicRcBox are Valgrind-safe, so you won’t get false positives or unreachable memory if run your code under Valgrind.

You don’t even need to have a structure to allocate: you can use GRcBox to allocate any memory area with a non-zero size. Incidentally, this is how we implemented a oft-requested feature: reference counted strings, which are also available in GLib 2.58.

Reference counted strings work exactly like any other C string, but instead of copying them and freeing them, you acquire and release references to them:

char *s = g_ref_string_new ("hello, world!");

// "s" is just like any other C string
g_print ("s['%s'] length = %d\n", s, strlen (s));

g_ref_string_release (s);

Reference counted strings can also be interned, which means that calling g_ref_string_new_intern() with the same string twice will give you a new reference the second time around, instead of allocating a new string; once the last reference to the interned string is dropped, the string is un-interned, and the resource allocated for it are freed.

Since you may be wary of passing around char * for reference counted strings, there’s a handy GRefString C type you can use to improve readability, like we have GStrv for char **:

GRefString *s = g_ref_string_new ("hello");

// ...

g_ref_string_release (s);

GRefString also has an autocleanup function, so you can do:

{
  g_autoptr(GRefString) s = g_ref_string_new ("hello!");

  // ...
}

and the string will automatically be released once it goes out of scope.

development glib memory-management

Follow me on Mastodon