halting problem : Type instances

Since the start of the GTK4 development branch, I've had to deal with creating fundamental types to replace ad hoc boxed types with inheritance three times. I thought I'd write this stuff down, so that the next time somebody thinks "I don't want to use GObject but I want a type hierarchy" they'll do something that doesn't make people using language bindings cry.

Let us assume we are writing a library.

The particular nature of our work is up for any amount of debate, but the basic fact of it comes with a few requirements, and they are by and large inevitable if you wish to be a well-behaved, well-integrated member of the GNOME community. One of them is: “please, think of the language bindings”. These days, luckily for all of us, this means writing introspectable interfaces that adhere to fairly sensible best practices and conventions.

One of the basic conventions has to do with types. By and large, types exposed by libraries fall into these two categories:

plain old data structures, which are represented by what’s called a “boxed” type; these are simple types with a copy and a free function, mostly meant for marshalling things around so that language bindings can implement properties and signal handlers, and abide by ownership transfer rules. Boxed types cannot have sub-types.
object types, used for everything else: properties, emitting signals, inheritance, interface implementation, the whole shebang.

Boxed and object types cover most of the functionality in a modern, GObject-based API, and people can consume the very same API from languages that are not C.

Except that there’s a third, kind of niche data type:

fully opaque, with instance fields only known within the scope of the project itself
immutable, or at least with low-mutability, after construction
reference counted, with optional cloning and serialization
derivable within the scope of the project, typically with a base abstract class
without signals or properties

Boxing

One strategy used to implement this niche type has been to use a boxed type, and then invent some private, ad hoc derivation technique, with some structure containing function pointers used as a vtable, for instance:

boxed-type.c (lines 3-79)

/* {{{ Base */
typedef struct _Base            Base;
typedef struct _BaseClass       BaseClass;

struct _BaseClass
{
  const char *type_name;
  gsize instance_size;

  void  (* finalize)    (Base *self);
  void  (* foo)         (Base *self);
  void  (* bar)         (Base *self);
};

struct _Base
{
  const BaseClass *base_class;

  // Shared field
  int some_field;
};

// Allocate the instance described in the vtable
static Base *
base_alloc (const BaseClass *vtable)
{
  // Simple check to ensure that the derived instance includes the
  // parent base type
  g_assert (vtable->instance_size >= sizeof (Base));

  // Use the atomic reference counted boxing to allocate the requested
  // instance size; the block is zero-filled
  Base *res = g_atomic_rc_box_alloc0 (vtable->instance_size);

  // Store the vtable
  res->base_class = vtable;

  // Initialize the base instance fields
  res->some_field = 42;

  return res;
}

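// Matches GDestroyNotify, so it can be used as the clear function
// of g_atomic_rc_box_release_full()
static void
base_finalize (gpointer data)
{
  Base *self = data;

  // Allow derived types to clear up their own instance data
  self->base_class->finalize (self);
}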

Base *
base_ref (Base *self)
{
  return g_atomic_rc_box_acquire (self);
}

void
base_unref (Base *self)
{
  // The clear function runs before the memory block is released
  g_atomic_rc_box_release_full (self, base_finalize);
}

void
base_foo (Base *self)
{
  self->base_class->foo (self);
}

void
base_bar (Base *self)
{
  self->base_class->bar (self);
}

// Add a GType for the base type
G_DEFINE_BOXED_TYPE (Base, base, base_ref, base_unref)
/* }}} */

The code above lets us create derived types that conform to the base type API contract, while providing additional functionality; for instance:

boxed-type.c (lines 81-123)

/* {{{ DerivedA */
typedef struct {
  Base parent;

  char *some_other_field;
} DerivedA;

static void
derived_a_finalize (Base *base)
{
  DerivedA *self = (DerivedA *) base;

  g_free (self->some_other_field);
}

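// Virtual function implementations, defined elsewhere
static void derived_a_foo (Base *self);
static void derived_a_bar (Base *self);

static const BaseClass derived_a_class = {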
  .type_name = "DerivedA",
  .instance_size = sizeof (DerivedA),
  .finalize = derived_a_finalize,
  .foo = derived_a_foo, // defined elsewhere
  .bar = derived_a_bar, // defined elsewhere
};

Base *
derived_a_new (const char *some_other_field)
{
  Base *res = base_alloc (&derived_a_class);

  DerivedA *self = (DerivedA *) res;

  self->some_other_field = g_strdup (some_other_field);

  return res;
}

const char *
derived_a_get_some_other_field (Base *base)
{
  DerivedA *self = (DerivedA *) base;

  return self->some_other_field;
}
/* }}} */

Since the Base type is also a boxed type, it can be used for signal marshallers and GObject properties at zero cost.

This whole thing seems pretty efficient, and fairly simple to wrap your head around, but things fall apart pretty quickly as soon as you make this API public and tell people to use it from languages that are not C.

As I said above, boxed types cannot have sub-types; the type system has no idea that DerivedA implements the Base API contract. Additionally, since the whole introspection system is based on conventions applied on top of some C API, there is no way for language bindings to know that the derived_a_get_some_other_field() function is really a DerivedA method, meant to operate on DerivedA instances. Instead, you’ll only be able to access the method as a static function, like:

obj = Namespace.derived_a_new()
Namespace.derived_a_get_some_other_field(obj)

instead of the idiomatic, and natural:

obj = Namespace.DerivedA.new()
obj.get_some_other_field()

In short: please, don’t use boxed types for this, unless you’re planning to hide this functionality from the public API.

Typed instances

At this point the recommendation would be to switch to GObject for your type; make the type derivable in your project’s scope, avoid properties and signals, and you get fairly idiomatic code, and a bunch of other features, like weak references, toggle references, and keyed instance data. You can use your types for properties and signals, and you’re pretty much done.

But what if you don’t want to use GObject…

Well, in that case GLib lets you create your own type hierarchy, with its own rules, by using GTypeInstance as the base type.

GTypeInstance is the common ancestor for everything that is meant to be derivable; it’s the base type for GObject as well. Implementing a GTypeInstance-derived hierarchy doesn’t take much effort: it’s mostly low-level glue code:

instance-type.c (lines 4-189)

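typedef struct _Base            Base;
typedef struct _BaseClass       BaseClass;
typedef struct _BaseTypeInfo    BaseTypeInfo;

// Public API, normally declared in a header
GType base_get_type (void);
Base *base_ref (Base *self);
void base_unref (Base *self);

// Class and instance initialization, defined further down
static void base_class_init (BaseClass *klass);
static void base_init (Base *self);

#define BASE_TYPE               (base_get_type())
#define BASE_GET_CLASS(obj)     (G_TYPE_INSTANCE_GET_CLASS ((obj), BASE_TYPE, BaseClass))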

// Simple macro that lets you chain up to the parent type's implementation
// of a virtual function, e.g.:
//
//   BASE_SUPER (self)->finalize (obj);
#define BASE_SUPER(obj)         ((BaseClass *) g_type_class_peek (g_type_parent (G_TYPE_FROM_INSTANCE (obj))))

struct _BaseClass
{
  GTypeClass parent_class;

  void  (* finalize)    (Base *self);
  void  (* foo)         (Base *self);
  void  (* bar)         (Base *self);
};

struct _Base
{
  GTypeInstance parent_instance;

  gatomicrefcount ref_count;

  // Shared field
  int some_field;
};

// A structure to be filled out by derived types when registering
// themselves into the type system; it copies the vtable into the
// class structure, and defines the size of the instance
struct _BaseTypeInfo
{
  gsize instance_size;

  void  (* finalize)    (Base *self);
  void  (* foo)         (Base *self);
  void  (* bar)         (Base *self);
};

// GValue table, so that you can initialize, compare, and clear
// your type inside a GValue, as well as collect/copy it when
// dealing with variadic arguments
static void
value_base_init (GValue *value)
{
  value->data[0].v_pointer = NULL;
}

static void
value_base_free_value (GValue *value)
{
  if (value->data[0].v_pointer != NULL)
    base_unref (value->data[0].v_pointer);
}

static void
value_base_copy_value (const GValue *src,
                       GValue       *dst)
{
  if (src->data[0].v_pointer != NULL)
    dst->data[0].v_pointer = base_ref (src->data[0].v_pointer);
  else
    dst->data[0].v_pointer = NULL;
}

static gpointer
value_base_peek_pointer (const GValue *value)
{
  return value->data[0].v_pointer;
}

static char *
value_base_collect_value (GValue      *value,
                          guint        n_collect_values,
                          GTypeCValue *collect_values,
                          guint        collect_flags)
{
  Base *base = collect_values[0].v_pointer;

  if (base == NULL)
    {
      value->data[0].v_pointer = NULL;
      return NULL;
    }

  if (base->parent_instance.g_class == NULL)
    return g_strconcat ("invalid unclassed Base pointer for "
                        "value type '",
                        G_VALUE_TYPE_NAME (value),
                        "'",
                        NULL);

  value->data[0].v_pointer = base_ref (base);

  return NULL;
}

static gchar *
value_base_lcopy_value (const GValue *value,
                        guint         n_collect_values,
                        GTypeCValue  *collect_values,
                        guint         collect_flags)
{
  Base **base_p = collect_values[0].v_pointer;

  if (base_p == NULL)
    return g_strconcat ("value location for '",
                        G_VALUE_TYPE_NAME (value),
                        "' passed as NULL",
                        NULL);

  if (value->data[0].v_pointer == NULL)
    *base_p = NULL;
  else if (collect_flags & G_VALUE_NOCOPY_CONTENTS)
    *base_p = value->data[0].v_pointer;
  else
    *base_p = base_ref (value->data[0].v_pointer);

  return NULL;
}

// Register the Base type
GType
base_get_type (void)
{
  static gsize base_type__volatile;

  if (g_once_init_enter (&base_type__volatile))
    {
      // This is a derivable type; we also want to allow
      // its derived types to be derivable
      static const GTypeFundamentalInfo finfo = {
        (G_TYPE_FLAG_CLASSED |
         G_TYPE_FLAG_INSTANTIATABLE |
         G_TYPE_FLAG_DERIVABLE |
         G_TYPE_FLAG_DEEP_DERIVABLE),
      };

      // The gunk for dealing with GValue
      static const GTypeValueTable value_table = {
        value_base_init,
        value_base_free_value,
        value_base_copy_value,
        value_base_peek_pointer,
        "p",
        value_base_collect_value,
        "p",
        value_base_lcopy_value,
      };

      // Base type information
      const GTypeInfo base_info = {
        // Class
        sizeof (BaseClass),
        (GBaseInitFunc) NULL,
        (GBaseFinalizeFunc) NULL,
        (GClassInitFunc) base_class_init,
        (GClassFinalizeFunc) NULL,
        NULL,

        // Instance
        sizeof (Base),
        0,
        (GInstanceInitFunc) base_init,

        // GValue
        &value_table,
      };

      // Register the Base type as a new, abstract fundamental type
      GType base_type =
        g_type_register_fundamental (g_type_fundamental_next (),
                                     g_intern_static_string ("Base"),
                                     &base_info, &finfo,
                                     G_TYPE_FLAG_ABSTRACT);

      g_once_init_leave (&base_type__volatile, base_type);
    }

  return base_type__volatile;
}

Yes, this is a lot of code.

The base code stays pretty much the same:

instance-type.c (lines 191-243)

static void
base_real_finalize (Base *self)
{
  g_type_free_instance ((GTypeInstance *) self);
}

static void
base_class_init (BaseClass *klass)
{
  klass->finalize = base_real_finalize;
}

static void
base_init (Base *self)
{
  // Initialize the base instance fields
  g_atomic_ref_count_init (&self->ref_count);
  self->some_field = 42;
}

static Base *
base_alloc (GType type)
{
  g_assert (g_type_is_a (type, base_get_type ()));

  // Instantiate a new type derived by Base
  return (Base *) g_type_create_instance (type);
}

Base *
base_ref (Base *self)
{
  g_atomic_ref_count_inc (&self->ref_count);

  return self;
}

void
base_unref (Base *self)
{
  if (g_atomic_ref_count_dec (&self->ref_count))
    BASE_GET_CLASS (self)->finalize (self);
}

void
base_foo (Base *self)
{
  BASE_GET_CLASS (self)->foo (self);
}

void
base_bar (Base *self)
{
  BASE_GET_CLASS (self)->bar (self);
}

except:

the reference counting is explicit, as we must use g_type_create_instance() and g_type_free_instance() to allocate and free the memory associated with the instance
you need to get the class structure from the instance using the GType macros instead of direct pointer access

Finally, you will need to add code to let you register derived types; since we want to tightly control the derivation, we use an ad hoc structure for the virtual functions, and we use a generic class initialization function:

instance-type.c (lines 245-290)

static void
base_generic_class_init (gpointer g_class,
                         gpointer class_data)
{
  BaseTypeInfo *info = class_data;
  BaseClass *klass = g_class;

  klass->finalize = info->finalize;
  klass->foo = info->foo;
  klass->bar = info->bar;

  // The info structure was copied, so we now need
  // to release the resources associated with it
  g_free (class_data);
}

// Register a derived type of Base
static GType
base_type_register_static (const char         *type_name,
                           const BaseTypeInfo *type_info)
{
  // Simple check to ensure that the derived instance includes the
  // parent base type
  g_assert (type_info->instance_size >= sizeof (Base));

  // Named "info" so that it does not shadow the type_info argument
  GTypeInfo info;

  // All derived types have the same class and cannot add new virtual
  // functions
  info.class_size = sizeof (BaseClass);
  info.base_init = NULL;
  info.base_finalize = NULL;

  // Fill out the class vtable from a copy of the BaseTypeInfo structure;
  // base_generic_class_init() frees the copy. g_memdup2() requires
  // GLib 2.68; older versions provide g_memdup()
  info.class_init = base_generic_class_init;
  info.class_finalize = NULL;
  info.class_data = g_memdup2 (type_info, sizeof (BaseTypeInfo));

  // Instance information
  info.instance_size = type_info->instance_size;
  info.n_preallocs = 0;
  info.instance_init = NULL;
  info.value_table = NULL;

  return g_type_register_static (BASE_TYPE, type_name, &info, 0);
}

Alternatively, you could reuse the G_DEFINE_TYPE macro (yes, it does not require GObject), but then you’d have to implement your own class initialization and instance initialization functions.

Once you have defined the base type, you can structure your derived types in the same way as in the boxed type code:

instance-type.c (lines 294-347)

typedef struct {
  Base parent;

  char *some_other_field;
} DerivedA;

static void
derived_a_finalize (Base *base)
{
  DerivedA *self = (DerivedA *) base;

  g_free (self->some_other_field);

  // We need to chain up to the parent's finalize() or we're
  // going to leak the instance
  BASE_SUPER (self)->finalize (base);
}

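// Virtual function implementations, defined elsewhere
static void derived_a_foo (Base *self);
static void derived_a_bar (Base *self);

static const BaseTypeInfo derived_a_info = {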
  .instance_size = sizeof (DerivedA),
  .finalize = derived_a_finalize,
  .foo = derived_a_foo, // defined elsewhere
  .bar = derived_a_bar, // defined elsewhere
};

GType
derived_a_get_type (void)
{
  static gsize derived_type__volatile;

  if (g_once_init_enter (&derived_type__volatile))
    {
      // Register the type
      GType derived_type =
        base_type_register_static (g_intern_static_string ("DerivedA"),
                                   &derived_a_info);

      g_once_init_leave (&derived_type__volatile, derived_type);
    }

  return derived_type__volatile;
}

Base *
derived_a_new (const char *some_other_field)
{
  Base *res = base_alloc (derived_a_get_type ());

  DerivedA *self = (DerivedA *) res;

  self->some_other_field = g_strdup (some_other_field);

  return res;
}

The nice bit is that you can tell the introspection scanner how to deal with each derived type through annotations, and keep the API simple to use in C while idiomatic to use in other languages:

instance-type.c (lines 349-363)

/**
 * derived_a_get_some_other_field:
 * @base: (type DerivedA): a derived #Base instance
 *
 * Retrieves the `some_other_field` of a derived #Base instance.
 *
 * Returns: (transfer none): the contents of the field
 */
const char *
derived_a_get_some_other_field (Base *base)
{
  DerivedA *self = (DerivedA *) base;

  return self->some_other_field;
}

Cost-benefit

Of course, there are costs to this approach. In no particular order:

The type system boilerplate is a lot; the code size more than doubled from the boxed type approach. This is quite annoying, but at least it is a one-off cost, and you won’t likely ever need to change it. It would be nice to have it hidden by some magic macro incantation, but it’s understandably hard to do so without imposing restrictions on the kind of types you can create; since you’re trying to escape the restrictions of GObject, it would not make sense to impose a different set of restrictions.
If you want to be able to use this new type with properties and you cannot use G_TYPE_POINTER as a generic, hands-off container, you will need to derive GParamSpec, and add ad hoc API for GValue, which is even more annoying boilerplate. I’ve skipped it in the example, because that would add about 100 more lines of code.
Generated signal marshallers, and the generic one using libffi, do not know how to marshal typed instances; you will need custom-written marshallers, or you will have to use G_TYPE_POINTER everywhere and assume the risk of untyped boxing. The same applies to anything that uses the type system to perform things like serialization and deserialization, or GValue boxing and unboxing. You decided to build your own theme park on the Moon, and the type system has no idea how to represent it, or access its functionality.
Language bindings need to be able to deal with GTypeInstance and fundamental types; this is not always immediately necessary, so some maintainers do not add the code to handle this aspect of the type system.

The benefit is, of course, the fact that you are using a separate type hierarchy, and you get to make your own rules on things like memory management, lifetimes, and ownership. You can control the inheritance chain, and the rules on the overridable virtual functions. Since you control the whole type, you can add things like serialization and deserialization, or instance cloning, right at the top of the hierarchy. You could even implement properties without using GParamSpec.

Conclusion

Please, please use GObject. Writing type system code is already boring and error prone, which is why we added a ton of macros to avoid people shooting themselves in both their feet, and we hammered away all the special snowflake API flourishes that made parsing C API to generate introspection data impossible.

I can only recommend you go down the GTypeInstance route if you’ve done your due diligence on what that entails, and are aware that it is a last resort if GObject simply does not work within your project’s constraints.

