halting problem :: Introspection’s edge

:: ~4 min read

tl;dr The introspection data for GLib (and its sub-libraries) is now generated by GLib itself instead of gobject-introspection; and in the near future, libgirepository is going to be part of GLib as well. Yes, there’s a circular dependency, but there are plans for dealing with that. Developers of language bindings should reach out to the GLib maintainers for future planning.

GLib-the-project is made of the following components:

  • glib, the base data types and C portability library
  • gmodule, a portable loadable module API
  • gobject, the object oriented run time type system
  • gio, a grab bag of I/O, IPC, file, network, settings, etc

Just like GLib, the gobject-introspection project is actually three components in a trench coat:

  • g-ir-scanner, the Python and flex based parser that generates the XML description of a GObject-based ABI
  • g-ir-compiler, the “compiler” that turns XML into an mmap()-able binary format
  • libgirepository, the C API for loading the binary format, resolving types and symbols, and calling into the underlying C library

Since gobject-introspection depends on GLib for its implementation, the introspection data for GLib has been, until now, generated during the gobject-introspection build. This has proven to be extremely messy:

  • as the introspection data is generated from the source code, gobject-introspection needs to have access to the GLib headers and source files
  • this means building against an installed copy of GLib for the declarations, and the GLib sources for the docblocks
  • to avoid having access to the sources at all times, there is a small script that extracts all the docblocks from a GLib checkout; this script has to be run manually, and its results committed to the gobject-introspection repository

This means that the gobject-introspection maintainers have to manually keep the GLib data in sync; that any change done to GLib’s annotations and documentation are updated typically at the end of the development cycle, which also means that any issue with the annotations is discovered almost always too late; and that any and all updates to the GLib introspection data require a gobject-introspection release to end up inside downstream distributions.

The gobject-introspection build can use GLib as a subproject, at the cost of some awful, awful build system messiness, but that does not help anybody working on GLib; and, of course, since gobject-introspection depends on GLib, we cannot build the former as a subproject of the latter.

As part of the push towards moving GLib’s API references towards the use of gi-docgen, GLib now checks for the availability of g-ir-scanner and g-ir-compiler on the installed system, and if they are present, they are used to generate the GLib, GObject, GModule, and GIO introspection data.

A good side benefit of having the actual project providing the ABI be in charge of generating its machine-readable description is that we can now decouple the release cycles of GLib and gobject-introspection; updates to the introspection data of GLib will be released alongside GLib, while the version of gobject-introspection can be updated whenever there’s new API inside libgirepository, instead of requiring a bump at every cycle to match GLib.

On the other hand, though, this change introduces a circular dependency between GLib and gobject-introspection; this is less than ideal, and though circular dependencies are a fact of life for whoever builds and packages complex software, it’d be better to avoid them; this means two options:

  • merging gobject-introspection into GLib
  • making gobject-introspection not depend on GLib

Sadly, the first option isn’t really workable, as it would mean making GLib depend on flex, bison, and CPython in order to build g-ir-scanner, or (worse) rewriting g-ir-scanner in C. There’s also the option of rewriting the C parser in pure Python, but that, too, isn’t exactly strife-free.

The second option is more convoluted, but it has the advantage of being realistically implemented:

  1. move libgirepository into GLib as a new sub-library
  2. move g-ir-compiler into GLib’s tools
  3. copy-paste the basic GLib data types used by the introspection scanner into gobject-introspection

Aside from breaking the circular dependency, this allows language bindings that depend on libgirepository’s API (Python, JavaScript, Perl, etc) to limit their dependency to GLib, since they don’t need to deal with the introspection parser bits; it also affords us the ability to clean up the libgirepository API, provide better documentation, and formalise the file formats.

Speaking of file formats, and to manage expectations: I don’t plan to change the binary data layout; yes, we’re close to saturation when it comes to padding bits and fields, but there hasn’t been a real case in which this has been problematic. The API, on the other hand, really needs to be cleaned up, because it got badly namespaced when we originally thought it could be moved into GLib and then the effort was abandoned. This means that libgirepository will be bumped to 2.0 (matching the rest of GLib’s sub-libraries), and will require some work in the language bindings that consume it. For that, bindings developers should get in touch with GLib maintainers, so we can plan the migration appropriately.

gobject glib introspection bindings development

Follow me on Mastodon