halting problem :: Episode 1.b: Bindings

:: ~5 min read

A side episode! Language bindings in the early GNOME era take the front stage

Back in episode 1.1 and 1.2 we talked a bit about “language bindings” as a resource available to GNOME application developers that did not want to deal with C as the programming language for their projects.

Language bindings for GTK appeared pretty much as soon as the GIMP adopted it, in order to write plugins for custom image processing effects; adding new filters just by dropping a file in a well-known location made GIMP extensible without requiring recompiling the whole application. So it’s not weird that the original announcement for the GNOME project mentions applications and desktop components written in Guile: since the very beginning, the intent was to only use C for core libraries of the GNOME platform, in order to have an application development stack consumable through programming languages other than C.

Of course, best laid plans, and all that.

If we look at the history of the bindings in GNOME we can divide it into two major eras: before introspection, and after introspection. We’re going to concentrate on the former, and leave the latter when we move on to the second chapter of the main narrative.

The first few bindings for GTK appearing on the scene were Objective C, C++, and Guile. The first two were definitely easier to achieve, as both Objective C and C++ are compatible with the C standard of the time; it was easy to build shallow abstractions over the C API, and if you needed something more complicated, you could always drop into C. Guile, on the other hand, used a LISP grammar; and even though it managed to call into C exactly the same, you could not really expose a pointer to a C data structure and tell developers to go to town on it, so you had to be careful in both what you exposed, and what you abstracted away.

After Objective C, C++, and Guile, Perl and Python bindings started to appear, as the market share for both those languages increased.

The shared problem among all of them was that GTK was growing as a toolkit, and its API surface started to exceed the capacity for a human being to hand code the various entry points by reading a C header file and translating it into the intended code for each programming language to trampoline into the underlying library.

It is a well established fact that programmers, left to their own devices, will try to program their way out of every problem. In this particular case, each binding started growing its own set of tools to generate code from C headers; then the script were modified to read C headers and ancillary metadata, needed to handle cases where C was sadly not enough to describe the API and the relation between widgets, like inheritance; or the idiosynchrasies of the GTK code base, like constructor arguments (the precursor to GObject’s properties) and signals.

The C++ bindings grew a veritable cornucopia of exceedingly 1997 decisions, like:

  • a set of hand coded header files that included parsing directives and metadata, alongside real C++ code
  • a set of m4 macros to do text processing over the generated files to replace some of the directives
  • a lexer and a parser, generated via flex and bison, to generate C++ code out of the generated headers

I looked at the early C++ bindings and I’m still in awe of the Rube Goldberg machine that was used to build them; the only things missing in the project build were a small marble, an hamster wheel, two pool cues, and a bowling bowl.

Every binding, though, had their own way of generating code, and thus their own way of storing metadata to describe the GTK API. To avoid the proliferation of ad hoc formats, and to provide a relatively official description of the API, GTK developers decided to settle to a common definition file, mutuated from the Guile bindings.

The definition file used an S-expression syntax, and it described:

  • enumeration types, which contained not only the C symbol, but also a “nick name”, that is a string that could be used to map it to the numeric value
  • boxed types, that is plain old data structures
  • object types, that is classes in the type system
  • functions, each with its return type and arguments

The definition files were originally a mix of hand written changes on top of the output generated by a script that would parse the C header files; this meant that they would go out of sync pretty quickly. Additionally, they were lacking a lot of ancillary information because the script did not know anything about the GTK type system. Various language bindings took the definition files and copied them into their own source repositories, to tweak them to fit their code generation steps; this introduced an additional drift into the already cumbersome process of keeping language bindings up to date with a fast moving library like GTK.

An attempt at improving and standardising the definition file format came in the late 1999 and early 2000, courtesy of Elliot Lee and Havoc Pennington. The s-expression syntax was maintained, but the data set was extended to allow matching objects with namespaces; methods with classes; and parameters with their types, names, and default values.

Of course, this still required generating C code out of a formally agnostic S-expression, generated from C code; this meant that bindings for dynamic languages were all but dynamic, and that additions to the GTK API would still require an additional step in order to be consumed by anything that wasn’t C, or C adjacent. For a while, though, this was good enough for application developers, and the ability to quickly write small tools or large applications, in Python, or Perl, or PHP, or Ruby, made the GNOME platform quite attractive.

Still, many applications in the GNOME project were written in C because that’s what C developers will do when given then chance — and that’s what we’re going to see in next week’s side episode.

References

history of gnome gnome podcast

Follow me on Mastodon