(* callback)(void * callback_data, ...)
How to write a correct C callback interface.

In a chapter about bluffing in poker, the authors impressed me by writing about "bluffing correctly." They didn't mean bluffing with gusto or enough acting ability to be successful; they simply had calculated the game-theoretically optimal frequency for bluffing in a situation, and if one bluffed less or more, one was exposing a weakness.

I liked this idea that something as guilt-laden as deception could be susceptible to simple, mathematical correctness, and I'm borrowing the term here as a metaphor for good design: there's a technical "gotcha" in writing callback interfaces, and implementations that get it right are so clearly superior to those that don't that it's almost like a mathematical tautology.

A callback interface transfers control to a function at a time determined by some other piece of code. For example, a traversal interface of some data structure might call a callback on each of the structure's elements; or an event handler might call a callback if a certain user interface event happens.

This can be both confusing and powerful, because it can be used to invert the semantic hierarchy within the call hierarchy. In normal, procedural code, the calling function knows more than the functions it calls. Like a micromanaging leader, procedural code usually gives direct orders: "do this, do that, now do this". Callbacks move away from that towards more situational, flexible interfaces: "call me if something happens," "report back once this is finished."

In order to do callbacks correctly, two things are necessary:

  • a function pointer
  • and a generic data pointer that is passed to the function pointer during the callback execution.

I don't care what the names are, but the code that registers a callback must be allowed to store both a data pointer and a function pointer.
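
A minimal sketch of such an interface, assuming a made-up array traversal; the names int_callback, int_array_traverse, add_to_sum and sum_of are illustrations, not part of any real library:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: the data pointer comes first,
   followed by whatever the caller reports (here, one element). */
typedef void int_callback(void *callback_data, int element);

/* Hypothetical traversal: calls `callback` once per element and
   hands `callback_data` back untouched on every call. */
static void int_array_traverse(const int *elements, size_t n,
                               int_callback *callback, void *callback_data)
{
    size_t i;

    assert(callback != NULL);
    for (i = 0; i < n; i++)
        (*callback)(callback_data, elements[i]);
}

/* One possible callback: its state lives wherever the data pointer points. */
static void add_to_sum(void *callback_data, int element)
{
    long *sum = callback_data;

    *sum += element;
}

/* Usage: the running sum is a local variable, not a global. */
static long sum_of(const int *elements, size_t n)
{
    long sum = 0;

    int_array_traverse(elements, n, add_to_sum, &sum);
    return sum;
}
```

The traversal never looks inside the data pointer; it only stores it and passes it along, which is what lets two callers run the same traversal with independent state.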

Interfaces without function pointers

There are some interfaces where functions are called based on their name, not a previously registered value. For example, all C library functions are called by their names. The code generated by Yacc calls a function named "yylex" to read tokens.

As a consequence, it's difficult to call different versions of the same library in a program (imagine database access libraries operating on different databases whose formats differ), and it's difficult to have two yacc-generated parsers in the same program. (Imagine a program that can read two different document types for which two different libraries have been implemented. Imagine linking a program against two libraries that both contain application-specific yylex() functions to read their separate tokens.)
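
For contrast, here is a rough sketch of a token consumer that is handed its lexer as a function pointer plus data instead of calling it by name. Everything in it (token_source, count_tokens, next_char, next_word) is hypothetical and has nothing to do with what yacc actually generates:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a token source is a function pointer plus its data. */
typedef int next_token_fn(void *data);

struct token_source {
    next_token_fn *next;   /* returns the next token, or 0 at the end */
    void *data;            /* the lexer's own state */
};

/* A stand-in "parser" that merely counts tokens. It never names a
   lexer; it calls whichever one it was given. */
static size_t count_tokens(const struct token_source *src)
{
    size_t n = 0;

    assert(src != NULL && src->next != NULL);
    while ((*src->next)(src->data) != 0)
        n++;
    return n;
}

/* Lexer 1: every character is a token. */
static int next_char(void *data)
{
    const char **p = data;

    return **p != '\0' ? (unsigned char)*(*p)++ : 0;
}

/* Lexer 2: every space-separated word is a token. */
static int next_word(void *data)
{
    const char **p = data;

    while (**p == ' ')
        (*p)++;
    if (**p == '\0')
        return 0;
    while (**p != '\0' && **p != ' ')
        (*p)++;
    return 1;
}
```

Both lexers can live in the same program, and the same consumer can be run over either, because nothing is resolved by name at link time.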

Interfaces without data pointers

If a callback doesn't get a data pointer passed in, it usually gets its data from global variables. That means that it becomes non-reentrant. Only one thread can use the code that employs the callbacks at a time. Non-reentrance is transitive up the abstraction chain: all code that uses non-reentrant code is in turn non-reentrant, and so on.

The callback used in qsort() doesn't pass in a contextual data pointer; all the code has to go on are the elements in the array that is being qsorted. That means that if there is context information to be used (for example, if strings are compared in a language-dependent way and one needs to keep track of the language involved), an always-identical context pointer needs to be part of every object that is being compared.

The callback used in bsearch() doesn't pass in a contextual data pointer either. Same problem.
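
To make the difference concrete, here is a sketch that compares strings with and without case folding. The qsort() comparator has to read its option from a global; the second half shows what a sort with a data pointer could look like. sort_strings, string_compare and compare_options are invented for this example (a plain insertion sort, not a replacement for qsort()):

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive comparison helper used by both variants. */
static int compare_folded(const char *a, const char *b)
{
    while (*a != '\0'
        && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}

/* Variant 1: qsort(). No data pointer, so the option is a global
   shared by every sort in the program. */
static int fold_case;

static int qsort_compare(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;

    return fold_case ? compare_folded(sa, sb) : strcmp(sa, sb);
}

/* Variant 2 (hypothetical): the comparison gets a data pointer. */
typedef int string_compare(void *data, const char *a, const char *b);

static void sort_strings(const char **v, size_t n,
                         string_compare *compare, void *data)
{
    size_t i, j;

    assert(compare != NULL);
    for (i = 1; i < n; i++) {          /* plain insertion sort */
        const char *key = v[i];

        for (j = i; j > 0 && (*compare)(data, v[j - 1], key) > 0; j--)
            v[j] = v[j - 1];
        v[j] = key;
    }
}

struct compare_options {
    int fold_case;
};

static int compare_with_options(void *data, const char *a, const char *b)
{
    const struct compare_options *opt = data;

    return opt->fold_case ? compare_folded(a, b) : strcmp(a, b);
}
```

With the second variant, two sorts with different options can run at the same time, each with its own compare_options; with the first, they would fight over fold_case.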

If you write interfaces that have callback functions, pass in a data pointer. You may think that what you're exporting is natively, inherently non-reentrant, but you'll be wrong.

Yes, but I can use thread-local storage.

Sure, if you want to make something that has nothing to do with multithreading depend on the presence of a certain thread interface, you can do that; that, a stack you manage (for the situations where the called code recursively calls the bad implementation again), a few locks (in case the bad code is interrupted by external events) and some deadlock detection (in case it is interrupted and restarted) will almost get you where you'd be without those cumbersome workarounds.

If you have a choice, don't do that.

Data pointers in objects

In more interesting callback applications, more than just the callback data pointer is passed to a callback - it also gets passed an object of some sort, or detail information about an event. Frequently, the objects passed in can be annotated with an application-controlled data pointer that can take the place of a per-callback data pointer.

Whether this is a good idea depends on whether the callback is part of implementing the objects or whether it is part of using the objects. If it is part of the use, there can be different users. There is no one single "application" module that can safely control the one "application pointer" that the model keeps track of. Usually, you're better off having the data pointer be part of the callback.

Multiple related callbacks with the same data

Used right, a group of related callbacks that share one data pointer is the C-fake-object-orientation version of having an interface definition consisting of data and multiple methods, and it works well. The key is to have a fine enough granularity of objects and callbacks; truly consider that pointer to be part of the callback group.
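
A small sketch of such a callback group, assuming an invented streaming interface; stream_callbacks, stream_string and the counter functions are hypothetical names chosen for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback group: several function pointers, one data
   pointer that belongs to the group as a whole. */
struct stream_callbacks {
    void (*begin)(void *data);
    void (*chunk)(void *data, const char *bytes, size_t n);
    void (*end)(void *data);
    void *data;
};

/* Hypothetical producer: feeds `n` bytes of `s` to the callbacks in
   pieces of at most `piece` bytes. Unset callbacks are skipped. */
static void stream_string(const char *s, size_t n, size_t piece,
                          const struct stream_callbacks *cb)
{
    assert(cb != NULL && piece > 0);
    if (cb->begin != NULL)
        (*cb->begin)(cb->data);
    while (n > 0) {
        size_t k = n < piece ? n : piece;

        if (cb->chunk != NULL)
            (*cb->chunk)(cb->data, s, k);
        s += k;
        n -= k;
    }
    if (cb->end != NULL)
        (*cb->end)(cb->data);
}

/* One "object" implementing the interface: a byte counter. */
struct counter {
    size_t bytes;
    size_t chunks;
    int finished;
};

static void counter_begin(void *data)
{
    struct counter *c = data;

    c->bytes = 0;
    c->chunks = 0;
    c->finished = 0;
}

static void counter_chunk(void *data, const char *bytes, size_t n)
{
    struct counter *c = data;

    (void)bytes;               /* only the length matters here */
    c->bytes += n;
    c->chunks++;
}

static void counter_end(void *data)
{
    struct counter *c = data;

    c->finished = 1;
}
```

The three functions play the role of methods and the struct counter behind the data pointer plays the role of the instance; a second counter, or an entirely different implementation, can be plugged in by filling in another stream_callbacks.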

 
keywords:
C programming
style

Dec 17, 2004,
jutta@pobox.com
