These articles are written by Codalogic empowerees as a way of sharing knowledge with the programming community. They do not necessarily reflect the opinions of Codalogic.

The QEMU Object Model (QOM) for C++ Programmers

By: Pete, October 2022

QEMU is an emulator that allows programs written for one microprocessor to be emulated and run on a host with a different microprocessor.

Central to the working of QEMU is the QEMU Object Model or QOM. Every device is represented in the QOM in an object-oriented way.

Because C doesn't naturally support object-oriented designs, there are a lot of moving parts in QEMU's C implementation to make this work. As a result, it can get a bit confusing.

Mapping the QOM to a regular C++ object model may help.

In C++ we may have an object model structure like the following (I've used struct instead of class because C doesn't have the concept of public and private):

struct Base
{
    static int type;
    int id;
    Base() { /* construct me */ }
};

struct Derived : Base
{
    static int v;
    int x;
    Derived() { /* construct me */ }
    virtual void do_operation_1() {}
    void do_operation_2() {}
};

To represent this in C, QOM uses two parallel class hierarchies that split these features between them.

The first has the Object struct (See here on GitLab) as the base of all its derivatives. Each type's per-instance struct derives from this struct, and there may be multiple instances of any given type.

struct Object
{
    ObjectClass *class;
    ...
};

The second has the ObjectClass struct (See here on GitLab) as the base of all its derivatives. Each type also has a class struct derived from this one, but there will only ever be one instance of each such class struct.

struct ObjectClass
{
    Type type;
    ...
};

Structs deriving from Object contain per-instance information, and structs deriving from ObjectClass contain per-class information.

Notice that the Object struct has a pointer to its corresponding ObjectClass struct.

For example, the parallel hierarchies for SysBusDevice are shown below.

The structs deriving from ObjectClass are DeviceClass (On Gitlab here) and then SysBusDeviceClass (On Gitlab here), which have the form:

struct DeviceClass {
    ObjectClass parent_class;
    ...
};

struct SysBusDeviceClass {
    DeviceClass parent_class;
    ...
};

And the structs deriving from Object are DeviceState (On Gitlab here) and then SysBusDevice (On Gitlab here), which have the form:

struct DeviceState {
    Object parent_obj;
    ...
};

struct SysBusDevice {
    DeviceState parent_obj;
    ...
};
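To make the layout concrete, here is a toy, compilable C sketch of the two parallel hierarchies. The structs are hypothetical, stripped-down stand-ins rather than QEMU's real definitions (which have many more fields), and toy_sysbus_init() is an invented helper. It shows each struct embedding its parent as its first member, and several instances sharing a single class struct through the class pointer in Object.

```c
/* Hypothetical, heavily simplified stand-ins for QEMU's structs.
 * Only the "parent as first member" layout and the class pointer
 * are modelled here. */
typedef struct ObjectClass { const char *type_name; } ObjectClass;
typedef struct Object { ObjectClass *class; } Object;

/* Class-side hierarchy: one instance per type. */
typedef struct DeviceClass { ObjectClass parent_class; int dc_field; } DeviceClass;
typedef struct SysBusDeviceClass { DeviceClass parent_class; int sbdc_field; } SysBusDeviceClass;

/* Instance-side hierarchy: one instance per object. */
typedef struct DeviceState { Object parent_obj; int id; } DeviceState;
typedef struct SysBusDevice { DeviceState parent_obj; int x; } SysBusDevice;

/* The single class struct shared by every toy SysBusDevice instance. */
static SysBusDeviceClass sysbus_class = {
    .parent_class.parent_class.type_name = "sys-bus-device",
};

/* Hypothetical helper: point an instance at its class and fill in its data. */
static void toy_sysbus_init(SysBusDevice *dev, int id, int x)
{
    dev->parent_obj.parent_obj.class = &sysbus_class.parent_class.parent_class;
    dev->parent_obj.id = id;
    dev->x = x;
}
```

With two SysBusDevice instances initialised by toy_sysbus_init(), both class pointers refer to the same sysbus_class object, while id and x are stored separately in each instance.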

Referring to our C++ classes above, per-class information equivalent to Base::type and Derived::v would end up in the hierarchy derived from ObjectClass.

Per-instance data equivalent to Base.id and Derived.x would be in the Object struct hierarchy.

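Additionally, a virtual method such as Derived::do_operation_1() would be represented as a pointer to a function in the ObjectClass struct hierarchy, because only one set of pointers is required per type, irrespective of how many instances of the type there are. A non-virtual method such as Derived::do_operation_2() needs no dispatching, so it would typically just be an ordinary C function that takes a pointer to the instance struct.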
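As a rough sketch of that split, assuming toy types rather than QEMU's own, the code below puts the C++ example's Derived::v and a virtual-style do_operation_1 into a class struct. In this sketch only do_operation_1 goes through the class struct; do_operation_2, being non-virtual, is shown as an ordinary function taking the instance. DerivedClass, derived_call_operation_1() and the other names are hypothetical, and the class-pointer cast is deliberately unchecked to keep the example short.

```c
/* Hypothetical toy types, not QEMU's. */
typedef struct Object Object;

typedef struct ObjectClass {
    const char *type_name;
} ObjectClass;

struct Object {
    ObjectClass *class;
};

typedef struct DerivedClass {
    ObjectClass parent_class;
    int v;                                  /* like static Derived::v */
    int (*do_operation_1)(Object *obj);     /* like virtual do_operation_1() */
} DerivedClass;

typedef struct Derived {
    Object parent_obj;
    int x;                                  /* like per-instance Derived::x */
} Derived;

/* Non-virtual method: an ordinary function taking the instance. */
static int derived_do_operation_2(Derived *d)
{
    return d->x * 2;
}

/* The implementation stored in the class struct's function pointer. */
static int derived_operation_1_impl(Object *obj)
{
    return ((Derived *)obj)->x + 1;
}

/* One class struct for the type, shared by all Derived instances. */
static DerivedClass derived_class = {
    .parent_class = { .type_name = "derived" },
    .v = 42,
    .do_operation_1 = derived_operation_1_impl,
};

/* "Virtual" call: find the class via the instance, then call through it. */
static int derived_call_operation_1(Object *obj)
{
    DerivedClass *dc = (DerivedClass *)obj->class;  /* unchecked toy cast */
    return dc->do_operation_1(obj);
}
```

For a Derived with x set to 5, derived_call_operation_1() dispatches through derived_class and yields 6, while derived_do_operation_2() is called directly and yields 10.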

A big difference between C and C++ is that in C++, if we have a pointer to a derived type, say d, then to access an entity in the base class we can simply do d->id. This doesn't work in C. If we have a pointer to a SysBusDeviceClass struct called s and we want to access the type variable in ObjectClass, we have to go through each embedded parent struct in turn: s->parent_class.parent_class.type. This isn't particularly appealing, especially as the naming of the parent member isn't always consistent (compare parent_class in the class structs with parent_obj in the instance structs).

Instead, if a function (which is typically called with a pointer to the base Object or ObjectClass) needs to access data in multiple structs within the hierarchy, it will create multiple pointers and cast each to be a pointer to the respective type.

So if a function wants to manipulate data in both DeviceClass and SysBusDeviceClass, it will create two pointers such as DeviceClass *dc and SysBusDeviceClass *sbdc. Because each struct embeds its parent as its first member, both pointers will have the same value, but they have different types associated with them, so each can access the members of its own struct.
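The following toy sketch contrasts reaching ObjectClass's type field through the nested parent_class members with using differently typed pointers to the same address. The structs are hypothetical and simplified, and dc_field and sbdc_field are invented members. The casts in set_fields() are plain C casts with no type checking at all.

```c
/* Hypothetical, simplified class-side structs. */
typedef struct ObjectClass { int type; } ObjectClass;
typedef struct DeviceClass { ObjectClass parent_class; int dc_field; } DeviceClass;
typedef struct SysBusDeviceClass { DeviceClass parent_class; int sbdc_field; } SysBusDeviceClass;

/* The long way: walk through each embedded parent member. */
static int type_via_members(SysBusDeviceClass *s)
{
    return s->parent_class.parent_class.type;
}

/* The QOM-style way: several pointers with the same value but different
 * types. These plain casts are unchecked; they are only safe because
 * klass really does point at the start of a SysBusDeviceClass. */
static void set_fields(ObjectClass *klass)
{
    DeviceClass *dc = (DeviceClass *)klass;
    SysBusDeviceClass *sbdc = (SysBusDeviceClass *)klass;

    klass->type = 7;
    dc->dc_field = 1;
    sbdc->sbdc_field = 2;
}
```

Passing set_fields() a pointer to the innermost ObjectClass of a SysBusDeviceClass fills in members at all three levels, and type_via_members() then reads back the same type value.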

However, casting pointers from ObjectClass directly to DeviceClass and SysBusDeviceClass with plain C casts is dangerous and error-prone, because nothing checks that the object really is of that type.

Instead, when creating the parallel struct hierarchy, helper macros are created that can cast from, say, Object to SysBusDevice and from ObjectClass to SysBusDeviceClass, checking the type as they do so. For SysBusDevice the macros you would create are SYS_BUS_DEVICE, SYS_BUS_DEVICE_GET_CLASS and SYS_BUS_DEVICE_CLASS.

Hence in a function you could see code like:

void instance_handler(Object *obj)
{
    DeviceState *ds = DEVICE(obj);              // Derive instance from base object
    SysBusDevice *sbd = SYS_BUS_DEVICE(obj);    // Derive instance from base object

    SysBusDeviceClass *sbdc = SYS_BUS_DEVICE_GET_CLASS(obj);   // Class from base object
}

void class_handler(ObjectClass *klass)
{
    DeviceClass *dc = DEVICE_CLASS(klass);
    SysBusDeviceClass *sbdc = SYS_BUS_DEVICE_CLASS(klass);
}

Fortunately, QEMU provides macros to make creating these helper macros easier.

To create the set of macros for DeviceState and DeviceClass (See here) you would do:

#define TYPE_DEVICE "device"
OBJECT_DECLARE_TYPE(DeviceState, DeviceClass, DEVICE)

And for the SysBusDevice and SysBusDeviceClass set (See here) you would do:

#define TYPE_SYS_BUS_DEVICE "sys-bus-device"
OBJECT_DECLARE_TYPE(SysBusDevice, SysBusDeviceClass, SYS_BUS_DEVICE)

The OBJECT_DECLARE_TYPE macro (See here) is defined as follows:

#define OBJECT_DECLARE_TYPE(InstanceType, ClassType, MODULE_OBJ_NAME) \
    typedef struct InstanceType InstanceType; \
    typedef struct ClassType ClassType; \
    \
    G_DEFINE_AUTOPTR_CLEANUP_FUNC(InstanceType, object_unref) \
    \
    DECLARE_OBJ_CHECKERS(InstanceType, ClassType, \
                         MODULE_OBJ_NAME, TYPE_##MODULE_OBJ_NAME)

QEMU has layers and layers of macros. If I re-write OBJECT_DECLARE_TYPE to expand the child macros and simplify the code a bit, the result is as follows:

#define OBJECT_DECLARE_TYPE(InstanceType, ClassType, MODULE_OBJ_NAME) \
    typedef struct InstanceType InstanceType; \
    typedef struct ClassType ClassType; \
    \
    G_DEFINE_AUTOPTR_CLEANUP_FUNC(InstanceType, object_unref) \
    \
    static inline InstanceType * MODULE_OBJ_NAME(const void *obj) \
    { return (InstanceType *)object_dynamic_cast_assert((Object *)(obj), TYPE_##MODULE_OBJ_NAME, __FILE__, __LINE__, __func__); } \
    \
    static inline ClassType * MODULE_OBJ_NAME##_GET_CLASS(const void *obj) \
    { return (ClassType *)object_class_dynamic_cast_assert(object_get_class((Object *)(obj)), TYPE_##MODULE_OBJ_NAME, __FILE__, __LINE__, __func__); } \
    \
    static inline ClassType * MODULE_OBJ_NAME##_CLASS(const void *klass) \
    { return (ClassType *)object_class_dynamic_cast_assert((ObjectClass *)(klass), TYPE_##MODULE_OBJ_NAME, __FILE__, __LINE__, __func__); }

The typedefs are simply a convenience that saves you from having to write struct InstanceType and struct ClassType everywhere.

The object_dynamic_cast_assert() and object_class_dynamic_cast_assert() functions look through the type bookkeeping information that QOM stores and check whether the pointed-to object is of the requested type (or derived from it). If it is, the pointer is returned and cast to the correct struct type. If not, QEMU prints an error giving the file, line and function that were passed in, and aborts.
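To illustrate the idea, here is a toy model of that check, assuming a tiny hard-coded table of type names and parent names in place of QOM's real type registry. toy_dynamic_cast_assert() walks the parent chain and aborts on a mismatch, and TOY_DECLARE_INSTANCE_CHECKER generates a typed checker function in the same style as the expanded OBJECT_DECLARE_TYPE. All the toy_ and TOY_ names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical registry: each type records its name and its parent's name. */
typedef struct ToyType {
    const char *name;
    const char *parent;   /* NULL for the root type */
} ToyType;

static const ToyType toy_types[] = {
    { "object", NULL },
    { "device", "object" },
    { "sys-bus-device", "device" },
};

static const ToyType *toy_type_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof toy_types / sizeof toy_types[0]; i++) {
        if (strcmp(toy_types[i].name, name) == 0) {
            return &toy_types[i];
        }
    }
    return NULL;
}

/* Returns 1 if type_name is target or derives from it, else 0. */
static int toy_type_is_a(const char *type_name, const char *target)
{
    const ToyType *t = toy_type_lookup(type_name);
    while (t) {
        if (strcmp(t->name, target) == 0) {
            return 1;
        }
        t = t->parent ? toy_type_lookup(t->parent) : NULL;
    }
    return 0;
}

typedef struct Object { const char *type_name; } Object;

/* Checked cast: report the given location and abort on a type mismatch. */
static void *toy_dynamic_cast_assert(Object *obj, const char *target,
                                     const char *file, int line, const char *func)
{
    if (obj && !toy_type_is_a(obj->type_name, target)) {
        fprintf(stderr, "%s:%d:%s: object is not an instance of %s\n",
                file, line, func, target);
        abort();
    }
    return obj;
}

/* Generates a typed checker function, loosely like QEMU's generated ones. */
#define TOY_DECLARE_INSTANCE_CHECKER(InstanceType, NAME, TYPENAME)      \
    static inline InstanceType *NAME(const void *obj)                    \
    {                                                                    \
        return (InstanceType *)toy_dynamic_cast_assert(                  \
            (Object *)obj, TYPENAME, __FILE__, __LINE__, __func__);      \
    }

typedef struct DeviceState { Object parent_obj; int id; } DeviceState;
TOY_DECLARE_INSTANCE_CHECKER(DeviceState, TOY_DEVICE, "device")
```

Calling TOY_DEVICE() on an object whose type name is "device" or "sys-bus-device" returns the same pointer; calling it on an object of an unrelated type prints an error and aborts. In this toy, __FILE__, __LINE__ and __func__ are expanded inside the generated checker, so they identify the checker rather than its caller.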

We've now seen how the parallel struct hierarchy stores the per-instance and per-class information that can be represented in a C++ class. We've also seen how to move up and down the class hierarchy in a safe way.

The missing piece that our C++ example offers but that we haven't discussed yet is how the equivalent of C++ constructors is implemented.

To tell QOM how to create objects, we need to create a static instance of TypeInfo for the type and then register it.

TypeInfo (See here) looks like this:

struct TypeInfo
{
    const char *name;
    const char *parent;

    size_t instance_size;
    size_t instance_align;
    void (*instance_init)(Object *obj);
    void (*instance_post_init)(Object *obj);
    void (*instance_finalize)(Object *obj);

    bool abstract;
    size_t class_size;

    void (*class_init)(ObjectClass *klass, void *data);
    void (*class_base_init)(ObjectClass *klass, void *data);
    void *class_data;

    InterfaceInfo *interfaces;
};

The fields in this are well documented in the source code, but you can readily see that, in addition to specifying the name and parent name, there are entries to specify the size of a per-instance object and the class object. There are also pointers to functions to initialise the two different kinds of object.

Not all of these fields have to be set when creating an instance of TypeInfo for your type. The code relies heavily on the C rule that members not explicitly set in a static initialiser are initialised to 0, and a 0 value indicates that a default should be used. If, say, instance_size is not set, then the parent's instance_size will be used when allocating memory for the object.
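A toy sketch of that defaulting rule, assuming a cut-down TypeInfo-like struct: ToyTypeInfo and toy_instance_size() are hypothetical, and the toy links to its parent by pointer where real QOM uses the parent's name. A type that omits instance_size gets 0 from C's initialisation rules, and the lookup walks up to the first ancestor with a non-zero size.

```c
#include <stddef.h>

/* Hypothetical cut-down TypeInfo with only the fields needed here. */
typedef struct ToyTypeInfo {
    const char *name;
    const struct ToyTypeInfo *parent;   /* real QOM stores the parent's name */
    size_t instance_size;
} ToyTypeInfo;

/* Zero means "not set": inherit the nearest ancestor's size. */
static size_t toy_instance_size(const ToyTypeInfo *ti)
{
    while (ti && ti->instance_size == 0) {
        ti = ti->parent;
    }
    return ti ? ti->instance_size : 0;
}

typedef struct Base { int id; } Base;

static const ToyTypeInfo base_info = {
    .name = "base",
    .instance_size = sizeof(Base),
};

/* instance_size is omitted, so C initialises it to 0. */
static const ToyTypeInfo derived_no_fields_info = {
    .name = "derived-no-fields",
    .parent = &base_info,
};
```

Here derived_no_fields_info.instance_size is 0, so toy_instance_size() falls back to sizeof(Base) from base_info.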

The TypeInfo instance for SysBusDevice (See here) is:

static const TypeInfo sysbus_device_type_info = {
    .name = TYPE_SYS_BUS_DEVICE,
    .parent = TYPE_DEVICE,
    .instance_size = sizeof(SysBusDevice),
    .abstract = true,
    .class_size = sizeof(SysBusDeviceClass),
    .class_init = sysbus_device_class_init,
};

As you can see, not all the fields are initialised.

To register this type with QOM the following code (See here) is run:

static void sysbus_register_types(void)
{
    ...
    type_register_static(&sysbus_device_type_info);
}

type_init(sysbus_register_types)

type_init (See here) is a macro that expands as follows:

#define type_init(function) module_init(function, MODULE_INIT_QOM)

#define module_init(function, type)                                         \
static void __attribute__((constructor)) do_qemu_init_ ## function(void)    \
{                                                                           \
    register_module_init(function, type);                                   \
}

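The __attribute__((constructor)) attribute on the generated function tells GCC to run that function before main() is called. Note that this function does not call sysbus_register_types() itself: register_module_init() just adds it to a list of initialisation functions for MODULE_INIT_QOM. QEMU later runs everything on that list via module_call_init(MODULE_INIT_QOM) early in start-up, and it is then that sysbus_register_types() runs and registers the TypeInfo instance using the QOM function type_register_static().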
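The deferred-registration pattern can be sketched in isolation. This is a hypothetical miniature, not QEMU's module code: TOY_TYPE_INIT plays the role of type_init, toy_register_module_init() just records the function pointer, and nothing runs until toy_call_inits() is called. __attribute__((constructor)) is a GCC extension that Clang also supports.

```c
#include <stddef.h>

typedef void (*toy_init_fn)(void);

#define TOY_MAX_INITS 16
static toy_init_fn toy_inits[TOY_MAX_INITS];
static size_t toy_num_inits;

/* Only records the function; it is not called here. */
static void toy_register_module_init(toy_init_fn fn)
{
    if (toy_num_inits < TOY_MAX_INITS) {
        toy_inits[toy_num_inits++] = fn;
    }
}

/* Like module_init(): a constructor (run before main) that registers fn. */
#define TOY_TYPE_INIT(function)                                            \
    static void __attribute__((constructor)) toy_do_init_##function(void) \
    {                                                                      \
        toy_register_module_init(function);                               \
    }

/* Runs every recorded function, at a time of the program's choosing. */
static void toy_call_inits(void)
{
    for (size_t i = 0; i < toy_num_inits; i++) {
        toy_inits[i]();
    }
}

static int types_registered;

/* Stands in for sysbus_register_types(); a real one would register types. */
static void my_register_types(void)
{
    types_registered++;
}

TOY_TYPE_INIT(my_register_types)
```

By the time main() starts, the constructor has already added my_register_types to the list, but my_register_types() itself only runs once toy_call_inits() is called.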

Instances of our defined types are then created using object_new (See here) or some more specialised function that ultimately calls object_new. SysBusDevice itself is abstract (note .abstract = true above), and concrete sysbus devices are normally created through such specialised functions (for example qdev_new()), so you would not typically call object_new for them directly. However, if we had created a SysBusDevice-like type of our own, we could create instances of it using something like:

MySysBusDevice *sbd = MY_SYS_BUS_DEVICE(object_new(TYPE_MY_SYS_BUS_DEVICE));
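To connect this back to C++ constructors, here is a toy sketch of roughly what object_new() does with the registered information: allocate instance_size bytes (zero-filled), point the instance at its class struct, and run each instance_init from the root type down, so parents are initialised before children, much like C++ base-class constructors. The toy_ names, the pointer-linked ToyTypeInfo, and passing the class struct in directly (real QOM looks everything up from the type name) are assumptions for illustration.

```c
#include <stdlib.h>

typedef struct ObjectClass ObjectClass;
typedef struct Object { ObjectClass *class; } Object;

/* Hypothetical cut-down TypeInfo. */
typedef struct ToyTypeInfo {
    const char *name;
    const struct ToyTypeInfo *parent;
    size_t instance_size;
    void (*instance_init)(Object *obj);
} ToyTypeInfo;

struct ObjectClass { const ToyTypeInfo *type; };

/* Run instance_init for the parent chain first, then this type. */
static void toy_init_with_type(Object *obj, const ToyTypeInfo *ti)
{
    if (ti->parent) {
        toy_init_with_type(obj, ti->parent);
    }
    if (ti->instance_init) {
        ti->instance_init(obj);
    }
}

/* Allocate a zeroed instance, attach its class, then initialise it. */
static Object *toy_object_new(const ToyTypeInfo *ti, ObjectClass *klass)
{
    Object *obj = calloc(1, ti->instance_size);
    if (!obj) {
        return NULL;
    }
    obj->class = klass;
    toy_init_with_type(obj, ti);
    return obj;
}

typedef struct Base { Object parent_obj; int id; } Base;
typedef struct Derived { Base parent_obj; int x; } Derived;

static void base_init(Object *obj) { ((Base *)obj)->id = 1; }

/* Relies on base_init() having run already. */
static void derived_init(Object *obj)
{
    Derived *d = (Derived *)obj;
    d->x = d->parent_obj.id + 10;
}

static const ToyTypeInfo base_info = { "base", NULL, sizeof(Base), base_init };
static const ToyTypeInfo derived_info = { "derived", &base_info, sizeof(Derived), derived_init };
static ObjectClass derived_class = { &derived_info };
```

Creating a Derived with toy_object_new() leaves id set to 1 by base_init() and x set to 11 by derived_init(), which only works because base_init() ran first.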

That's a long introduction to the QEMU Object Model. Sadly I think the detail makes the model look more complex and scary than it is.

We have two parallel object hierarchies corresponding to the per-instance and per-class elements that you would find in a typical C++ class. You use the OBJECT_DECLARE_TYPE macro to generate helpers (e.g. SYS_BUS_DEVICE, SYS_BUS_DEVICE_GET_CLASS and SYS_BUS_DEVICE_CLASS) for moving up and down the two object hierarchies in a type-safe way. You create a static instance of TypeInfo and register it to tell QOM how to create instances of the objects. Finally, you use object_new or similar to create instances of the objects, passing it the name of the type you want to create.

As an exercise for the reader, you can see the QOM initialisation for the Arm AArch64 processor here.

Further tutorials on QOM are available here and here.
