Implementing single dispatch polymorphism

Many object-oriented languages have support for subtype polymorphism with single dispatch over the this argument.

To implement polymorphic calls, they need a way of accessing information about which override should be selected at runtime. At least for compiled languages, the most obvious way to store this information is in the form of function pointers. The structure that contains these function pointers is called a dispatch table.

Memory layout overview

The dispatch table could be embedded in each instance. But because instances of the same runtime type will dispatch to the same overrides, we only need one dispatch table per type, not per instance. Instances can just point to their type’s dispatch table and that pointer is assigned at initialization time.

For example, consider these types:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


class A
{
    int X;
    char[] Y;
    public virtual string M() => "A";
}

class B : A
{
    int Z;
    public override string M() => "B";
}

A reasonable memory layout for them could look like this:

Memory layout of A:

offset	description
0x0	pointer to dispatch table of `A`
0x4	32-bit integer
0x8	reference to `char[]`

Memory layout of B:

offset	description
0x0	pointer to dispatch table of `B`
0x4	32-bit integer
0x8	reference to `char[]`
0xC	32-bit integer

The memory layout of the dispatch table can contain a function pointer per virtual method that this type is involved in. The actual value of each function pointer is the address of the correct implementation of the corresponding method for instances of that type.

Let’s consider this hierarchy as an example:

1
2
3
4
5


class A { }
class B { virtual void M() { } }
class C : B { override void M() { } }
class D : C { }
class E : D { override void M() { } virtual void N() { } }

To tell each M() apart from the others, let’s dub them based on the type they’re declared/overridden in: B::M(), C::M(), and so on. Assuming they don’t inherit any other virtual methods, their dispatch tables could reasonably look something like this:

Dispatch table for A:

offset	value
-	-

Dispatch table for B:

offset	value
0x0	`&B::M()`

Dispatch table for C:

offset	value
0x0	`&C::M()`

Dispatch table for D:

offset	value
0x0	`&C::M()`

Dispatch table for E:

offset	value
0x0	`&E::M()`
0x4	`&E::N()`

At the call spot, we can use the pointer hidden inside every instance to arrive at the dispatch table of the runtime type of that instance. Inside the dispatch table, we can go to the offset that corresponds to our virtual method (in this case, 0x0 for M() and 0x4 for N()) and we can obtain the function pointer found at that offset. Afterwards, the regular calling procedure can be followed.

Ordering types

In type systems with single implementation inheritance (we’ll be ignoring interface inheritance here), types are laid out in tree form. If there is a type that acts as a common supertype to all other types (e.g. object), then we have a unified hierarchy and therefore we are looking at one big tree. If no such type exists, not much changes — we’re simply looking at multiple, smaller trees.

When building a dispatch table for a type, it’s convenient to have a chain that goes from the most general types to the most specific ones. This can easily be done with a breadth-first traversal of the tree.

For example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23


[A]
 |
 +------------[B]
 |             |
 |             +------[C]
 |             |
 |             +------[D]
 |             |       |
 |             |       +----[E]
 |             |       |
 |             |       +----[F]
 |             |
 |             +------[G]
 |
 +------------[H]
 |             |
 |             +------[I]
 |
 +------------[K]
 |
 +----...
 |
...

A breadth-first traversal of the above tree gives us this ordering:

$$ (A) → (B → H → K) → (C → D → G → I) → (E → F) $$

Notice that the types inside each pair of parentheses are arbitrarily ordered. This means that any reordering of the types within each pair of parentheses gives us an ordering that’s just as valid as the others. The validity criterion for any ordering is that a type shall not appear to the left of an obviously more general type or to the right of an obviously more special type.

Building the tables

For our type tree (each of them, if there are multiple), we can now apply a simple algorithm in order to build the dispatch table.

Start from the top type (the most general type of the tree). Create a list of all of its virtual methods. Map these virtual methods to their corresponding function pointers and create a new list. This list is the dispatch table for this type — all we have to do is to lay out its elements sequentially.

As a case study, we will take the previously presented hierarchy and add some random virtual methods. For example:

1

class A { virtual void X() { } }


Virtual methods	`X()`
Function pointers	`&A::X()`

Now go to the next type in the ordering. Unlike before, we now don’t want to assume that all virtual methods are new. Some are new and some are inherited from the parent. Of those that are inherited, some will be overridden in the current class and some will be left the same in the parent.

Create a list of the virtual methods appearing in the current type and then break that list in two: a list of those virtual methods that also appear in the parent (the overrides) and a list of those that do not appear in the parent (the new ones).

Take a copy of the parent’s function pointer list. For each element in our list of overrides, change the corresponding function pointer to point to our override (if any). This now-modified copy is the updated function pointer list for this type. We can imagine that in the special case where no overrides are present in this type, this copied list will remain unmodified.

Next, take the list of the new virtual functions and map them to their function pointer counterparts and create a new function pointer list.

Finally, concatenate the inherited virtual methods list with the new virtual methods list. Also, concatenate the modified copy of the inherited function pointer list with the new function pointer list.

Let’s apply the above algorithm to the following example:

1
2
3
4
5
6
7
8
9


class A { virtual void X() { } }
class B : A { }
class C : B { override void X() { } }
class D : B { override void X() { } }
class E : D { virtual void Y() { } }
class F : D { virtual void Z() { } }
class G : A { override void X() { } virtual void Y() { } }
class H : G { override void Y() { } }
class I : A { virtual void Y() { } virtual void Z() { } }

Class	Inherited virtual methods	Inherited function pointers	Overridden virtual methods	Updated function pointers	New virtual methods	New function pointers	Total virtual methods	Total function pointers
A							`X()`	`&A::X()`
B	`X()`	`&Α::X()`		`&A::X()`			`X()`	`&A::X()`
C	`X()`	`&A::X()`	`X()`	`&C::X()`			`X()`	`&C::X()`
D	`X()`	`&A::X()`	`X()`	`&D::X()`			`X()`	`&D::X()`
E	`X()`	`&D::X()`		`&D::X()`	`Y()`	`&E::Y()`	`X()`, `Y()`	`&D::X()`, `&E::Y()`
F	`X()`	`&D::X()`		`&D::X()`	`Z()`	`&F::Z()`	`X()`, `Z()`	`&D::X()`, `&F::Z()`
G	`X()`	`&A::X()`	`X()`	`&G::X()`	`Y()`	`&G::Y()`	`X()`, `Y()`	`&G::Y()`, `&G::Y()`
H	`X()`, `Y()`	`&G::X()`, `&G::Y()`	`Y()`	`&G::X()`, `&H::Y()`			`X()`, `Y()`	`&G::X()`, `&H::Y()`
I	`X()`	`&A::X()`		`&A::X()`	`Y()`, `Z()`	`&I::Y()`, `&I::Z()`	`X()`, `Y()`, `Z()`	`&A::X()`, `&I::Y()`, `&I::Z()`

The actual dispatch table that makes it into the runtime of the final program is simply the last column (i.e. the list of function pointers) laid out in memory sequentially (in ascending offsets).

For example, the dispatch table for I would look something like this:

offset	value
0x0	`&A::X()`
0x4	`&I::Y()`
0x8	`&I::Z()`

Abstract classes and methods

Abstract classes cannot be instantiated. Thus, in a correctly written program, the dispatch table of an abstract class will never be accessed at runtime.

Does this mean that we can get away without building dispatch tables for abstract classes? Yes, but life is simpler if we build them anyway. Here’s why:

1
2
3


class A { virtual void X() { } }
abstract class B : A { override void X() { } abstract void Y(); }
class C : B { override void Y() { } }

In the above example, B is an abstract class, so there will be no instances of exactly type B. However, instances of type C should inherit the override B::X(). If we skip the construction of the dispatch table for class B, then the construction of the dispatch table for class C requires a more complicated algorithm than what we outlined above.

On the other hand, if we follow our algorithm for abstract classes as well, we don’t have to make any particular special provisions. It’s all easier (and less error prone) if we treat any abstract methods the same way we treat new non-abstract virtual methods.

With abstract methods, there is only the added question of filling out their entry in the dispatch table — since these methods don’t have an implementation, there is no obvious value for their function pointers. As such, we can leave the value unspecified (not a particularly good idea), we can set a default value (such as a null pointer), or we can place a pointer to a “panic” method that has the same signature as the method:

1

void Y__<>Abstract() { throw new NotSupportedException(); }

Dispatch table for B:

offset	value
0x0	`&B::X()`
0x4	`&B::Y__<>Abstract()`

In the next post, we will see how we can generalize this process to dispatch over more than one virtual arguments — in other words, how to implement multiple dispatch.

Contents

Memory layout overview

Ordering types

Building the tables

Abstract classes and methods

Next