Implementing multiple dispatch polymorphism
As we saw previously, many popular OOP languages support subtype polymorphism with single dispatch. This means that the dynamic resolution of a polymorphic call happens with respect to one argument only: the `this` reference.
The problem with this limitation is that it’s completely arbitrary. In a well-written program, there is no natural tendency for polymorphism to be useful only in one dimension. Quite the contrary – if a program is written with best OOP practices in mind (e.g. one responsibility per class), chances are that the need for multidimensional polymorphism, also called multiple dispatch, will eventually arise.
Any problem that involves specialized handling based on more than one of its parameters is a candidate for multiple dispatch. Here are some examples:
- In a compiler, we could have a `GenerateCode(Expression expr, CodeGenerator gen)` function that generates code for various kinds of expressions (assignment, method call, constructor call, field access, etc.) using a given code generator (for x86, ARM, LLVM, CLR, JVM, etc.). Different expressions generate different code for the same platform. But also, different platforms require different code for the same expression. The actual implementation of `GenerateCode` depends on both arguments.
- In a physics simulator, we could have an `Intersects(Shape a, Shape b)` function that returns whether two physics bodies are colliding. Different shapes require a different calculation to (efficiently) check whether they are colliding with another given shape. Again, the actual implementation depends on both shapes.
When an operation depends on $N$ arguments, the set of cases that need an implementation is the $N$-fold Cartesian product of the possible types for each argument — e.g. (in 2 dimensions) we need an implementation for all expression types × all platforms (in the case of the compiler), all shapes × all shapes (in the case of the physics simulator), and so on.
Working around single dispatch
Under single dispatch, the language will dispatch automatically at runtime based on one argument (at most). The programmer will have to take it from there and write code to manually dispatch again, this time based on the other argument.
If we were to imagine the situation as a 2D table, by calling a regular virtual method (like `CodeGenerator::GenerateCode(Expression)` or `Expression::GenerateCode(CodeGenerator)`), we are utilizing the language’s dynamic dispatch mechanism to find our place in one axis. Then, we somehow re-enter the table in the other dimension, in order to find our place in the other axis, thus pinpointing our specific coordinates:
| GenerateCode | LlvmCodeGen | ClrCodeGen | … |
|---|---|---|---|
| Assignment | &LlvmCodeGen::GenerateCode(Assignment) | &ClrCodeGen::GenerateCode(Assignment) | … |
| FieldAccess | &LlvmCodeGen::GenerateCode(FieldAccess) | &ClrCodeGen::GenerateCode(FieldAccess) | … |
| … | … | … | … |
There are two main ways to do this.
Type checks
One obvious way is to perform explicit type checks (possibly in the form of pattern matching) and then dispatch based on the result:
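A minimal sketch of that plumbing, assuming the compiler example from earlier and illustrative class names:

```csharp
using System;

public abstract class CodeGenerator { }
public sealed class LlvmCodeGen : CodeGenerator { }
public sealed class ClrCodeGen : CodeGenerator { }

public abstract class Expression
{
    public abstract void GenerateCode(CodeGenerator gen);
}

public sealed class Assignment : Expression
{
    public override void GenerateCode(CodeGenerator gen)
    {
        // The language dispatched on 'this' (a virtual call); now we
        // dispatch on 'gen' by hand, with explicit type checks.
        switch (gen)
        {
            case LlvmCodeGen _: /* emit LLVM IR for an assignment */ break;
            case ClrCodeGen _:  /* emit CIL for an assignment */     break;
            default: throw new NotSupportedException();
        }
    }
}
```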
Visitor pattern
Another approach to the plumbing code is the visitor pattern. The setup of this pattern for the above example may look like this:
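One possible C# setup, again with illustrative names:

```csharp
public interface IExpressionVisitor
{
    void Visit(Assignment assignment);
    void Visit(FieldAccess fieldAccess);
}

public abstract class Expression
{
    // Every concrete expression must accept a visitor.
    public abstract void Accept(IExpressionVisitor visitor);
}
```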
The usage involves calling `Accept()` on an expression, while passing some kind of visitor — for example, a code generator. The `Accept()` method is virtual, meaning that the implementation chosen will be selected based on the actual type of the expression instance. Wherever the program flow lands, the type of the expression (`this`) is now known, meaning that we can statically select the proper overload of `Visit()`:
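Continuing the sketch, each concrete expression simply hands itself to the visitor:

```csharp
public sealed class Assignment : Expression
{
    // 'this' is statically typed as Assignment here, so the compiler binds
    // the call to the Visit(Assignment) overload at compile time.
    public override void Accept(IExpressionVisitor visitor) => visitor.Visit(this);
}

public sealed class FieldAccess : Expression
{
    public override void Accept(IExpressionVisitor visitor) => visitor.Visit(this);
}
```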
This is a virtual call again (more precisely, an interface call), which will dispatch to the `Visit(Assignment)` method of whatever code generator (or other kind of visitor) has been passed.
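A code generator then only needs to implement the visitor interface, for instance:

```csharp
public sealed class LlvmCodeGen : IExpressionVisitor
{
    public void Visit(Assignment assignment) { /* emit LLVM IR for an assignment */ }
    public void Visit(FieldAccess fieldAccess) { /* emit LLVM IR for a field access */ }
}

// Usage: two virtual hops give us double dispatch.
// expression.Accept(new LlvmCodeGen());
```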
Multiple dispatch
In a language with multiple dispatch, the explicit type checks and the visitor pattern boilerplate methods would be unnecessary.
Ideally, we would make a simple call without giving it much further thought:
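Something along these lines, sketched for the compiler example:

```csharp
// A single call; the runtime picks the implementation based on the
// runtime types of both expr and gen.
GenerateCode(expr, gen);
```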
Then, the program would go to the correct implementation, out of the ones we have defined:
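For instance, a set of implementations like the following, written here as plain C# overloads over the types from the earlier sketches; a multiple-dispatch language would pick among them using the runtime types of both arguments:

```csharp
static void GenerateCode(Assignment expr, LlvmCodeGen gen)  { /* LLVM IR for an assignment */ }
static void GenerateCode(Assignment expr, ClrCodeGen gen)   { /* CIL for an assignment */ }
static void GenerateCode(FieldAccess expr, LlvmCodeGen gen) { /* LLVM IR for a field access */ }
static void GenerateCode(FieldAccess expr, ClrCodeGen gen)  { /* CIL for a field access */ }
```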
To implement single dispatch, we simply included a function pointer in the dispatch table. The call site had to query that dispatch table in order to find the appropriate function.
For multiple dispatch, we need a more generalized scheme. The call site must query an arbitrary number of types and deduce a single function pointer out of them.
Generalized dispatch tables
When dispatching on $N$ virtual arguments, the possible implementations exist in an $N$-dimensional space, in which each dimension is related to a particular virtual parameter.
Consider the following case:
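For example, a function group along these lines (a reconstruction consistent with the dispatch tables shown below):

```csharp
static void M(object a, object b) { /* ... */ }
static void M(string a, object b) { /* ... */ }
static void M(object a, string b) { /* ... */ }
static void M(string a, string b) { /* ... */ }
```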
In this group, we have two virtual parameters. Therefore, we need some kind of two-dimensional array:
|  | ? | ? | ? | ? | ? | ? |
|---|---|---|---|---|---|---|
| ? |  |  |  |  |  |  |
| ? |  |  |  |  |  |  |
| ? |  |  |  |  |  |  |
| ? |  |  |  |  |  |  |
| ? |  |  |  |  |  |  |
As we said, each dimension in the $N$-dimensional array is related to a particular virtual parameter. In each dimension, we need as many slots as the types that can be passed as arguments to the corresponding virtual parameter.
If we assume that the base parameter type, object, is the top type, then instances of any type can be passed as an argument. Thus, for the above function group, we would need an array of all types × all types:
| b↓ a→ | object | List | string | Stream | Window | … |
|---|---|---|---|---|---|---|
| object |  |  |  |  |  |  |
| List |  |  |  |  |  |  |
| string |  |  |  |  |  |  |
| Stream |  |  |  |  |  |  |
| Window |  |  |  |  |  |  |
| … |  |  |  |  |  |  |
Finally, we need to fill the $N$-dimensional array with values. Those values will be pointers to the actual functions that we need to call for each combination of types.
With single dispatch, it’s fairly easy to pick the correct implementation: Start from the type we’re building a dispatch table for. Ask if it provides an override. If it does, choose that. If not, go to its parent and ask the same question.
With multiple dispatch, the choice is more complicated than that. Fortunately, if our language supports function overloading, then we already have in the compiler an algorithm that takes a set of types and a list of functions and chooses the most appropriate function for that set of arguments. In other words, we can implement dynamic dispatch as a precomputed application of overload resolution.
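A rough sketch of that precomputation, where `AllTypes`, `IndexOf` and `ResolveOverload` stand for hypothetical compiler-side helpers:

```csharp
// For every combination of runtime types, run ordinary overload resolution
// once, ahead of time, and record the winner in the table.
foreach (var ta in AllTypes)
    foreach (var tb in AllTypes)
        dispatchTable[IndexOf(ta), IndexOf(tb)] = ResolveOverload("M", ta, tb);
```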
Let’s review our case again and let’s give the functions some names based on the order they are declared: M[0], M[1], M[2] and M[3]:
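In declaration order (again a reconstruction, consistent with the filled table below):

```csharp
static void M(object a, object b) { /* ... */ }   // M[0]
static void M(string a, object b) { /* ... */ }   // M[1]
static void M(object a, string b) { /* ... */ }   // M[2]
static void M(string a, string b) { /* ... */ }   // M[3]
```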
With known overload resolution rules, we can now fill the array like this:
| b↓ a→ | object | List | string | Stream | Window | … |
|---|---|---|---|---|---|---|
| object | &M[0] | &M[0] | &M[1] | &M[0] | &M[0] |  |
| List | &M[0] | &M[0] | &M[1] | &M[0] | &M[0] |  |
| string | &M[2] | &M[2] | &M[3] | &M[2] | &M[2] |  |
| Stream | &M[0] | &M[0] | &M[1] | &M[0] | &M[0] |  |
| Window | &M[0] | &M[0] | &M[1] | &M[0] | &M[0] |  |
| … |  |  |  |  |  |  |
Now we need to teach these types how to find their position in each dimension of this array. This is fairly straightforward — we can simply reserve a slot at an unused offset in their metadata for each parameter of the function group.
Let’s pick 0x84 for parameter `a` and 0x88 for parameter `b`. We can then go to each type’s metadata and place these values:
`object` metadata:

| offset | value |
|---|---|
| 0x84 | 0 |
| 0x88 | 0 |

`List` metadata:

| offset | value |
|---|---|
| 0x84 | 1 |
| 0x88 | 1 |

`string` metadata:

| offset | value |
|---|---|
| 0x84 | 2 |
| 0x88 | 2 |

`Stream` metadata:

| offset | value |
|---|---|
| 0x84 | 3 |
| 0x88 | 3 |

`Window` metadata:

| offset | value |
|---|---|
| 0x84 | 4 |
| 0x88 | 4 |
Making a virtual call
The calling procedure for a call to the above group is as follows:
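Suppose the call site looks like this, with `arg1` and `arg2` statically typed as `object`:

```csharp
// The runtime types of arg1 and arg2 drive the dispatch.
M(arg1, arg2);
```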
First, we visit the metadata of `arg1`, which should correspond to its runtime type. Because `arg1` is used as the argument to parameter `a`, we look at the offset 0x84 in the metadata table and we get a value — let’s say 2.

Then, we do the same for `arg2`. But this time, because it’s used as the argument to parameter `b`, we look at the offset 0x88 and we get another value — let’s say 4.

Then, we go to the precomputed 2-dimensional array that we built for the function group `M` and we take the value found at the coordinates (2, 4). That value is a function pointer — let’s say it points to the function we dubbed M[1].
Lastly, we can follow the normal procedure and make the call to whatever code can be found through that pointer.
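Put together as a sketch, where `GetTypeMetadata` and `dispatchTableForM` are illustrative names rather than a real runtime API:

```csharp
int i  = GetTypeMetadata(arg1).ReadInt32(0x84);  // index along the 'a' dimension, e.g. 2
int j  = GetTypeMetadata(arg2).ReadInt32(0x88);  // index along the 'b' dimension, e.g. 4
var fn = dispatchTableForM[i, j];                // precomputed function pointer, e.g. &M[1]
fn(arg1, arg2);                                  // ordinary indirect call
```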
Compressing the dispatch table
The methodology we described above is very simple to implement, but it has the potential to create enormous dispatch tables.
Let’s take another look at the above table again:
| b↓ a→ | object | List | string | Stream | Window | … |
|---|---|---|---|---|---|---|
| object | &M[0] | &M[0] | &M[1] | &M[0] | &M[0] |  |
| List | &M[0] | &M[0] | &M[1] | &M[0] | &M[0] |  |
| string | &M[2] | &M[2] | &M[3] | &M[2] | &M[2] |  |
| Stream | &M[0] | &M[0] | &M[1] | &M[0] | &M[0] |  |
| Window | &M[0] | &M[0] | &M[1] | &M[0] | &M[0] |  |
| … |  |  |  |  |  |  |
It’s easy to notice that most columns and most rows are redundant. And there’s a simple pattern to it — the unique columns and rows correspond to the distinct types that appear in each parameter: `object` and `string`. Each of these types is called a pole and it casts its influence on other types.
We may start to suspect that non-pole types don’t need to point to their own column or row. They can instead point to the column or row of their pole — this is called row sharing. Then we don’t need the unused columns and rows, which allows us to keep a much more compact table:
| b↓ a→ | object | string |
|---|---|---|
| object | &M[0] | &M[1] |
| string | &M[2] | &M[3] |
`object` metadata:

| offset | value |
|---|---|
| 0x84 | 0 |
| 0x88 | 0 |

`List` metadata:

| offset | value |
|---|---|
| 0x84 | 0 |
| 0x88 | 0 |

`string` metadata:

| offset | value |
|---|---|
| 0x84 | 1 |
| 0x88 | 1 |

`Stream` metadata:

| offset | value |
|---|---|
| 0x84 | 0 |
| 0x88 | 0 |

`Window` metadata:

| offset | value |
|---|---|
| 0x84 | 0 |
| 0x88 | 0 |
Let’s look at another, less symmetric example:
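One possible shape for this group, reconstructed to be consistent with the compressed table below (`List` and `Window` stand in for any two unrelated classes):

```csharp
class List { }     // stand-ins for the framework types named in the tables
class Window { }

static void X(object a, object b) { /* X[0] */ }
static void X(object a, List b)   { /* X[1] */ }
static void X(string a, Window b) { /* X[2] */ }
static void X(string a, object b) { /* X[3] */ }
```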
| b↓ a→ | object | string |
|---|---|---|
| object | &X[0] | &X[3] |
| List | &X[1] | &X[3] |
| Window | &X[0] | &X[2] |
Now we set an offset for parameter `a` of the function group `X`. Let’s say we pick 0x8c across all metadata tables. In the `object` metadata table, we place the value 0, because the `a` dimension starts with `object` as its first element.

Again, we place the value 0 in the metadata tables of all subtypes of `object`, because we want all other subtypes to fall back to `object` (unless they have their own pole).

Now we go to the `string` metadata table and place the value 1, discarding the value 0 that we just put there during `object`’s turn.

We do the same for all subtypes of `string` (if any), because anything more specific than `string` should fall back to `string`, not `object`.
Now we can assign an offset to parameter `b` of function group `X`. Let’s say we now pick the next 4-byte offset, 0x90, across all metadata tables. Again, `object` and all of its subtypes get a 0.

In our selected ordering, we make the `List` metadata (and that of all its subtypes) point to the `List` row by giving it the value 1, then we make the `Window` metadata (and that of all its subtypes) point to the `Window` row by giving it the value 2. Putting all of these steps together, the metadata values for group `X` are:

| type | 0x8c (parameter a) | 0x90 (parameter b) |
|---|---|---|
| object | 0 | 0 |
| List | 0 | 1 |
| string | 1 | 0 |
| Stream | 0 | 0 |
| Window | 0 | 2 |
Dispatch trees
Like with so many multidimensional concepts, we can apply currying here to create a different approach. If you’re unfamiliar with currying, here’s a simple introduction:
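Something like this ordinary two-argument function:

```csharp
int Add(int a, int b) => a + b;
```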
This is a function that takes two integers and returns their sum, which is another integer.
Using lambda expressions, we can rewrite it like this:
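Its curried form could look like this:

```csharp
using System;

Func<int, Func<int, int>> add = a => b => a + b;

Console.WriteLine(add(2)(3));   // add(2) "captures" 2; applying 3 completes the sum: 5
```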
Now it’s a function that takes a single integer and gives you back not an integer, but another function. That returned function also takes a single integer and, when called, will return the sum of the first integer and the second integer. In other words, the second function has “captured” the first integer that led to its existence.
We can apply an analogous methodology to dynamic dispatch. We can dispatch dynamically based on the first argument and get a dispatch table that “captures” this result, just like the returned function above “captured” the first argument. Subsequently, we can use that dispatch table to perform dispatch based on the second argument, which will give us the actual function pointer.
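As a sketch, with illustrative names for the lookup structures:

```csharp
var secondLevel = firstLevelTable[TypeIndex(arg1)];  // dispatch on arg1, "capturing" the result
var fn          = secondLevel[TypeIndex(arg2)];      // dispatch on arg2 within that result
fn(arg1, arg2);                                      // ordinary indirect call
```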
Let’s remember our previous compressed dispatch table:
| b↓ a→ | object | string |
|---|---|---|
| object | &X[0] | &X[3] |
| List | &X[1] | &X[3] |
| Window | &X[0] | &X[2] |
First, we can go to the metadata of the types, pick an unused offset (let’s say 0xb0) and allocate pointers that will allow us to jump to the intermediate tables. We can also allocate more space (let’s say 0xb4) that we will use to jump from the intermediate table to the actual function pointer.
The second parameter has three poles, so we need to allocate three fields in the intermediate tables:
| X → object |
|---|
| 0x0 |
| 0x4 |
| 0x8 |

| X → string |
|---|
| 0x0 |
| 0x4 |
| 0x8 |
Let’s make the types point to the appropriate intermediate tables, and give them the internal offsets they need to find their way inside those tables. Consolidated, the metadata for group `X` looks like this:

| type | 0xb0 (intermediate table) | 0xb4 (offset within it) |
|---|---|---|
| object | &(X → object) | 0x0 |
| List | &(X → object) | 0x4 |
| string | &(X → string) | 0x0 |
| Stream | &(X → object) | 0x0 |
| Window | &(X → object) | 0x8 |
Now we can fill the two intermediate tables, effectively wiring them to the functions:
| X → object | |
|---|---|
| 0x0 | &X[0] |
| 0x4 | &X[1] |
| 0x8 | &X[0] |

| X → string | |
|---|---|
| 0x0 | &X[3] |
| 0x4 | &X[3] |
| 0x8 | &X[2] |
We have effectively created a dispatch tree: the lookup on the first argument selects one of the intermediate tables (X → object or X → string), and the lookup on the second argument within that table selects one of &X[0] through &X[3].
For this methodology, the calling preparation is as follows:
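As a sketch, with the same illustrative helper names as before:

```csharp
var intermediate = GetTypeMetadata(arg1).ReadPointer(0xb0);   // pointer to an intermediate table
int offset       = GetTypeMetadata(arg2).ReadInt32(0xb4);     // offset within that table
var fn           = intermediate.ReadFunctionPointer(offset);  // second-level lookup
fn(arg1, arg2);                                               // regular indirect call
```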
First, we go to the metadata of `arg1`, at offset 0xb0, and we read a data pointer (to an intermediate table). Then, we go to the metadata of `arg2`, at offset 0xb4, and we read an offset. We add the data pointer and the offset and get a new data pointer. We dereference that data pointer and we get a function pointer. Then, finally, we make a regular call.
Let’s come up with another example to better illustrate this approach:
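One possible shape for such a group, with three virtual parameters this time, reconstructed to be consistent with the tables that follow:

```csharp
class List { }     // stand-ins, as before
class Window { }

static void Z(object a, object b, object c) { /* Z[0] */ }
static void Z(object a, Window b, object c) { /* Z[1] */ }
static void Z(string a, Window b, string c) { /* Z[2] */ }
static void Z(string a, object b, string c) { /* Z[3] */ }
static void Z(string a, List b,   List c)   { /* Z[4] */ }
static void Z(List a,   object b, object c) { /* Z[5] */ }
static void Z(Window a, List b,   string c) { /* Z[6] */ }
```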
Because the compressed dispatch table would now be 3-dimensional, we won’t show it here, but we will assume that it’s already been constructed.
Let’s decide on an offset, say 0xc0 — this is where we will find the address of the first intermediate table. We will also allocate two more offsets, say 0xc4 and 0xc8, that we will use to navigate within the first and second intermediate tables respectively.
For the second parameter, there are three poles: `object` (offset 0x0), `List` (offset 0x4) and `Window` (offset 0x8).
| Z → object | |
|---|---|
| 0x0 | &(Z → object → object) |
| 0x4 | &(Z → object → List) |
| 0x8 | &(Z → object → Window) |

| Z → string | |
|---|---|
| 0x0 | &(Z → string → object) |
| 0x4 | &(Z → string → List) |
| 0x8 | &(Z → string → Window) |

| Z → List | |
|---|---|
| 0x0 | &(Z → List → object) |
| 0x4 | &(Z → List → List) |
| 0x8 | &(Z → List → Window) |

| Z → Window | |
|---|---|
| 0x0 | &(Z → Window → object) |
| 0x4 | &(Z → Window → List) |
| 0x8 | &(Z → Window → Window) |
For the third parameter, we have three poles: `object`, `string`, and `List`. So we allocate three fields and fill them with the function pointers that are computed by our hypothetical overload resolution algorithm, as shown below. (The actual C# overload resolution algorithm would take issue with some of the choices I made, but we’ll talk about that later.)
| Z → object → object | |
|---|---|
| 0x0 | &(Z → object → object → object) = &Z[0] |
| 0x4 | &(Z → object → object → string) = &Z[0] |
| 0x8 | &(Z → object → object → List) = &Z[0] |

| Z → object → List | |
|---|---|
| 0x0 | &(Z → object → List → object) = &Z[0] |
| 0x4 | &(Z → object → List → string) = &Z[0] |
| 0x8 | &(Z → object → List → List) = &Z[0] |

| Z → object → Window | |
|---|---|
| 0x0 | &(Z → object → Window → object) = &Z[1] |
| 0x4 | &(Z → object → Window → string) = &Z[1] |
| 0x8 | &(Z → object → Window → List) = &Z[1] |

| Z → string → object | |
|---|---|
| 0x0 | &(Z → string → object → object) = &Z[0] |
| 0x4 | &(Z → string → object → string) = &Z[3] |
| 0x8 | &(Z → string → object → List) = &Z[0] |

| Z → string → List | |
|---|---|
| 0x0 | &(Z → string → List → object) = &Z[0] |
| 0x4 | &(Z → string → List → string) = &Z[3] |
| 0x8 | &(Z → string → List → List) = &Z[4] |

| Z → string → Window | |
|---|---|
| 0x0 | &(Z → string → Window → object) = &Z[1] |
| 0x4 | &(Z → string → Window → string) = &Z[2] |
| 0x8 | &(Z → string → Window → List) = &Z[1] |

| Z → List → object | |
|---|---|
| 0x0 | &(Z → List → object → object) = &Z[5] |
| 0x4 | &(Z → List → object → string) = &Z[5] |
| 0x8 | &(Z → List → object → List) = &Z[5] |

| Z → List → List | |
|---|---|
| 0x0 | &(Z → List → List → object) = &Z[5] |
| 0x4 | &(Z → List → List → string) = &Z[5] |
| 0x8 | &(Z → List → List → List) = &Z[5] |

| Z → List → Window | |
|---|---|
| 0x0 | &(Z → List → Window → object) = &Z[5] |
| 0x4 | &(Z → List → Window → string) = &Z[5] |
| 0x8 | &(Z → List → Window → List) = &Z[5] |

| Z → Window → object | |
|---|---|
| 0x0 | &(Z → Window → object → object) = &Z[0] |
| 0x4 | &(Z → Window → object → string) = &Z[0] |
| 0x8 | &(Z → Window → object → List) = &Z[0] |

| Z → Window → List | |
|---|---|
| 0x0 | &(Z → Window → List → object) = &Z[0] |
| 0x4 | &(Z → Window → List → string) = &Z[6] |
| 0x8 | &(Z → Window → List → List) = &Z[0] |

| Z → Window → Window | |
|---|---|
| 0x0 | &(Z → Window → Window → object) = &Z[1] |
| 0x4 | &(Z → Window → Window → string) = &Z[1] |
| 0x8 | &(Z → Window → Window → List) = &Z[1] |
And we’re finally done.
Compressing the dispatch tree
Let’s consider the space we’re using here. For simplicity, let’s assume that all addressing variables (function pointers, data pointers, indices, offsets) have the same storage requirements (e.g. 4 or 8 bytes). Sure, this assumption does not hold for all platforms, but that’s okay — we won’t do anything important with it, just some estimations.
In that last example:
- The table approach required 36 function pointers in the dispatch table, plus 3 indices per type.
- The tree approach required 36 function pointers in the second-layer tables, plus 12 data pointers in the first-layer tables, plus 3 pointers/offsets per type.
This might make the tree approach sound more space-consuming, but the tree approach can be compressed further. Notice that quite a few of the tables in the final stage make the same selections. Specifically:
- The triad (`&Z[0]`, `&Z[0]`, `&Z[0]`) appears in 3 out of the 12 tables of the final stage.
- The triad (`&Z[1]`, `&Z[1]`, `&Z[1]`) appears in 2 of the 12 tables.
- The triad (`&Z[5]`, `&Z[5]`, `&Z[5]`) appears in 3 of the 12 tables.

Clearly, then, we don’t need all of those tables — we need just one of each and we can reuse it by rewriting the pointers in the previous layer. This will reduce the tables at the final stage by 5 and their total contained function pointers by 15, arriving at 33 pointers in total (21 function pointers in the final stage plus the 12 data pointers in the first layer), plus 3 per type.
We are left with these unique tables in the final stage:
| Z → object → object | |
|---|---|
| 0x0 | &(Z → object → object → object) = &Z[0] |
| 0x4 | &(Z → object → object → string) = &Z[0] |
| 0x8 | &(Z → object → object → List) = &Z[0] |

| Z → object → Window | |
|---|---|
| 0x0 | &(Z → object → Window → object) = &Z[1] |
| 0x4 | &(Z → object → Window → string) = &Z[1] |
| 0x8 | &(Z → object → Window → List) = &Z[1] |

| Z → string → object | |
|---|---|
| 0x0 | &(Z → string → object → object) = &Z[0] |
| 0x4 | &(Z → string → object → string) = &Z[3] |
| 0x8 | &(Z → string → object → List) = &Z[0] |

| Z → string → List | |
|---|---|
| 0x0 | &(Z → string → List → object) = &Z[0] |
| 0x4 | &(Z → string → List → string) = &Z[3] |
| 0x8 | &(Z → string → List → List) = &Z[4] |

| Z → string → Window | |
|---|---|
| 0x0 | &(Z → string → Window → object) = &Z[1] |
| 0x4 | &(Z → string → Window → string) = &Z[2] |
| 0x8 | &(Z → string → Window → List) = &Z[1] |

| Z → List → object | |
|---|---|
| 0x0 | &(Z → List → object → object) = &Z[5] |
| 0x4 | &(Z → List → object → string) = &Z[5] |
| 0x8 | &(Z → List → object → List) = &Z[5] |

| Z → Window → List | |
|---|---|
| 0x0 | &(Z → Window → List → object) = &Z[0] |
| 0x4 | &(Z → Window → List → string) = &Z[6] |
| 0x8 | &(Z → Window → List → List) = &Z[0] |
And these rewritten pointers in the previous stage:
| Z → object | |
|---|---|
| 0x0 | &(Z → object → object) |
| 0x4 | &(Z → object → object) |
| 0x8 | &(Z → object → Window) |

| Z → string | |
|---|---|
| 0x0 | &(Z → string → object) |
| 0x4 | &(Z → string → List) |
| 0x8 | &(Z → string → Window) |

| Z → List | |
|---|---|
| 0x0 | &(Z → List → object) |
| 0x4 | &(Z → List → object) |
| 0x8 | &(Z → List → object) |

| Z → Window | |
|---|---|
| 0x0 | &(Z → object → object) |
| 0x4 | &(Z → Window → List) |
| 0x8 | &(Z → object → Window) |
It’s worth pointing out that alternative orderings of the parameters may yield a larger or a smaller total number of pointers after this compression.
As a side note, now that we took the dispatch tree and we started reusing its nodes, it ceased being a tree — it’s now a directed acyclic graph.
Handling ambiguity
In all the previous examples, we assumed that we have a hypothetical overload resolution algorithm that can always resolve ties. In other words, if there is at least one applicable overload for the set of types we pass, then one of the candidate functions will definitely be selected.
There are languages in which overload resolution works that way, and there are other languages where it doesn’t. Some language designers order overloads by appropriateness based on some criteria that they don’t consider particularly debatable. If those criteria don’t manage to produce a single result, and further tie-breaking attempts are deemed too subjective, overload resolution fails and a diagnostic message is issued.
Consider the following case:
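For example, a group like this (reconstructed to match the table below):

```csharp
static void V(object a, object b) { /* V[0] */ }
static void V(string a, object b) { /* V[1] */ }
static void V(object a, string b) { /* V[2] */ }
```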
An overload resolution mechanism could reasonably see this call as ambiguous:
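For instance, a call where both arguments are strings:

```csharp
V("hello", "world");
// Neither V(string, object) nor V(object, string) is better than the other,
// so overload resolution (in C#, for instance) reports the call as ambiguous.
```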
Since we’re using the same mechanism to precompute the results of dynamic dispatch, that same error propagates itself to the dispatch-building algorithm.
| b↓ a→ | object | string |
|---|---|---|
| object | &V[0] | &V[1] |
| string | &V[2] | &V[1] or &V[2] |
If such a case occurs during the construction of the dispatch table, one reasonable response is to issue a compilation error and ask the programmer to provide an override that covers the ambiguous case.
In a future post, we might explore various overload resolution algorithms.
Bibliography
- Amiel, E., Gruber, O. and Simon, E. (1994). Optimizing multi-method dispatch using compressed dispatch tables. ACM SIGPLAN Notices, 29(10), pp.244-258.
- Dujardin, É. (1996). Efficient Dispatch of Multimethods in Constant Time Using Dispatch Trees.
- Gamma, E., Helm, R., Johnson, R. and Vlissides, J. (1994). Design Patterns: Elements of Reusable Object-Oriented Software. 1st ed. Boston, Mass.: Addison-Wesley.
- Pirkelbauer, P., Solodkyy, Y. and Stroustrup, B. (2010). Design and evaluation of C++ open multi-methods. Science of Computer Programming, [online] 75(7), pp.638-667.
Author: Theodoros Chatzigiannakis
Last modified: 2017-03-15