As we saw previously, many popular OOP languages support subtype polymorphism with single dispatch. This means that the dynamic resolution of a polymorphic call happens with respect to one argument only: the this reference.

The problem with this limitation is that it’s completely arbitrary. In a well-written program, there is no natural tendency for polymorphism to be useful only in one dimension. Quite the contrary – if a program is written with best OOP practices in mind (e.g. one responsibility per class), chances are that the need for multidimensional polymorphism, also called multiple dispatch, will eventually arise.

Any problem that involves specialized handling based on more than one of its parameters is a candidate for multiple dispatch. Here are some examples:

  • In a compiler, a GenerateCode(Expression expr, CodeGenerator gen) function that generates code for various kinds of expressions (assignment, method call, constructor call, field access, etc) using a given code generator (for x86, ARM, LLVM, CLR, JVM, etc). Different expressions generate different code for the same platform. Different platforms require different code for the same expression. Therefore, the actual implementation depends on both parameters.
  • In a physics simulator, an Intrersects(Shape a, Shape b) function that returns whether two physics bodies are colliding. Different shapes require a different calculation to (efficiently) check whether they are colliding with another given shape. Again, the actual implementation depends on both shapes.

When an operation depends on $N$ arguments, the set of all possible implementations is the $N$-fold Cartesian product of all the possible types for each argument — e.g. (in 2 dimensions) we need an implementation for all expression types × all platforms (in the case of the compiler), all shapes × all shapes (in the case of the physics simulator), and so on.

Working around single dispatch

Under single dispatch, the language will dispatch automatically at runtime based on one parameter. The programmer will have to take it from there and write code to manually dispatch again, this time based on the other parameter. There are two main ways to do this.

Type checks

One obvious way is to perform explicit type checks (possibly in the form of pattern matching) and then dispatch based on the result:

class LlvmCodeGenerator : CodeGenerator
    public override void GenerateCode(Expression expr)
        if (expr is Assignment a) GenerateCode(a);
        else if (expr is FieldAccess f) GenerateCode(f);
        /* and so on */

    public void GenerateCode(Assignment expr) { }
    public void GenerateCode(FieldAcces expr) { }

Visitor pattern

Another approach to the plumbing code is the visitor pattern. The setup of this pattern for the above example may look like this:

interface IExpressionVisitor
    void Visit(Assignment expr);
    void Visit(FunctionCall expr);
    void Visit(ConstructorCall expr);

abstract class Expression
    public abstract void Accept(IExpressionVisitor visitor);

The usage involves calling Accept() on an expression, while passing some kind of visitor — for example, a code generator. The Accept() method is virtual, meaning that the implementation chosen will be selected based on the actual type of the expression instance. Wherever the program flow lands, the type of the expression (this) is now known, meaning that we can statically select the proper overload of Visit():

class Assignment : Expression
    public void Accept(IExpressionVisitor visitor)

This is a virtual call again, which will dispatch to the Visit(Assignment) method of whatever code generator (or other kind of visitor) has been passed.

class LlvmCodeGenerator : IExpressionVisitor
    void Visit(Assignment expr) { }
    void Visit(FunctionCall expr) { }
    void Visit(ConstructorCall expr) { }

## Multiple dispatch
In a language with multiple dispatch, the explicit type checks and the visitor pattern boilerplate methods would be unnecessary.

Ideally, we would make a simple call without giving it much further thought:

/* ... */
GenerateCode(generator, expression);
/* ... */

Then, the program would go to the correct implementation, out of the ones we have defined:

/* base */
GenerateCode(virtual CodeGenerator gen, virtual Expression expr) { }

/* specializations */
GenerateCode(override LlvmCodeGenerator gen, override Assignment expr) { } 
GenerateCode(override LlvmCodeGenerator gen, override FunctionCall expr) { } 
GenerateCode(override LlvmCodeGenerator gen, override ConstructorCall expr) { } 
GenerateCode(override ClrCodeGenerator gen, override Assignment expr) { } 
GenerateCode(override ClrCodeGenerator gen, override FunctionCall expr) { } 
GenerateCode(override ClrCodeGenerator gen, override ConstructorCall expr) { } 
GenerateCode(override JvmCodeGenerator gen, override Assignment expr) { } 
GenerateCode(override JvmCodeGenerator gen, override FunctionCall expr) { } 
GenerateCode(override JvmCodeGenerator gen, override ConstructorCall expr) { } 
/* ... */

To implement single dispatch, we simply included a function pointer in the dispatch table. The call spot had to query that dispatch table in order to find the appropriate function.

For multiple dispatch, we need a more generalized scheme. The call spot must query an arbitrary number of types and deduce a single function pointer out of them.

Generalized dispatch tables

When dispatching on $N$ virtual arguments, the possible implementations exist in a $N$-dimensional space, in which each dimension is related to a particular virtual parameter.

Consider the following case:

void M(virtual object a, virtual object b);
void M(override string a, override object b);
void M(override object a, override string b);
void M(override string a, override string b);

In this group, we have two virtual parameters. Therefore, we need some kind of two-dimensional array:

As we said, each dimension in the $N$-dimensional array is related to a particular virtual parameter. In each dimension, we need as many slots as the types that can be passed as arguments to the corresponding virtual parameter.

If we assume that the base parameter type, object, is the top type, then instances of any type can be passed as an argument. Thus, for the above function group, we would need an array of all types × all types:

Finally, we need to fill the $N$-dimensional array with values. Those values will be pointers to the actual functions that we need to call for each combination of types.

With single dispatch, it’s fairly easy to pick the correct implementation: Start from the type we’re building a dispatch table for. Ask if it provides an override. If it does, choose that. If not, go to its parent and ask the same question.

With multiple dispatch, the choice is more complicated than that. Fortunately, if our language supports function overloading, then we already have in the compiler an algorithm that takes a set of types and a list of functions and chooses the most appropriate function for that set of arguments. In other words, we can implement dynamic dispatch as a precomputed application of overload resolution.

Let’s review our case again and let’s give the functions some names based on the order they are declared: M[0], M[1], M[2] and M[3]:

void M(virtual object a, virtual object b);    // M[0]
void M(override string a, override object b);  // M[1]
void M(override object a, override string b);  // M[2]
void M(override string a, override string b);  // M[3]

With known overload resolution rules, we can now obviously fill the array like this:

Now we need to teach these types how to find their position in each dimension of this array. This is fairly easy — we can simply declare an unused offset in their metadata for each parameter of the function group.

Let’s pick 0x84 for parameter a and 0x88 for parameter b. We can then go to each type’s metadata and place these values:

Making a virtual call

The calling procedure for a call to the above group is as follows:

First, we visit the metadata of `arg1`, which should correspond to its runtime type. Because `arg1` is used as the argument to parameter `a`, we look at the offset `0x84` in the metadata table and we get a value — let's say `2`.

Then, we do the same for `arg2`. But this time, because it's used as the argument to parameter `b`, we look at the offset `0x88` and we get another value — let's say `4`.

Then, we go to the precomputed 2-dimensional array that we built for the function group M and we take the value found in the coordinates `(2, 4)`. That value is a function pointer — let's say it points to the function we dubbed `M[1]`.

Lastly, we can follow the normal procedure and make the call to whatever code can be found through that pointer.

### Compressing the dispatch table
The methodology we described above is very simple to implement, but it has the potential to create enormous dispatch tables.

Let's take another look at the above table again:

<img src="/assets/multiple-dispatch/M-Table-Uncompressed.png" />

It's easy to notice that most columns and most rows are redundant. And it's fairly easy to see a pattern — the unique columns and rows correspond to the distinct types that appear in each parameter: `object` and `string`. Each of these types is called a *pole* and it casts its influence on other types.

We can easily surmise that non-pole types don't need to point to their own column or row. They can instead point to the column or row of their pole — this is called *row sharing.*

Obviously, then, we don't need the unused columns and rows. This allows us to keep a much more compact table:

<img src="/assets/multiple-dispatch/M-Dispatch-Compressed.png" />

<img src="/assets/multiple-dispatch/M-Metadata-Compressed.png" />

Let's look at another, less symmetric example:

void X(virtual object a, virtual object b);       // X[0]
void X(override object a, override List b);       // X[1]
void X(override string a, override Window b);     // X[2]
void X(override string a, override object b);     // X[3]

Now we set an offset for parameter a of the function group X. Let’s say we pick 0x8c across all metadata tables. In the object metadata table, we place the value 0 because the a dimension starts with object as its first element:

Again, we place the value 0 to the metadata tables of all subtypes of object, because we want all other subtypes to fall back to object (unless they have their own pole):

Now we go to the string dispatch table and place the value of 1, discarding the value of 0 that we just put during object’s turn:

We do the same for all subtypes of string (if any), because anything more specific than string should fall back to string, not object.

Now we can assign an offset to parameter b of function group X. Let’s say we now pick the next 4-byte offset, 0x90, across all dispatch tables. Again, object and all of its subtypes get a 0:

In our selected ordering, we make the List (and all of its subtypes) metadata point to the List row by giving the value 1, then we make the Window (and all of its subtypes) metadata point to the Window row by giving the value 2:

Dispatch trees

Like with so many multidimensional concepts, we can apply currying here to create a different approach. If you’re unfamiliar with currying, here’s a simple introduction:

int Add(int a, int b) => a + b;

This is a function that takes two integers and returns their sum, which is another integer.

Using lambda expressions, we can rewrite it like this:

Func<int, int> Add (int a) => (int b) => a + b;

Now it’s a function that takes a single integer and gives you back not an integer, but another function. That returned function also takes a single integer and, when called, will return the sum of the first integer and the second integer. In other words, the second function has “captured” the first integer that led to its existence.

We can apply an analogous methodology to dynamic dispatch. We can dispatch dynamically based on the first argument and get a dispatch table that “captures” this result, just like the returned function above “captured” the first argument. Subsequently, we can use that dispatch table to perform dispatch based on the second argument, which will give us the actual function pointer.

Let’s remember our previous compressed dispatch table:

First, we can go to the metadata of the types, pick an unused offset (let’s say 0xb0) and allocate pointers that will allow us to jump to the intermediate tables. We can also allocate more space (let’s say 0xb4) that we will use to jump from the intermediate table to the actual function pointer.

The second parameter has three poles, so we need to allocate three fields in the intermediate tables:

Let’s make the types point to the appropriate intermediate tables:

And now we need a way for types to find their way inside the intermediate tables, so let’s give them their internal offsets:

Now we can fill the two intermediate tables, effectively wiring them to the functions:

We have effectively created a dispatch tree:

For this methodology, the calling preparation is as follows:

X(arg1, arg2);

First, we go to the metadata of arg1, offset 0xb0 and we read a data pointer (to an intermediate table). Then, we go to the metadata of arg2, offset 0xb4 and we read an offset. We add the data pointer and the offset and get a new data pointer. We dereference that data pointer and we get a function pointer. Then, obviously, we make a regular call.

Let’s come up with another example to better illustrate this approach:

void Z(virtual object a, virtual object b, virtual object c);    // Z[0]
void Z(override object a, override Window b, override object c); // Z[1]
void Z(override string a, override Window b, override string c); // Z[2]
void Z(override string a, override object b, override string c); // Z[3]
void Z(override string a, override List b, override List c);     // Z[4]
void Z(override List a, override object b, override object c);   // Z[5] this
void Z(override Window a, override List b, override string c);   // Z[6]

Because the compressed dispatch table would now be 3-dimensional, we won’t show it here, but we will assume that it’s already been constructed.

Let’s decide on an offset, say 0xc0 — this is where we will find address to the first intermediate table. We will also allocate two more offsets, say 0xc4 and 0xc8, that we will use to navigate within the first and second intermediate tables respectively.

For the second parameter, there are three poles: object (offset 0x0), List (offset 0x4) and Window (offset 0x8).

For the third parameter, we have three poles: object, string, and List. So we allocate three fields and fill them with the function pointers that are computed by our hypothetical overload resolution algorithm, as shown below. (The actual C# overload resolution algorithm would take issue with some of the choices I made, but we’ll talk about that later.)

And we’re finally done.

Compressing the dispatch tree

Let’s consider the space we’re using here. For simplicity, let’s assume that all addressing variables (function pointers, data pointers, indices, offsets) have the same storage requirements (e.g. 4 or 8 bytes). Obviously, this assumption does not hold for all platforms, but that’s okay — we won’t do anything important with it, just some estimations.

In that last example:

  • The table approach required 36 function pointers in the dispatch table, plus 3 indices per type.
  • The tree approach required 36 function pointers in the second-layer tables, plus 12 data pointers in the first-layer tables, plus 3 pointers/offsets per type.

This might make the tree approach sound more space-consuming, but the tree approach can be compressed further. Notice that quite a few of the tables in the final stage make the same selections. Specifically:

  • The triad (&Z[0], &Z[0], &Z[0]) appears in 3 out of the 12 tables of the final stage.
  • The triad (&Z[1], &Z[1], &Z[1]) also appears in 3 out of the 12 tables.
  • The triad (&Z[5], &Z[5], &Z[5]) also appears in 3 out of the 12 tables.

Clearly, then, we don’t need all of those tables — we need just one of each and we can reuse it by rewriting the pointers in the previous layer. This will reduce the tables at the final stage by 6 and their total contained function pointers by 18, arriving at 30 pointers flat, plus 3 per type.

We are left with these unique tables in the final stage:

And these rewritten pointers in the previous stage:

It’s worth pointing out that alternative orderings of the parameters may yield a larger or a smaller total number of pointers after this compression.

Obviously, now that we took the dispatch tree and we started reusing its nodes, it ceased being a tree — it’s now a directed acyclic graph.

Handling ambiguity

In all the previous examples, we assumed that we have a hypothetical overload resolution that can always resolve ties. In other words, if there is at least one applicable overload for the set of types we pass, then one of the candidate functions will definitely be selected.

There are languages in which overload resolution works that way, and there are other languages where it doesn’t. Some language designers order overloads by appropriateness based on some criteria that they don’t consider particularly debatable. If those criteria don’t manage to produce a single result, and further tie-breaking attempts are deemed too subjective, overload resolution fails and a diagnostic message is issued.

Consider the following case:

void V(object a, object b);    // V[0]
void V(string a, object b);    // V[1]
void V(object a, string b);    // V[2]

An overload resolution mechanism could reasonably see this call as ambiguous:

V("", "");    // V[1] or V[2] can be considered equally applicable

Since we’re using the same mechanism to precompute the results of dynamic dispatch, that same error propagates itself to the dispatch-building algorithm.

If such a case occurs during the construction of the dispatch table, one reasonable response is to issue a compilation error and ask the programmer to provide an override that covers the ambiguous case.

In a future post, we might explore various overload resolution algorithms.