In most statically typed languages, an object’s type is set in stone once the object has been fully constructed. While the object can subsequently be cast to various types, it can never change its runtime type, and many type-system features (such as virtual dispatch) depend on this guarantee.

In this post, we will see how we can use unsafe code in C# to bypass the type system and change a .NET object’s runtime type. Then, we will see an unusual case where it would be neat to have this. Needless to say, don’t try this in real code.

Memory layout

Before we go about changing an object’s type, we need to establish what exactly determines the type. In Microsoft’s CLR implementation of .NET, a class instance consists of two distinct areas:

  • A pointer to the object’s type metadata. This allows the runtime to determine the actual type of the object (regardless of the expression that was used to access it), the size of the object, the correct virtual method implementations that should be called, and so on.
  • The object’s state, that is, the values of all of its fields. Naturally, the space required for the state is at least the sum of the sizes of its fields (alignment padding can add more).

Our first problem is that the metadata pointer hidden in every class instance (or boxed struct instance) is inaccessible to safe code. We will see how to bypass that.

Transmutation

Let’s start implementing a method that takes an object and turns it into an object of another type.

Getting the address of the object

The first thing we need to do is obtain the address of the object. In safe code, there is only one way to do this: create a pinned GC handle to the object and then request its (now fixed) address:

var handle = GCHandle.Alloc(obj, GCHandleType.Pinned);
var addr = handle.AddrOfPinnedObject();
/* do our magic here */
handle.Free();

This code looks good but it won’t work in most cases. Consider this:

var obj = new List<int>();

Now GCHandle.Alloc() will throw an ArgumentException:

Object contains non-primitive or non-blittable data.
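Here is a minimal sketch of this behavior as a console program with top-level statements. It assumes .NET 6 or later, and TryPin is a hypothetical helper name. The sketch also wraps the handle in try/finally, so the pin is released even if the code in between throws:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical helper: tries to pin obj and reports whether the runtime allowed it.
static bool TryPin(object obj)
{
    GCHandle handle;
    try
    {
        handle = GCHandle.Alloc(obj, GCHandleType.Pinned);
    }
    catch (ArgumentException)
    {
        return false; // "Object contains non-primitive or non-blittable data."
    }

    try
    {
        // While the handle is alive, the GC will not move the object.
        IntPtr addr = handle.AddrOfPinnedObject();
        return addr != IntPtr.Zero;
    }
    finally
    {
        handle.Free(); // always release the pin
    }
}

Console.WriteLine(TryPin(new int[4]));      // True: an array of primitives
Console.WriteLine(TryPin(new List<int>())); // False: List<int> holds a reference to its backing array
```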

So, some objects can’t be pinned (most objects, in fact). And, strictly speaking, without pinning an object, we can’t safely take its address. How about we take its address unsafely?

To do that, we need to bring in the System.Runtime.CompilerServices.Unsafe class, which offers some deliciously unsafe methods that can be written in IL but not in C#. Specifically, we’re going to use a method that allows us to take an object reference and turn it into an address, regardless of pinning:

var addr = *(void**)Unsafe.AsPointer(ref obj);
  • We pass the object reference as a ref argument. This creates a managed reference to the variable obj, which itself holds a reference to the object.
  • We pass that double reference to the Unsafe.AsPointer() method, which returns it as an unmanaged void* pointer.
  • Because the argument was a reference to a reference-type variable, the returned pointer actually points to a pointer. It can therefore be typed as void**, and we cast it to that.
  • We don’t actually need the double indirection: we want a pointer to the object itself, not to the variable that references it. Dereferencing once gives us exactly that, as a void*.

So we have:

static unsafe void* GetObjectAddress(this object obj)
{
    return *(void**)Unsafe.AsPointer(ref obj);
}

Getting the metadata pointer

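An object reference points directly at the object’s metadata pointer, so the metadata pointer sits at the very beginning of the object’s memory layout. (The runtime also keeps an object header word just before it, at a negative offset, but we won’t need that here.)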

We already have a pointer to the object, typed void*. Now we realize that this same pointer is also a pointer to the metadata pointer, therefore it can additionally be typed as void**, which we can then dereference once more:

void* meta = *(void**)addr;
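To check this claim without an unsafe context, the sketch below swaps Unsafe.AsPointer for Unsafe.As<object, IntPtr>. That call reinterprets the reference itself as an IntPtr, and Marshal.ReadIntPtr then reads the first word. The sketch assumes CoreCLR (.NET 6 or later, top-level statements), where RuntimeTypeHandle.Value happens to expose the same MethodTable pointer. ReadMetadataPointer is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical helper: reads the first pointer-sized word of an object.
// Racy like GetObjectAddress: the object is not pinned, so a GC between the
// two statements could move it. Acceptable for a demo, not for real code.
static IntPtr ReadMetadataPointer(object obj)
{
    IntPtr addr = Unsafe.As<object, IntPtr>(ref obj); // the reference itself, viewed as an address
    return Marshal.ReadIntPtr(addr);                  // dereference once: the metadata pointer
}

var list = new List<string>();
IntPtr meta = ReadMetadataPointer(list);

// On CoreCLR, RuntimeTypeHandle.Value is the same MethodTable pointer.
Console.WriteLine(meta == typeof(List<string>).TypeHandle.Value);   // True
Console.WriteLine(meta == ReadMetadataPointer(new List<object>())); // False
```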

Setting the metadata pointer

Suppose we have two objects and we want to set one’s type to be the other’s type. Assuming we have already gotten their addresses, we simply need to set the metadata pointer of one to be equal to the metadata pointer of the other:

static unsafe void TransmuteTo(this object target, object source)
{
    var s = (void**)source.GetObjectAddress();
    var t = (void**)target.GetObjectAddress();
    *t = *s;
}

Because we opted not to pin the object while using its address, there are no guarantees that the two pointers s and t will remain valid throughout the TransmuteTo method.

  • If the pointers did remain valid throughout, then the target object’s type will be the same as the source object’s type.
  • If the pointers did not remain valid, then the write most likely missed the target’s metadata pointer, so the target object’s type will most likely still differ from the source object’s type. But this would mean that we wrote a bunch of bytes to an illegal address (which could be anywhere). Therefore, if we detect that this happened, we need to shut down whatever we were doing.

But let’s also mention one corner case: what if the objects involved were already of the same type when we received them as arguments? Well, in that case, we don’t need to do any pointer trickery — we can just exit the method. Thus, the amended code becomes:

static unsafe void TransmuteTo(this object target, object source)
{
    if (target.GetType() == source.GetType()) return; // no need to act

    var s = (void**)source.GetObjectAddress();
    var t = (void**)target.GetObjectAddress();
    *t = *s;

    if (target.GetType() != source.GetType())
    {
        // something happened and we failed, so the entire program is in an invalid state now
        throw new AccessViolationException();
    }
}

Making the signature more pleasant

In the above example, the user code provides two objects and one of them gets its type changed. Let’s see how it looks from their point of view:

var list = new List<string>();
var dummy = new List<object>();
list.TransmuteTo(dummy);
var changed = (List<object>)list; // compile-time error

The compiler knows that there is no way this cast can succeed in valid, type-safe, memory-safe code. Therefore, the user needs to bypass the compile-time check by casting twice, going through object:

var changed = (List<object>)(object)list;
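Without any transmutation, the double cast merely moves the check from compile time to run time. Here is a small sketch, assuming .NET 6 or later with top-level statements:

```csharp
using System;
using System.Collections.Generic;

var list = new List<string>();

// _ = (List<object>)list;            // does not compile: no conversion exists
bool castFailed = false;
try
{
    _ = (List<object>)(object)list;   // compiles: the check is deferred to run time
}
catch (InvalidCastException)
{
    castFailed = true;                // without transmutation, the object is still a List<string>
}

Console.WriteLine(castFailed);        // True
```

After TransmuteTo has run, the same runtime check passes, because the object’s metadata pointer now identifies it as a List<object>.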

This is a bit ugly, so let’s provide a convenience method:

public static T TransmuteTo<T>(this object target, T source)
{
    target.TransmuteTo((object)source);
    return (T)target;
}

Also, the method we wrote requires a dummy object of the desired type. For types that have a public parameterless constructor, we can provide another convenience overload that doesn’t require one:

public static T TransmuteTo<T>(this object target)
    where T : new()
{
    return target.TransmuteTo(new T());
}

This is one of the few nice uses of new() because we really want the constructed object to be exactly T and this is just a convenience overload anyway.

Hazards

It is self-evident that changing an object’s type is extremely dangerous. The reasons are:

  • It invalidates assumptions made by any code that operates on that object, if that code has retained references of the previous type. Specifically, variables of the old type could still be referencing the transmuted instance — except now the actual type of those variables might be completely incompatible with the type of the instance obtained through them!
  • The object’s state may be nonsensical for objects of the resulting type. The same bytes are now interpreted according to the new type’s field layout, so there is no guarantee about what data the resulting object would contain.
  • Changing the object’s type can change the object’s size, as far as the runtime is concerned. If the new size is larger than the original, then the object’s memory will start overlapping with the memory of other live objects. Because the GC compacts the heap, those won’t even be the same objects throughout the execution, which completely trashes all kinds of safety in the running program.

A weird use case

Given all this unsafety in our method and the fact that most of these dangers are present even if the method succeeds, are there any possible use cases for changing an object’s type?

Well, there is an interesting one.

It only takes a basic grasp of type systems to see that even though a string is a kind of object, a List<string> is not a kind of List<object>. In other words, List<T> is not covariant in T. The reason is easy to see if we consider that List<T> contains an Add(T) method. Covariance would allow us to call Add(object) on a List<string>, making it possible to store non-string instances in a List<string>.
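The standard library draws exactly this line. IEnumerable<T> only hands items out and is declared covariant (IEnumerable<out T>), while List<T> is invariant. Here is a short sketch, assuming .NET 6 or later with top-level statements:

```csharp
using System;
using System.Collections.Generic;

List<string> strings = new List<string> { "one", "two" };

// IEnumerable<out T> only hands items out, so it is covariant in T.
IEnumerable<object> readOnlyView = strings;     // compiles: no copy, same object

// List<T> also accepts items through Add(T), so it is invariant.
// List<object> writableView = strings;         // does not compile

int count = 0;
foreach (object item in readOnlyView)
    count++;                                    // reading items as object is always safe

Console.WriteLine(ReferenceEquals(readOnlyView, strings)); // True
Console.WriteLine(count);                                   // 2
```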

If we ever want to add a non-string object to a List<string>, what we need to do is create a new list, one of type List<object>, then copy over all existing items from the old list, then add the new (non-string) object in a type safe way.
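For comparison, here is one way to write the reasonable approach (top-level statements, .NET 6 or later assumed). The List<object> constructor accepts oldList directly, thanks to IEnumerable<T> covariance:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<string> oldList = new[] { "one", "two", "three" }.ToList();

// List<object>(IEnumerable<object>) accepts a List<string> via covariance
// and copies the items into a fresh object[] backing array.
List<object> newList = new List<object>(oldList);
newList.Add(new object());                            // type-safe: newList really is a List<object>

Console.WriteLine(newList.Count);                     // 4
Console.WriteLine(oldList.Count);                     // 3, the original is untouched
Console.WriteLine(ReferenceEquals(oldList, newList)); // False
```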

But instead of doing the reasonable thing, let’s do the crazy thing. Rather than copying items over, let’s use our knowledge that object and string variables have the same representation (both are just references) to transmute a List<string> into a List<object>:

public static class Weird
{
    public static List<TResult> Add<TSource, TItem, TResult>(this List<TSource> list, TItem item)
        where TSource : TResult
        where TItem : TResult
    {
        var e = list.TransmuteTo<List<TResult>>();
        e.Add(item);
        return e;
    }
}

This method’s signature says “given a list of TSource and a single TItem, both of which are assignable to TResult, I can give you back a list of TResult”. This is the sane part. The insane part is in the implementation, where we use TransmuteTo to change the type of the initial collection instead of creating a new one.

Now we can use the above crazy method like this:

List<string> oldList = new[] { "one", "two", "three" }.ToList();
List<object> newList = oldList.Add<string, object, object>(new object());

Array type mismatch

A careful reader may notice that there is a problem with our experiment — it seems to work, but it really shouldn’t. Can you see why?

We added a non-string object to what was a List<string>. We did it by transmuting the list to be a List<object>, therefore accepting any object.

But a List<T> is actually backed by an array of type T[]. This means that a List<string> is backed by a string[], a List<object> is backed by an object[], and so on. Yet we never changed the type of the backing array, which means that this experiment shouldn’t have worked.
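The claim about backing arrays can be checked with reflection. This sketch relies on the private field name _items, which is an implementation detail of List<T> rather than a public API (it is the same name the post uses further down). BackingArrayType is a hypothetical helper, and top-level statements (.NET 6 or later) are assumed:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical helper: returns the runtime type of a list's private backing array.
// "_items" is an implementation detail of List<T> and could change in future versions.
static Type BackingArrayType<T>(List<T> list) =>
    typeof(List<T>)
        .GetField("_items", BindingFlags.NonPublic | BindingFlags.Instance)
        .GetValue(list)
        .GetType();

Console.WriteLine(BackingArrayType(new List<string> { "a" })); // System.String[]
Console.WriteLine(BackingArrayType(new List<object> { "a" })); // System.Object[]
```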

Part of the answer is that the declared type of the backing field changed along with the class. The internal reference to a string[] is now interpreted as a reference to an object[], as an artifact of the type mutation we performed on the list.

But this opens up a new question. You see, C# actually supports array covariance:

string[] s = new string[4];
object[] o = s; // compiles fine

But even though the C# compiler allows it, array covariance is unsound: the static type object[] no longer guarantees that any object can be stored in the array. To enforce type safety, the runtime has to validate every write to such an array. If the runtime type of the assigned value is compatible with the array’s actual element type, the assignment succeeds silently; otherwise, an ArrayTypeMismatchException is thrown:

o[0] = new object(); // throws
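Here is a self-contained version of the two snippets above, catching the exception to show that the rejected write never reaches the array (top-level statements, .NET 6 or later assumed):

```csharp
using System;

string[] s = new string[4];
object[] o = s;                  // covariant conversion: same array, no copy

o[0] = "fine";                   // the value is a string: the runtime check passes
bool mismatch = false;
try
{
    o[1] = new object();         // not a string: the runtime check fails
}
catch (ArrayTypeMismatchException)
{
    mismatch = true;
}

Console.WriteLine(s[0]);         // fine
Console.WriteLine(mismatch);     // True
Console.WriteLine(s[1] == null); // True: the failed write never happened
```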

If the runtime performs defensive checks when writing to an object[] (in the general case, at least), why didn’t it throw an exception when we used the transmuted List<object> to add a non-string?

e.Add(item);

Transmuting the List<string> to a List<object> reinterprets the string[] as an object[]. But the actual type of the array (string[]) is left intact, because it was instantiated back when the list was List<string>.

It seems that adding a non-string object to that string[] array should have triggered a runtime type check followed by an ArrayTypeMismatchException. Why, then, was no exception thrown?

Notice that I sneakily used ToList() in my initial example, in order to create a list with a capacity that matches the number of actual elements in it. This means that the initial string[] array was full as soon as the list was constructed.

When we attempted to add a new element (be it a string or another kind of object), we forced the list to resize, that is, to allocate a new backing array and copy all existing elements into it. The new array is allocated as a T[], where T comes from the list’s runtime type. Since the list is now a List<object>, the new backing array is an object[].

In other words, the reason we didn’t get an ArrayTypeMismatchException is that we never added the non-string object to the string[] array. As soon as we attempted to add something to the list, the backing string[] array was replaced by an object[] array, and all of that happened before any type-incompatible write to the original array was attempted.

To verify this hypothesis, we can create a list whose capacity explicitly exceeds the number of stored elements:

List<string> oldList = new List<string>(16);
oldList.AddRange(new[] { "one", "two", "three" });
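The difference between the two situations can be observed without any transmutation by comparing List<T>.Count with List<T>.Capacity. Here is a sketch, assuming .NET 6 or later with top-level statements. The exact growth factor is an implementation detail, so after the resize the sketch only checks that Capacity exceeds 3:

```csharp
using System;
using System.Collections.Generic;

// Room to spare: the next Add writes into the existing backing array.
var roomy = new List<string>(16);
roomy.AddRange(new[] { "one", "two", "three" });
Console.WriteLine($"{roomy.Count}/{roomy.Capacity}"); // 3/16

// Full: the next Add must allocate a bigger backing array and copy the items over.
var full = new List<string>(3) { "one", "two", "three" };
Console.WriteLine($"{full.Count}/{full.Capacity}");   // 3/3
full.Add("four");
Console.WriteLine(full.Capacity > 3);                 // True
```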

Once we’ve ensured that the existing backing array will be reused on our next attempt to add an item, we can confirm that the exception actually gets thrown from this method call:

oldList.Add<string, object, object>(new object());

Specifically, this line:

e.Add(item); // throws ArrayTypeMismatchException

So it was a fluke that our initial experiment seemed to work. We actually need to do one more thing before it works properly: we must transmute not only the list, but also its backing array. The most obvious way to get a reference to the backing array is reflection:

typeof(List<TResult>)
    .GetField("_items", BindingFlags.NonPublic | BindingFlags.Instance)
    .GetValue(e)
    .TransmuteTo(new TResult[0]);

And now it works.

Full listing

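// Compile with <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the project file.
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

public static unsafe class Transmute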
{
    public static unsafe void* GetObjectAddress(this object obj) => *(void**)Unsafe.AsPointer(ref obj);

    public static void TransmuteTo(this object target, object source)
    {
        if (target.GetType() == source.GetType()) return;

        var s = (void**)source.GetObjectAddress();
        var t = (void**)target.GetObjectAddress();
        *t = *s;

        if (target.GetType() != source.GetType())
            throw new AccessViolationException($"Illegal write to address {new IntPtr(t)}");
    }

    public static T TransmuteTo<T>(this object target, T source)
    {
        target.TransmuteTo((object)source);
        return (T)target;
    }

    public static T TransmuteTo<T>(this object target) where T : new() => target.TransmuteTo(new T());
}

public static class Weird
{    
    public static List<TResult> Add<TSource, TItem, TResult>(this List<TSource> list, TItem item)
        where TSource : class, TResult
        where TItem : class, TResult
    {
        var e = list.TransmuteTo<List<TResult>>();
        typeof(List<TResult>)
            .GetField("_items", BindingFlags.NonPublic | BindingFlags.Instance)
            .GetValue(e)
            .TransmuteTo(new TResult[0]);
        e.Add(item);
        return e;
    }
}

A small artifact

If we behave like good citizens and forget the reference we had to oldList, then the program will continue to work. If we keep using oldList, we can observe something slightly strange.

Consider this code:

List<string> oldList = new List<string>();
List<object> newList = oldList.Add<string, object, object>(new object());
 
Console.WriteLine(ReferenceEquals(oldList, newList)); // prints True

Console.WriteLine(oldList is List<object>);           // prints False
Console.WriteLine(oldList is List<string>);           // prints True

Console.WriteLine(newList is List<object>);           // prints True
Console.WriteLine(newList is List<string>);           // prints False

After transmuting the list, we can verify that oldList and newList are the exact same object. Yet when we ask them the same questions (through the is operator), we get opposing answers, depending on the type of the expression we used to get them.

The explanation for this is fairly obvious. The C# compiler knows enough about the type system to reason that:

  • Under no circumstances could oldList, a List<string> variable, contain a List<object>, because those types are incompatible. Therefore, the expression oldList is List<object> can be optimized to false at compile time.
  • And because the variable oldList is of type List<string>, anything it contains must be either a valid List<string> or a null reference. Therefore, the expression oldList is List<string> can be optimized to oldList != null at compile time.

Of course, since we have violated the type system, the assumptions made by the above statements don’t actually hold. That’s why the results that come out are the opposite of the truth.

When compiler optimizations interact with constructs that invoke undefined behavior, weird things can happen — sometimes the weirdness is contained, other times it is propagated or even amplified. This phenomenon is traditionally associated with C and C++, because undefined behavior is much more pervasive and easier to trigger in those languages. But even a mostly safe language, such as C#, is not exempt from these effects if we really put in the effort to trigger them.

Epilogue

This was a fun little experiment. But beware that as soon as you use this in a project, you will be opening the floodgates of C and C++ levels of unsafety into .NET, a platform where most code generally operates under the assumption that safety is very much on.