C# has two keywords called ref and out. They are not exactly the same, but they both offer the ability to pass a reference to a local variable or a field. At the binary level, what is passed is essentially a pointer.

This feature is useful, but it has certain technical limitations. We will explore the possibility of constructing an alternative that does not suffer from the same limitations.

Limitations of ref

Since C# is a mostly memory safe language, a ref to an object cannot be allowed to outlive the object itself. Unless a language has an exotic type system like that of Rust, the compiler cannot statically prove that an object outlives all of its references in the general case. This means that C# ref has to be very limited – and like any limited feature in any language, it plays strangely with others.

For example, ref is part of a method’s signature, in the sense that an int and a ref int are different kinds of parameters, and methods taking them can co-exist as overloads:

void M(int x) { }
void M(ref int x) { }

The same applies for int and out int:

void M(int x) { }
void M(out int x) { x = 0; }

But ref and out cannot co-exist as overloads:

void M(ref int x) { }
void M(out int x) { x = 0; } // error

In a way, this is understandable – both parameter types compile to the IL type Int32& (that’s C++ notation for “reference to an Int32”) and IL code uses a method’s signature (including the return type, incidentally) to distinguish between overloads. But it’s also strange that this implementation detail resurfaces in the C# language, when a workaround could have been used (e.g. mangling the IL name of either of the two methods).
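
The claim that both parameter kinds share one IL type can be checked with reflection. The following is a minimal sketch; the class and method names (SignatureDemo, ByRef, ByOut, FirstParameter) are made up for illustration and are not part of the original post.

```csharp
using System;
using System.Reflection;

public static class SignatureDemo
{
    // Two hypothetical methods that differ only in ref vs. out.
    public static void ByRef(ref int x) { }
    public static void ByOut(out int x) { x = 0; }

    // Helper (hypothetical): fetch the first parameter of a method by name.
    public static ParameterInfo FirstParameter(string methodName) =>
        typeof(SignatureDemo).GetMethod(methodName).GetParameters()[0];

    public static void Main()
    {
        ParameterInfo byRef = FirstParameter(nameof(ByRef));
        ParameterInfo byOut = FirstParameter(nameof(ByOut));

        // Both parameters have the same by-reference type.
        Console.WriteLine(byRef.ParameterType);                        // System.Int32&
        Console.WriteLine(byRef.ParameterType == byOut.ParameterType); // True

        // Only a metadata flag on the parameter tells them apart.
        Console.WriteLine(byRef.IsOut); // False
        Console.WriteLine(byOut.IsOut); // True
    }
}
```

Since the types are identical and only a parameter attribute differs, the two methods would have the same signature if they also shared a name.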

Another limitation, at least before C# 7.0, was that ref could not qualify the return type of a method (and out still cannot):

ref int N(ref int x) // error
{
    return x;
}

Again understandable, since in the general case the compiler can’t prove that this ref will not point to an object that will go out of scope at the end of the method. Perhaps one day it will allow it for special cases such as the above which are provably safe (as long as the code passing the ref to this method is also proved safe). Update: Starting with C# 7.0, the compiler can indeed allow ref return types for provably safe cases.
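
For comparison, here is a small sketch of what the C# 7.0 syntax looks like. Note that the return statement needs the ref keyword as well. The Slot and Run helpers and the class name are illustrative assumptions, not something taken from the original post.

```csharp
using System;

public static class RefReturnDemo
{
    // Legal since C# 7.0: the returned reference was handed in by the caller,
    // so it cannot outlive its referent.
    public static ref int N(ref int x)
    {
        return ref x;
    }

    // Also legal: array elements live on the heap, not in this method's frame.
    public static ref int Slot(int[] array, int index) => ref array[index];

    public static int[] Run()
    {
        int[] data = { 1, 2, 3 };

        ref int middle = ref Slot(data, 1); // ref local aliasing data[1]
        middle = 42;

        N(ref data[0]) = 7; // assigning through a returned reference

        return data;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // 7, 42, 3
    }
}
```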

Needless to say, ref and out cannot qualify the type of a field:

class X
{
    ref int V; // error
}

This is because the enclosing object could be used to haul a reference farther than it’s safe (farther than the referent object’s point of death).

Obviously ref and out parameters cannot be captured inside lambdas:

Func<int> P(ref int x)
{
    return () => x; // error
}

The reason for this is the same as the reason why ref cannot be used as a field: the lambda would need to hold a reference to the variable and it could outlive the variable itself.

I am generally not a fan of features that are limited in so many different ways or need so many considerations. Now don’t get me wrong – ref and out make life much easier when interoperating with unmanaged code and for that they are invaluable. For all other cases, however, I’d like to have an equivalent feature that waives all those limitations.

Where would such a feature be useful? Anything that would require you to expose a member through an interface could use such a ref feature as an alternative. Instead of handing out references to the enclosing object as an instance of a certain interface, you could instead start handing out references directly to a particular member. You could forgo the declaration of an interface altogether, which can sometimes be a good thing, because you may not want to give unconstrained public access to the member, as interfaces force you to do. In other words, such a feature could actually enhance encapsulation in certain circumstances.

To that end, I decided to use current features of C# to attempt to implement such a feature. It turns out it’s almost too easy to get it working.

Creating a Ref<T> type

As users of the language, we are obviously not allowed to define qualifiers for types. But we can define the next best thing: generic types.

We are going to define a simple type called Ref<T> (which could be either a struct or a class) that will represent a reference to an expression of type T:

public struct Ref<T>
{
}

To follow .NET conventions established by types like Nullable<T>, I’m deciding that the canonical way to read the value from the referent location or write a new value to that location will be through a property named Value:

public struct Ref<T>
{
    public T Value { get; set; }
}

By doing this, I’ve recognized the two fundamental operations I’m interested in. You see, I don’t really need to capture the location of a local variable or a field or a get/set property or an array slot. I only need it to look like I have.

I want to have two operations, specifically: get a value from the referent and set a value to the referent. I don’t know exactly how these operations would work (as that depends on the variable) but I know one returns a T and the other takes a T. In terms of delegate types, getting the value is a Func<T> and setting the value is an Action<T>. Therefore, our Ref<T> type will basically be a thin wrapper around two such delegates — one that gets the value (“getter”), one that sets it (“setter”):

public struct Ref<T>
{
    public T Value
    { 
        get { return _get(); }
        set { _set(value); }
    }

    private readonly Func<T> _get;
    private readonly Action<T> _set;

    public Ref(Func<T> get, Action<T> set)
    {
        _get = get;
        _set = set;
    }
}

We expect these delegates to be provided by the code that constructs the Ref<T> in the form of lambda expressions that will denote which expression will be referenced:

static Ref<int> M()
{
    int x = 5;
    return new Ref<int>(() => x, value => { x = value; });
}

Since Ref<T> is a regular type and we haven’t used any unsafe constructs, we can return it, store it in fields or properties, put it into arrays and take it out again, and it will keep working. Suddenly, references outliving their referent are no longer a problem. How so?

Well, it’s some magic that the C# compiler does in the presence of a lambda. You see, when a local variable is captured inside the lambda, the compiler hoists that variable into a closure class, decoupling its lifetime from the lifetime of the enclosing method:

class Closure
{
    public int x;

    public int Get_x() => x;
    public void Set_x(int value)
    {
        x = value;
    }
}

static Ref<int> M()
{
    Closure c = new Closure { x = 5 };
    Ref<int> r = new Ref<int>(c.Get_x, c.Set_x);
    return r;
}

This frees the compiler from having to statically worry about the lifetime of that local variable. It’s now the responsibility of the garbage collector, which has access to dynamic information. And the GC doesn’t care about lambdas – it will simply perform its normal operation and it will consider the closure instance to be alive, as long as it is reachable through the delegates, and those delegates are reachable through our Ref<T>.

In other words, the compiler’s magic is what makes our Ref<T> work for all the cases where a simple ref couldn’t work.
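
To see the effect end to end, here is a small self-contained sketch. It repeats the delegate-based Ref<T> from above and adds a hypothetical MakeAliases method that returns two references to one local variable, which has gone out of scope by the time the caller touches it.

```csharp
using System;

public struct Ref<T>
{
    private readonly Func<T> _get;
    private readonly Action<T> _set;

    public Ref(Func<T> get, Action<T> set)
    {
        _get = get;
        _set = set;
    }

    public T Value
    {
        get { return _get(); }
        set { _set(value); }
    }
}

public static class LifetimeDemo
{
    // Hypothetical helper: both Ref<int> instances capture the same local,
    // so they share one compiler-generated closure object.
    public static Ref<int>[] MakeAliases()
    {
        int x = 5;
        var first = new Ref<int>(() => x, value => { x = value; });
        var second = new Ref<int>(() => x, value => { x = value; });
        return new[] { first, second };
    }

    public static void Main()
    {
        Ref<int>[] refs = MakeAliases();

        // MakeAliases has returned, yet x is still alive on the heap.
        refs[0].Value = 10;
        Console.WriteLine(refs[1].Value); // 10
    }
}
```

Writing through one reference is visible through the other, which is the aliasing behaviour one would expect from a real ref.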

Syntax improvements

The construction syntax for our Ref<T> is cumbersome right now. There are two problems: one is that we have to pass the generic type argument explicitly and the other is that we need to pass two lambdas.

new Ref<int>(() => x, value => { x = value; });

The first problem is easy to fix – we simply perform the construction through a generic static method of a non-generic static class:

public static class Ref
{
    public static Ref<T> Of<T>(Func<T> get, Action<T> set)
    {
        return new Ref<T>(get, set);
    }
}

Now the syntax is slightly more pleasant:

Ref.Of(() => x, value => { x = value; });

But it’s still a long way from ideal. Ideally, we’d like to have to pass only one of the two lambdas and the other one should be inferred. Out of the two lambdas, the getter seems more pleasant to type, so we’d like our method to receive that and somehow construct the other:

Ref.Of(() => x);

The Ref.Of<T>() method’s signature would have to look something like this:

public static class Ref
{
    public static Ref<T> Of<T>(Func<T> get)
    {
        return new Ref<T>(get, null);
    }
}

Unfortunately, we can’t implement this Ref.Of<T>() method in any sane way. A delegate is simply a reference to a precompiled method (and optionally an object to be passed as this, but that’s not relevant right now). There is no way to examine the Func<T> and infer the corresponding Action<T>, because code is not data – at least not from our method’s perspective.

What we’d like to have is a way to tell the user to pass the expression to us as data instead of as a precompiled delegate. And we need to do that while still retaining the lambda syntax because the compiler must continue hoisting the local variable to the place in memory where GC-managed objects go.

But C# does offer a way to pass an expression as data to be examined and manipulated: expression trees!

Expression trees are assigned just like delegates, but when the compiler sees that the type of the lambda is an expression tree, it will not compile the lambda to IL. Instead, it will create a tree-like representation of the expression and pass that. That representation can later be traversed, examined, and even compiled to an actual delegate if we want. This sounds exactly like our use case.

Incidentally, many LINQ providers work with expression trees. This allows them to examine the expressions passed to them, understand them and translate them to something else (for example, SQL queries).
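
As a quick illustration of "expression as data", the sketch below assigns a lambda to an Expression<Func<int, int>>, looks at its body, and only then compiles it. The Square field and the class name are hypothetical and exist only for this example.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    // Same lambda syntax as a delegate, but the target type makes the
    // compiler build a tree instead of emitting IL for the body.
    public static readonly Expression<Func<int, int>> Square = n => n * n;

    public static void Main()
    {
        // The tree can be examined as data...
        var body = (BinaryExpression)Square.Body;
        Console.WriteLine(body.NodeType); // Multiply
        Console.WriteLine(body.Left);     // n

        // ...and turned into an ordinary delegate on demand.
        Func<int, int> compiled = Square.Compile();
        Console.WriteLine(compiled(7));   // 49
    }
}
```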

So, to begin using expression trees, let’s change the signature of our Ref.Of<T>() method:

public static class Ref
{
    public static Ref<T> Of<T>(Expression<Func<T>> get)
    {
        return new Ref<T>(null, null);
    }
}

One great thing about expression trees is that their syntax is the same as that of the corresponding delegates, so the author of the calling code may not even notice the difference:

Ref.Of(() => x); // I'm still here!

The first thing we need to do in order to implement this improved Ref.Of<T>() method is to compile the expression tree to create our getter. That’s simple enough:

public static class Ref
{
    public static Ref<T> Of<T>(Expression<Func<T>> expr)
    {
        // Compile the expression tree into an ordinary getter delegate.
        var get = expr.Compile();
        return new Ref<T>(get, null);
    }
}

Next, we need to find a way to construct the setter.

Let’s get one thing out of the way: we cannot construct a setter from a getter in the general case. The caller may choose to pass an expression that cannot be meaningfully set, or that doesn’t make sense to take as a Ref<T>. Examples of such expressions include:

Ref.Of(() => new object());
Ref.Of(() => 0);
Ref.Of(() => "Hello!");

So it’s important to accept our limitations here. First, we can’t handle all kinds of expressions, only a handful of them. Second, we can’t exclude at compile time the kinds of expressions we can’t handle, which means we’ll have to throw an exception at runtime if we receive such expressions.

These limitations may sound disappointing, but at least we’re in good company. Many LINQ providers react the same way when they are given an expression that they don’t know how to examine or translate (e.g. any C# expression that has no SQL equivalent).

Okay, so let’s get to work and start from the beginning. What things do all of our Action<T> need to have in common? They need to have a formal parameter list of exactly one parameter, of type T.

var param = new[] { Expression.Parameter(typeof(T)) };

As a body, they need to have the operation that will perform the assignment of the parameter to the referent location. Depending on the kind of expression we’ve received, we will have to write a specialized assignment operation. We’ll figure out the mechanism to do that later – for now, let’s just hide it behind a method that receives the getter’s body and the parameter:

var op = CreateSetOperation(expr.Body, param[0]);

Now that we have our assignment operation, no matter what that is, we can turn this whole thing into an expression tree of the appropriate delegate type:

var act = Expression.Lambda<Action<T>>(op, param);

Finally, we compile it to a callable delegate:

var set = act.Compile();

And that’s how we have our setter. The complete listing for now becomes:

public static Ref<T> Of<T>(Expression<Func<T>> expr)
{
    var get = expr.Compile();
    var param = new[] { Expression.Parameter(typeof(T)) };
    var op = CreateSetOperation(expr.Body, param[0]);
    var act = Expression.Lambda<Action<T>>(op, param);
    var set = act.Compile();
    return new Ref<T>(get, set);
}

internal static Expression CreateSetOperation(Expression expr, Expression param)
{
    throw new NotSupportedException("This kind of expression is not supported.");
}

Now what we need is an implementation for CreateSetOperation().

We definitely want to support references to local variables, fields, and possibly even properties:

Ref.Of(() => local);
Ref.Of(() => obj.Field);
Ref.Of(() => obj.Property);

It turns out that when the user passes any of these expressions, the body of the getter lambda is a MemberExpression (a captured local appears as a field access on the compiler-generated closure object). For these cases, we can simply take that body and put it on the left-hand side of an assignment. The right-hand side of the assignment is, of course, the argument that the setter receives. Thus, we construct the body like this:

internal static Expression CreateSetOperation(MemberExpression expr, Expression param)
{
    return Expression.Assign(expr, param);
}
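
It may not be obvious why a plain local variable counts as a member access. The sketch below (class and method names are hypothetical) inspects the tree for a captured local and then builds and runs the matching setter by hand, which is roughly what Ref.Of does for this case.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class CapturedLocalDemo
{
    public static int Run()
    {
        int local = 1;
        Expression<Func<int>> getter = () => local;

        // The hoisted local shows up as a field access on the closure object,
        // and the closure object itself is embedded in the tree as a constant.
        var member = (MemberExpression)getter.Body;
        Console.WriteLine(member.Member is FieldInfo);              // True
        Console.WriteLine(member.Expression is ConstantExpression); // True

        // Build the equivalent of "value => local = value" and compile it.
        ParameterExpression value = Expression.Parameter(typeof(int), "value");
        Action<int> setter = Expression
            .Lambda<Action<int>>(Expression.Assign(member, value), value)
            .Compile();

        // The compiled setter writes to the very same closure field.
        setter(42);
        return local;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 42
    }
}
```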

For the special case where a property is passed and that property has no setter, we can allow our getter to work, but our setter should do something sensible, like throw an exception. So let’s change the above snippet to add this provision:

internal static Expression CreateSetOperation(MemberExpression expr, Expression param)
{
    var propertyInfo = expr.Member as PropertyInfo;
    if (propertyInfo != null && !propertyInfo.CanWrite)
    {
        return Expression.Throw(Expression.New(typeof(InvalidOperationException)));
    }

    return Expression.Assign(expr, param);
}

We’ll probably also want the user to be able to create a reference to a particular location in an array:

Ref.Of(() => array[5]);

For a single-dimensional array, this expression is a BinaryExpression (with node type ArrayIndex), where the left operand is the array object and the right operand is the index. Out of these two parts, we construct an array access expression, and we assign the argument to that:

internal static Expression CreateSetOperation(BinaryExpression expr, Expression param)
{
    var access = Expression.ArrayAccess(expr.Left, expr.Right);
    return Expression.Assign(access, param);
}
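
The same steps can be tried in isolation. This is a minimal sketch with hypothetical names (ArrayIndexDemo, Run) that confirms the node type of an array element read and then assigns through the reconstructed array access.

```csharp
using System;
using System.Linq.Expressions;

public static class ArrayIndexDemo
{
    public static int[] Run()
    {
        int[] array = { 0, 0, 0 };
        Expression<Func<int>> getter = () => array[1];

        // Reading a single-dimensional array element is a binary node:
        // Left is the array, Right is the index.
        var index = (BinaryExpression)getter.Body;
        Console.WriteLine(index.NodeType); // ArrayIndex

        // ArrayIndex is read-only, so rebuild it as an assignable ArrayAccess.
        ParameterExpression value = Expression.Parameter(typeof(int), "value");
        Expression access = Expression.ArrayAccess(index.Left, index.Right);
        Action<int> setter = Expression
            .Lambda<Action<int>>(Expression.Assign(access, value), value)
            .Compile();

        setter(9);
        return array;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // 0, 9, 0
    }
}
```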

Since we’ve done that, it would be nice if we could do the same for other kinds of containers, like lists and dictionaries:

Ref.Of(() => list[5]);
Ref.Of(() => dictionary["key"]);

Even though they all look the same, indexing an array is a fundamental operation in C#, while indexing a list or other custom container is a method call.

A get/set indexer is implemented as two methods that follow a simple convention: one contains get_ in its name, the other contains set_ at the same place. We can take advantage of that convention to find the indexer setter method, given the indexer getter method. If no indexer setter method can be found, we can throw an exception like we did before.

Once we have a valid indexer setter method, we can go ahead and invoke it on the same object as the indexer getter method was invoked on. We also need to pass as arguments whatever arguments (e.g. index, key) were already passed to the indexer getter method, so that the indexer setter method addresses the same element. We also need to pass one more argument in addition to the rest – the new value we will receive as an argument to our own setter delegate.

Here’s the code:

internal static Expression CreateSetOperation(MethodCallExpression expr, Expression param)
{
    var setterName = expr.Method.Name.Replace("get_", "set_");
    var setter = expr.Method.DeclaringType?.GetMethod(setterName);
    if (setter == null)
    {
        return Expression.Throw(Expression.New(typeof(InvalidOperationException)));
    }
    else
    {
        return Expression.Call(expr.Object, setter, expr.Arguments.Concat(new[] { param }));
    }
}
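
To make the get_/set_ convention concrete, the following sketch walks through it for a Dictionary<string, int>. The names IndexerDemo, Run and ages are assumptions made for this example; the reflection and expression calls are the same ones used in the listing above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class IndexerDemo
{
    public static int Run()
    {
        var ages = new Dictionary<string, int> { ["key"] = 1 };
        Expression<Func<int>> getter = () => ages["key"];

        // Reading through an indexer is a call to its getter method.
        var call = (MethodCallExpression)getter.Body;
        Console.WriteLine(call.Method.Name); // get_Item

        // Find the matching setter by naming convention.
        string setterName = call.Method.Name.Replace("get_", "set_");
        MethodInfo setterMethod = call.Method.DeclaringType.GetMethod(setterName);

        // Call set_Item(key, value) on the same dictionary instance.
        ParameterExpression value = Expression.Parameter(typeof(int), "value");
        Expression body = Expression.Call(
            call.Object,
            setterMethod,
            call.Arguments.Concat(new Expression[] { value }));
        Action<int> setter = Expression.Lambda<Action<int>>(body, value).Compile();

        setter(33);
        return ages["key"];
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 33
    }
}
```

One caveat of relying on the name alone: a method call that is not an indexer getter has no get_ prefix to replace, so such expressions would need an extra check before they reach this code.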

Now to dispatch based on the runtime type, our general CreateSetOperation() method could look like this:

internal static Expression CreateSetOperation(Expression expr, Expression param)
{
    // Only array element reads qualify; other binary expressions
    // (e.g. a + b) are not assignable and fall through to the error below.
    if (expr is BinaryExpression binaryExpr && binaryExpr.NodeType == ExpressionType.ArrayIndex)
    {
        return CreateSetOperation(binaryExpr, param);
    }
    else if (expr is MemberExpression memberExpr)
    {
        return CreateSetOperation(memberExpr, param);
    }
    else if (expr is MethodCallExpression callExpr)
    {
        return CreateSetOperation(callExpr, param);
    }
    else
    {
        throw new NotSupportedException("This kind of expression is not supported.");
    }
}

And we’re done! You can see the complete listing of the Ref<T> struct and the Ref static class on GitHub.
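
For readers who want to try it without leaving the page, here is a condensed sketch that assembles the pieces above into one file. It is not the GitHub listing: the dispatch is folded into a single switch, the array case checks for the ArrayIndex node type, and the Person class, the Throw helper and the RefUsageDemo class are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public struct Ref<T>
{
    private readonly Func<T> _get;
    private readonly Action<T> _set;
    public Ref(Func<T> get, Action<T> set) { _get = get; _set = set; }
    public T Value { get { return _get(); } set { _set(value); } }
}

public static class Ref
{
    public static Ref<T> Of<T>(Expression<Func<T>> expr)
    {
        ParameterExpression param = Expression.Parameter(typeof(T), "value");
        Expression op = CreateSetOperation(expr.Body, param);
        Action<T> set = Expression.Lambda<Action<T>>(op, param).Compile();
        return new Ref<T>(expr.Compile(), set);
    }

    private static Expression Throw() =>
        Expression.Throw(Expression.New(typeof(InvalidOperationException)));

    private static Expression CreateSetOperation(Expression expr, Expression param)
    {
        switch (expr)
        {
            // array[i]
            case BinaryExpression b when b.NodeType == ExpressionType.ArrayIndex:
                return Expression.Assign(Expression.ArrayAccess(b.Left, b.Right), param);
            // local, obj.Field, obj.Property
            case MemberExpression m:
                if (m.Member is PropertyInfo p && !p.CanWrite) return Throw();
                return Expression.Assign(m, param);
            // list[i], dictionary[key]
            case MethodCallExpression c:
                string setterName = c.Method.Name.Replace("get_", "set_");
                MethodInfo setter = c.Method.DeclaringType?.GetMethod(setterName);
                if (setter == null) return Throw();
                return Expression.Call(c.Object, setter, c.Arguments.Concat(new[] { param }));
            default:
                throw new NotSupportedException("This kind of expression is not supported.");
        }
    }
}

public class Person { public string Name = "Ann"; }

public static class RefUsageDemo
{
    public static string Run()
    {
        int local = 1;
        var person = new Person();
        var array = new[] { 1, 2, 3 };
        var list = new List<int> { 1, 2, 3 };

        Ref<int> a = Ref.Of(() => local);
        Ref<string> b = Ref.Of(() => person.Name);
        Ref<int> c = Ref.Of(() => array[2]);
        Ref<int> d = Ref.Of(() => list[0]);

        a.Value = 10; b.Value = "Bob"; c.Value = 30; d.Value = 40;
        return $"{local} {person.Name} {array[2]} {list[0]}";
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 10 Bob 30 40
    }
}
```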

Notes

One thing worth taking a moment to decide is whether Ref<T> should be a struct or a class. I made it a struct because it is intended to represent a simple value and the struct itself is immutable so I don’t expect any pitfalls in its usage. However, every struct in C# has a default value, default(T), in which all of its fields are zeroed, and nothing forces callers to go through our constructor. Thus, a default(Ref<T>) instance holds two null delegates and does not reference any expression. If we attempt to get or set the value of the default(Ref<T>) instance, we’ll get a NullReferenceException. I considered this a happy accident, so I decided to leave it in – even though I’m not a fan of the existence of the null reference in general.
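
The behaviour of the default instance is easy to confirm. The sketch below repeats the delegate-based Ref<T> so that it compiles on its own; DefaultRefDemo and ThrowsOnRead are hypothetical names used only here.

```csharp
using System;

public struct Ref<T>
{
    private readonly Func<T> _get;
    private readonly Action<T> _set;

    public Ref(Func<T> get, Action<T> set)
    {
        _get = get;
        _set = set;
    }

    public T Value
    {
        get { return _get(); }
        set { _set(value); }
    }
}

public static class DefaultRefDemo
{
    public static bool ThrowsOnRead()
    {
        // No constructor runs here, so both delegates are null.
        Ref<int> empty = default(Ref<int>);
        try
        {
            int unused = empty.Value; // invokes a null Func<int>
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsOnRead()); // True
    }
}
```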

Another thing worth mentioning is that our Ref<T> does not have the same performance characteristics as C#’s ref. This is understandable, considering that the existing ref is just a pointer to a memory location. Meanwhile, our Ref<T> is a fat struct (it contains two pointers in its memory layout) and every access to the referent requires performing a method call, then following one of those two references, then performing another method call, which will follow a reference to a user-defined object or a compiler-generated closure. Don’t be surprised if it turns out to be slower than ref – it’s expected to be.

A final thing to watch out for is the fact that our Ref<T> cannot work like ref for interop scenarios. An unmanaged function that expects to see a T& cannot be given a Ref<T>, both because it has the wrong size (as we said before, it contains two pointers) and because it has the wrong content. We won’t care about this limitation today, although we might tackle it in a future post.