
From what I understand by now, I can say that a reference in C# is a kind of pointer to an object which has reference count and knows about the type compatibility. My question is not about how a value type is different than a reference type, but more about how a reference is implemented.

I have read this post about the differences between references and pointers, but it does not cover much about what a reference is; instead it describes its properties compared with a pointer in C++. I also understand the difference between passing by reference and passing by value (in C#, arguments are passed by value by default, even references), but it is hard for me to understand what really is a reference when I have tried to explain to my colleagues why a parameter sent by reference can not be stored inside a closure, as in Eric Lippert's blog entry about the stack as an implementation detail.

Can somebody provide me with a complete, but hopefully simple explanation about what references really are in C# and a bit about how they are implemented?

Edit: this is not a duplicate. Reference type in C# explains how a reference works and how it differs from a value, but what I am asking is how a reference is defined at a low level.

meJustAndrew
  • Possible duplicate of [Reference type in C#](http://stackoverflow.com/questions/18229463/reference-type-in-c-sharp) – techvice Nov 18 '16 at 22:17
  • If you think there's anything going on with reference counting, **you don't understand**. – Ben Voigt Nov 18 '16 at 22:59
  • Concerning "when I have tried to explain to my colleagues why a parameter sent by reference can not be stored inside a closure" the important difference between a variable of reference type and a byref parameter is that the first one influences the lifetime of what it points to and the second one does not. There's lots of ways they are similar (they hold an address, they will be automatically adjusted if the GC performs a heap compaction) but those aren't so important for your particular point. – Ben Voigt Nov 18 '16 at 23:02
  • @BenVoigt It's possible for a `ref` parameter to reference a boxed value type, is it not? If that happens, it's possible for that `ref` parameter to be the only remaining reference, and if that doesn't cause the lifetime to be extended, things are going to behave very badly. – hvd Nov 18 '16 at 23:11
  • @hvd: No, it isn't. The `byref` parameter references the variable in (or known to, it might be a class member) the caller which I suppose has type `object` (a reference type). That variable keeps its target object alive. If the variable was a member of a heap object, the caller must have had a reference to that object, keeping it (and its member variable) alive. The `byref` parameter itself doesn't keep anything alive. – Ben Voigt Nov 18 '16 at 23:29
  • @BenVoigt The caller must have had a reference to that object, but that reference may be cleared during the call. Consider http://pastebin.com/ak2baTvh. Try it in release mode. With the `Console.WriteLine(s_);` commented, the `WeakReference` shows that `s` got garbage collected. With `Console.WriteLine(s_);` uncommented, the `WeakReference` shows that `s` remains alive longer. – hvd Nov 18 '16 at 23:38
  • I would be careful to not conflate `ref` variables and references to objects. They are conceptually quite different. In C# a reference to an object refers to *an object as a whole*, and a `ref` variable is *an alias for another variable*. You can tell they are different conceptually because C# permits different operations on them. You can take two references to `object` and call `ReferenceEquals` on them to determine if they are referring to the same or different object. But there is no way in C# to determine if two `ref` params refer to the same variable. – Eric Lippert Nov 18 '16 at 23:38
  • @Eric Lippert thanks for the comment, I never thought about ref parameters this way, as they cannot be compared for reference equality; I find this pretty interesting. I want to thank you for the great answer. Regarding the last question in your answer, I am sorry to say that after thinking about it I am not able to provide an answer, as I would not be able to create this kind of system, but it helped me get an idea of how complex the entire mechanism of references and garbage collection must be. – meJustAndrew Nov 18 '16 at 23:53
  • also @Eric Lippert, just to clarify: `ref` variables and references to objects are not conceptually the same thing, but technically, as an implementation detail, are they the same? – meJustAndrew Nov 18 '16 at 23:58
  • @hvd: Ahh, you're making a byref parameter to the value type in the box, and it's the last thing keeping the boxed value alive. I was thinking you meant a byref parameter to the reference to the boxed value. Anyway, you're right, the caller's reference doesn't keep it alive, the byref parameter does, and I'm right, boxing is not a special case, just a particular case of byref to variable inside a gc-heap object. Consider http://rextester.com/DTAQ3945 which exhibits the exact same behavior with neither boxing nor value types. – Ben Voigt Nov 19 '16 at 00:21
  • @BenVoigt Right, I realised that afterwards too and thought of a `ref` to a class's field as another simpler example. I think we're in agreement now. – hvd Nov 19 '16 at 00:27
  • @meJustAndrew: That's complicated; too complicated for a comment. My advice to you is that if you want to understand this stuff, you approach it by going down one level of abstraction at a time. The C# type system is layered on top of the Common Type System of the CLI, which makes the relationships between object, interface, managed pointer and unmanaged pointer types very clear; many of these concepts are abstracted away in the C# type system. Chapter 8 of Partition I of the CLI spec should prove interesting reading for you. – Eric Lippert Nov 19 '16 at 00:31

2 Answers


From what I understand by now, I can say that a reference in C# is a kind of pointer to an object

If by "kind of" you mean "is conceptually similar to", yes. If you mean "could be implemented by", yes. If you mean "has the is-a-kind-of relationship to", as in "a string is a kind of object" then no. The C# type system does not have a subtyping relationship between reference types and pointer types.

which has reference count

Implementations of the CLR are permitted to use reference counting semantics but are not required to do so, and most do not.
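One way to see that the mainstream runtime is not doing plain reference counting is a cycle: two objects that point only at each other. A minimal sketch, assuming a release or debug build on CoreCLR; `Node`, `CycleDemo` and `MakeUnreachableCycle` are hypothetical names. Because `GC.Collect` is only a request, the final line is hedged as "typically" rather than guaranteed.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical node type: two instances will point at each other.
public sealed class Node
{
    public Node Other;
}

public static class CycleDemo
{
    // Built in a separate, non-inlined method so that no local variable in the
    // caller keeps the nodes reachable once this method returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference MakeUnreachableCycle()
    {
        var a = new Node();
        var b = new Node();
        a.Other = b;
        b.Other = a;                  // under reference counting, each count would stay at 1 forever
        return new WeakReference(a);  // observes a without keeping it alive
    }

    public static void Main()
    {
        WeakReference weak = MakeUnreachableCycle();
        GC.Collect();
        GC.WaitForPendingFinalizers();

        // A pure reference-counting scheme would leak the cycle; a tracing collector
        // notices that nothing reachable points at it. Typically prints False.
        Console.WriteLine(weak.IsAlive);
    }
}
```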

and knows about the type compatibility.

I'm not sure what this means. Objects know their own actual type. References have a static type which is compatible with the actual type in verifiable code. Compatibility checking is implemented by the runtime's verifier when the IL is analyzed.
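A small sketch of the distinction between a reference's static type and the object's actual type; `TypeDemo` and `ActualTypeName` are hypothetical names, and `Uri` stands in for any incompatible type.

```csharp
using System;
using System.Collections.Generic;

public static class TypeDemo
{
    // GetType() asks the object, not the reference, so it reports the actual type.
    public static string ActualTypeName(object reference) => reference.GetType().ToString();

    public static void Main()
    {
        string s = "hello";
        // Three references with three different static types, all to the same object.
        object o = s;
        IEnumerable<char> e = s;

        Console.WriteLine(ActualTypeName(s)); // System.String
        Console.WriteLine(ActualTypeName(o)); // System.String
        Console.WriteLine(ActualTypeName(e)); // System.String

        // The static type limits what the compiler accepts: o.Length does not compile,
        // because object has no Length member, even though the referent is a string.
        Console.WriteLine(((string)o).Length); // 5

        // A cast to an incompatible type is checked, and rejected, at run time.
        try { _ = (Uri)o; }
        catch (InvalidCastException) { Console.WriteLine("not a Uri"); }
    }
}
```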

My question is not about how a value type is different than a reference type, but more about how a reference is implemented.

How references are implemented is, not surprisingly, an implementation detail.

Can somebody provide me with a complete, but hopefully simple explanation about what references really are in C#

References are things that act as references are specified to act by the C# language specification. That is:

  • objects (of reference type) have identity independent from the values of their fields
  • any object may have a reference to it
  • such a reference is a value which may be passed around like any other value
  • equality comparison is implemented for those values
  • two references are equal if and only if they refer to the same object; that is, references reify object identity
  • there is a unique null reference which refers to no object and is unequal to any valid reference to an object
  • A static type is always known for any reference value, including the null reference
  • If the reference is non-null then the static type of the reference is always compatible with the actual type of the referent. So for example, if we have a reference to a string, the static type of the reference could be string or object or IEnumerable, but it cannot be Giraffe. (Obviously if the reference is null then there is no referent to have a type.)
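Several of these rules can be observed directly. A minimal sketch; `Point` and `IdentityDemo` are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical class used only for illustration.
public sealed class Point
{
    public int X, Y;
    public Point(int x, int y) { X = x; Y = y; }
}

public static class IdentityDemo
{
    public static void Main()
    {
        var p = new Point(1, 2);
        var q = new Point(1, 2);   // same field values, different object
        var r = p;                 // copies the reference, not the object

        Console.WriteLine(object.ReferenceEquals(p, q)); // False: identity is independent of field values
        Console.WriteLine(object.ReferenceEquals(p, r)); // True: both refer to the same object

        r.X = 10;                  // mutate through one reference...
        Console.WriteLine(p.X);    // 10: ...and the change is visible through the other

        Point n = null;            // the null reference refers to no object
        Console.WriteLine(object.ReferenceEquals(n, p));    // False
        Console.WriteLine(object.ReferenceEquals(n, null)); // True
    }
}
```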

There are probably a few rules that I've missed, but that gets across the idea. References are anything that behaves like a reference. That's what you should be concentrating on. References are a useful abstraction because they are the abstraction which enables object identity independent of object value.

and a bit about how they are implemented?

In practice, objects of reference type in C# are implemented as blocks of memory which begin with a small header that contains information about the object, and references are implemented as pointers to that block. This simple scheme is then made more complicated by the fact that we have a multigenerational mark-and-sweep compacting collector; it must somehow know the graph of references so that it can move objects around in memory when compacting the heap, without losing track of referential identity.
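Safe C# never exposes the address, so the move itself cannot be observed, but the guarantee can be: after a full collection, which is free to compact and therefore move the object, every reference still refers to the same object and its identity hash is unchanged. A sketch; `MovingGcDemo` and `IdentitySurvivesCollection` are hypothetical names, while `RuntimeHelpers.GetHashCode` is the real identity-based hash that ignores any `GetHashCode` override.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class MovingGcDemo
{
    // True if identity (hash and reference equality) survives a full collection.
    public static bool IdentitySurvivesCollection()
    {
        var obj = new object();
        var alias = obj;                               // a second reference to the same object
        int before = RuntimeHelpers.GetHashCode(obj);  // identity-based hash

        GC.Collect();                                  // a compacting collection may move obj
        GC.WaitForPendingFinalizers();                 // and must then update both references

        return RuntimeHelpers.GetHashCode(obj) == before
            && object.ReferenceEquals(obj, alias);
    }

    public static void Main()
    {
        Console.WriteLine(IdentitySurvivesCollection()); // True
    }
}
```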

As an exercise you might consider how you would implement such a scheme. It builds character to try to figure out how you would build a system where references are pointers and objects can move in memory. How would you do it?
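One classic answer to the exercise adds a level of indirection: user code holds handles into a table, and only the table stores real addresses, so compaction has to patch just the table. This is a toy model and explicitly not how the CLR works; its collector instead finds and rewrites the references themselves. `ToyHeap`, `Allocate`, `Free`, `Read`, `AddressOf` and `Compact` are all hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy heap: a "reference" is a handle (index into a handle table),
// and only the handle table stores the real address into the simulated memory.
public sealed class ToyHeap
{
    private readonly string[] _memory;                      // simulated heap cells
    private readonly List<int> _handles = new List<int>();  // handle -> address, -1 once freed
    private int _next;                                      // bump-allocation pointer

    public ToyHeap(int size) { _memory = new string[size]; }

    public int Allocate(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        if (_next == _memory.Length) throw new InvalidOperationException("heap full");
        _memory[_next] = value;
        _handles.Add(_next++);
        return _handles.Count - 1;   // the "reference" handed to user code
    }

    public void Free(int handle)
    {
        _memory[_handles[handle]] = null;
        _handles[handle] = -1;
    }

    public string Read(int handle) => _memory[_handles[handle]];
    public int AddressOf(int handle) => _handles[handle];

    // Slide every live cell toward address 0, then patch the handle table,
    // so handles held by user code stay valid even though the data moved.
    public void Compact()
    {
        var newAddress = new Dictionary<int, int>();
        int dest = 0;
        for (int src = 0; src < _next; src++)
        {
            if (_memory[src] == null) continue;   // dead cell: skip it
            _memory[dest] = _memory[src];
            newAddress[src] = dest++;
        }
        for (int i = dest; i < _next; i++) _memory[i] = null;
        _next = dest;
        for (int h = 0; h < _handles.Count; h++)
            if (_handles[h] >= 0) _handles[h] = newAddress[_handles[h]];
    }

    public static void Main()
    {
        var heap = new ToyHeap(8);
        int a = heap.Allocate("a");
        int b = heap.Allocate("b");
        int c = heap.Allocate("c");
        heap.Free(b);
        heap.Compact();
        Console.WriteLine($"{heap.Read(c)} is now at address {heap.AddressOf(c)}"); // c is now at address 1
    }
}
```

The cost of this design is an extra indirection on every access; the benefit is that the collector never has to locate references scattered across stacks and objects.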

it is hard for me to understand what really is a reference when I have tried to explain to my colleagues why a parameter sent by reference can not be stored inside a closure

This is tricky. It is important to understand that a reference to a variable -- a ref parameter in C# -- and a reference to an object of reference type are conceptually similar but actually different things.

In C# you can think of a reference to a variable as an alias. That is, when you say

void M() 
{
  int x = 123;
  N(ref x);
}
void N(ref int y)
{
  y = 456;
}

Essentially what we are saying is that x and y are different names for the same variable. The ref is an unfortunate choice of syntax because it emphasizes the implementation detail -- that behind the scenes, y is a special "reference to variable" type -- and not the semantics of the operation, which is that logically y is now just another name for x; we have two names for the same variable.
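The aliased variable does not have to be a local. A sketch, assuming nothing beyond standard C#, showing `y` standing in for a local, a field of a heap object, and an array element; `AliasDemo`, `Box` and `Run` are hypothetical names.

```csharp
using System;

public static class AliasDemo
{
    // Hypothetical holder class, used only to get a field inside a heap object.
    private sealed class Box { public int Field; }

    // y is just another name for whatever variable the caller passes.
    private static void N(ref int y) => y = 456;

    public static int[] Run()
    {
        int local = 123;          // a local variable (typically on the stack)
        var box = new Box();      // a field inside a heap-allocated object
        int[] array = { 0, 0 };   // an element of a heap-allocated array

        N(ref local);
        N(ref box.Field);
        N(ref array[1]);

        return new[] { local, box.Field, array[1] };
    }

    public static void Main() => Console.WriteLine(string.Join(", ", Run())); // 456, 456, 456
}
```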

References to variables and references to objects are not the same thing in C#; you can see this in the fact that they have different semantics. You can compare two references to objects for equality. But there is no way in C# to say:

static bool EqualAliases(ref int y, ref int z)
{
  return true iff y and z are both aliases for the same variable
}

the way you can with references to objects:

static bool EqualReferences(object x, object y)
{
  return x == y;
}
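Because both parameters of `EqualReferences` are typed `object`, `==` there is reference equality. A sketch of how that differs from a type that overloads `==`, such as `string`; `EqualityDemo` is a hypothetical name.

```csharp
using System;

public static class EqualityDemo
{
    // Same shape as EqualReferences above: with both operands typed object,
    // == compares references, never contents.
    public static bool EqualReferences(object x, object y) => x == y;

    public static void Main()
    {
        string a = "abc";
        string b = new string(new[] { 'a', 'b', 'c' }); // same contents, new object

        Console.WriteLine(a == b);                       // True: string's == compares contents
        Console.WriteLine(EqualReferences(a, b));        // False: different objects
        Console.WriteLine(EqualReferences(a, a));        // True: same object
        Console.WriteLine(object.ReferenceEquals(a, b)); // False
    }
}
```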

Behind the scenes both references to variables and references to objects are implemented by pointers. The difference is that a reference to a variable might refer to a variable on the short-term storage pool (aka "the stack"), whereas a reference to an object is a pointer to the heap-allocated object header. That's why the CLR restricts you from storing a reference to a variable into long-term storage; it does not know if you are keeping a long-term reference to something that will be dead soon.
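This is also why the closure in the question is rejected: a lambda's captured variables are hoisted into a heap object that may outlive the caller's stack frame, so the compiler refuses to capture a `ref` parameter (error CS1628). A common workaround, sketched here, copies the value into an ordinary local, lets the closure capture that, and copies the result back; `RefClosureDemo` and `AddTwiceViaClosure` are hypothetical names.

```csharp
using System;

public static class RefClosureDemo
{
    // Action add = () => y += amount;   // does not compile: error CS1628
    public static void AddTwiceViaClosure(ref int y, int amount)
    {
        int copy = y;                         // an ordinary local can be hoisted into a closure
        Action add = () => copy += amount;    // the closure captures copy, not y
        add();
        add();
        y = copy;                             // write the result back through the alias
    }

    public static void Main()
    {
        int x = 10;
        AddTwiceViaClosure(ref x, 5);
        Console.WriteLine(x); // 20
    }
}
```

The trade-off is that the write-back happens only once, before the method returns; if the closure escaped and ran later, its changes would no longer reach the caller's variable.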

Your best bet to understand how both kinds of references are implemented as pointers is to take a step down from the C# type system into the CLI type system which underlies it. Chapter 8 of Partition I of the CLI specification should prove interesting reading; it describes different kinds of managed pointers and what each is used for.

Eric Lippert
  • Side note: the CLR garbage collector is capable of updating both references and pointers to (or into) gc-heap objects that get moved. The C# language doesn't support tracking pointers (so targets must be pinned), but other .NET languages do. – Ben Voigt Nov 18 '16 at 23:34

References in C# are very similar to C++ references. Yes, indeed, underneath there is garbage collection magic going on, but I would say how that works is a different and larger topic.

C# references are similar to C++ references or pointers, with no pointer arithmetic, etc., but unlike C++ references you can reassign them (Thanks Ben!).

I'd say in practice, one difference is that since pointers aren't generally available in C# (the unsafe keyword and its associated pointers are again a different and larger topic), you'll find yourself using the "out" keyword to do what a pointer-to-pointer used to do.
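A sketch of that pattern: where C++ might take a `Widget**` so the callee can store a new object in the caller's pointer, C# uses `out` (callee must assign) or `ref` (callee may also read the old value). `OutDemo`, `TryCreate` and `Replace` are hypothetical names; `StringBuilder` is just a convenient reference type.

```csharp
using System;
using System.Text;

public static class OutDemo
{
    // Plays the role of a C++ function taking a pointer-to-pointer: the callee
    // stores a new object into the caller's reference variable.
    public static bool TryCreate(string text, out StringBuilder result)
    {
        if (string.IsNullOrEmpty(text))
        {
            result = null;   // out parameters must be assigned on every path
            return false;
        }
        result = new StringBuilder(text);
        return true;
    }

    // ref also lets the callee read the caller's reference before replacing it.
    public static void Replace(ref StringBuilder target)
    {
        target = new StringBuilder("replaced " + target);
    }

    public static void Main()
    {
        StringBuilder sb;
        if (TryCreate("hello", out sb))
            Console.WriteLine(sb);   // hello
        Replace(ref sb);
        Console.WriteLine(sb);       // replaced hello
    }
}
```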

Also you are correct in asserting references carry type information. All reference types in C# derive from the Object class, which has a GetType() method.

Be advised, however, that structs, which are value types rather than reference types, also have GetType().
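A sketch of how that looks in practice: `GetType()` is a non-virtual method declared on `object`, so calling it on a struct goes through boxing, and a boxed copy is independent of the original variable. `BoxingDemo` and `Box` are hypothetical names.

```csharp
using System;

public static class BoxingDemo
{
    // Boxing copies the value into a new heap object.
    public static object Box(int value) => value;

    public static void Main()
    {
        int n = 42;
        object boxed = Box(n);

        Console.WriteLine(n.GetType());     // System.Int32
        Console.WriteLine(boxed.GetType()); // System.Int32

        n = 7;                              // changes the local variable only
        Console.WriteLine(boxed);           // 42: the box holds a copy, not an alias
    }
}
```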

Malachi
  • C# references aren't immutable! (Big difference from C++ references, that) And they can be `null` without violating any invariants (another big difference from C++). – Ben Voigt Nov 18 '16 at 23:35
  • Thanks Ben! Updated my response to reflect your correction – Malachi Nov 19 '16 at 00:17
  • @BenVoigt: It's a bit confusing, because C++ has in many ways a different approach to references and memory management than C# does, obviously. The right analogy to make I think is that a C++ reference is less like a reference to an object in C# than it is like a `ref` variable in C#. And those *are* immutable in C#; when you say `M(ref x)` and we have `void M(ref int y)` then `y` becomes an alias for `x`, and there is no way to change that inside `M`, to make `y` an alias for something else. – Eric Lippert Nov 19 '16 at 00:24
  • @EricLippert: like a `ref` variable in C# 7, yes. Before return types and locals were added, not so much. Although beware that in C++ a reference, like a pointer, is an alias to a location and not any particular object. It's perfectly legal (given some conditions) to replace the referred-to object with a different one of the same type in the same location. – Ben Voigt Nov 19 '16 at 00:33
  • @EricLippert, you probably already know this but our readers may not: Your simple statement suggests the right conclusion, but in C++ where a distinction is made between initialization and assignment, and object lifetime is eager and deterministic, the journey to reach that conclusion is much longer. – Ben Voigt Nov 19 '16 at 00:49