
I am trying to write some code to create a linked list, but I am confused about how pass by reference works in C#. Below is my code for the `AddNodeToEnd` method, which takes as input the head of the linked list and the data element to add.

    public LinkedList AddNodeToEnd(LinkedList head, string data)
    {
        var node = new LinkedList() { Data = data };
        if (head == null)
            return node;

        while (head.Next != null)
        {
            head = head.Next;
        }
        head.Next = node;

        return head;
    }

Below is my code for adding elements to the list.

    var linkedList = new LinkedListDriver();
    var head = linkedList.AddNodeToEnd(null, "1");
    linkedList.AddNodeToEnd(head, "2");
    linkedList.AddNodeToEnd(head, "3");
    linkedList.AddNodeToEnd(head, "4");
    linkedList.AddNodeToEnd(head, "5");
    Console.Write(linkedList.PrintList(head));

This is printing the output as 1 => 2 => 3 => 4 => 5 (as expected).

My question is: how is the head element changing in the `AddNodeToEnd` method? New nodes are added to the linked list correctly, but in that same method I traverse using the `head` parameter itself, without assigning it to a different/temporary variable. Why does `head` in my `Main` method still remain on 1? Based on the above code, I was expecting the head to move to 5 because of `head = head.Next;`, as `head` is passed by reference (by default in C#).

Appreciate an explanation of this behavior in C#.

    public class LinkedList
    {
        public string Data { get; set; }
        public LinkedList Next { get; set; }
        public LinkedList Previous { get; set; }
    }

WAQ
  • The local `head` variable is initialised in the line `var head = linkedList.AddNodeToEnd(null, "1");`. In subsequent calls, the local `head` variable is not changed, but its `Next` property is. – Matthew Watson May 30 '22 at 07:58
  • "the head is passed as reference" - it's worth understanding the difference between "reference passed by value" (the default) and "pass by reference" (`ref` and `out` parameters). See https://jonskeet.uk/csharp/parameters.html – Jon Skeet May 30 '22 at 08:10
  • First up, the name `LinkedList` is confusing, because the class actually represents the node of a linked list. Secondly, this is a non-standard (and therefore somewhat overly complex) way to create a linked list. Please look up some examples and good tutorials online. Don't reinvent the wheel, or you will fall into the same potholes many have fallen into before you. For instance, you are not updating the link to the previous node. – JHBonarius May 30 '22 at 08:15
  • You can read [this](https://stackoverflow.com/a/59096091/7085070) answer; it may help you understand the difference between passing by value and passing by reference. – OlegI May 30 '22 at 08:26

3 Answers


When you pass `head` to `AddNodeToEnd` you are passing a copy of the reference. The copy initially points to the head node, but inside `AddNodeToEnd` you reassign the copy so that it points to other nodes. You are not changing the original `head` variable in the caller.

Had the signature of `AddNodeToEnd` been `public LinkedList AddNodeToEnd(ref LinkedList head, string data)`, then your code would have behaved in the way you thought it should. In that case you would not have passed a copy of the reference; the parameter would have been an alias for the caller's `head` variable itself, so reassigning it inside the method would also reassign it in the caller.
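To make the difference concrete, here is a minimal sketch. The `Node` class and the two `Advance…` methods are hypothetical names invented for this illustration; they are not part of the question's code.

```csharp
using System;

// Hypothetical node type, used only for this illustration.
public class Node
{
    public string Data { get; set; }
    public Node Next { get; set; }
}

public static class RefDemo
{
    // 'head' is a copy of the caller's reference: reassigning it is local.
    public static void AdvanceByValue(Node head)
    {
        head = head.Next;
    }

    // 'head' is an alias for the caller's variable: reassigning it is visible to the caller.
    public static void AdvanceByRef(ref Node head)
    {
        head = head.Next;
    }

    public static void Main()
    {
        var first = new Node { Data = "1" };
        first.Next = new Node { Data = "2" };

        var head = first;

        AdvanceByValue(head);
        Console.WriteLine(head.Data); // prints 1

        AdvanceByRef(ref head);
        Console.WriteLine(head.Data); // prints 2
    }
}
```

Note that the caller also has to write `ref` at the call site, which makes the different behavior visible where the method is used.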

Enigmativity

When you pass a reference type (i.e. a class, not a built-in value type/struct like `int`) as an argument to a function, you are actually getting a pointer to the memory location of an existing object. Unless you pass the argument with the extra `ref` modifier, you cannot replace the caller's object with a different one, but you are allowed to modify its internal values.

In this case something interesting is happening: by passing `head` you are given a reference to an object, which you can modify. Thus, when you assign to `head.Next`, you modify the existing object. However, when you assign to `head` itself, you replace a locally copied reference to the object. That doesn't modify the existing object.

JHBonarius

You are conflating two distinct concepts:

  • the difference between value types and reference types
  • the difference between passing arguments by value and passing arguments by reference

Let's start with value types versus reference types.

Value types are types whose value is the data itself. When I say "the data itself", I mean an actual instance of the value type. An example of a value type is the struct named `System.Int32`, which is a 32-bit signed integer. Consider the following variable declaration:

    int number = 13;

In this case the number variable contains the actual value 13, which is an instance of the System.Int32 type. Put another way, the memory location in your computer memory you access via the number identifier directly contains the integer number 13.

Based on what I explained above, the following lines of code create a copy of the value contained in the `a` variable and assign the copy to the `b` variable. After the code executes, the two variables contain independent copies of the same integer value (13). Put another way, there are two independent memory locations which each contain their own copy of the integer 13:

    int a = 13;
    int b = a;
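A small sketch of what "independent copies" means in practice: changing `b` afterwards leaves `a` untouched. The class and method names here are hypothetical, chosen only for this illustration.

```csharp
using System;

public static class ValueCopyDemo
{
    // Copies a into b, changes only b, and returns both values.
    public static (int A, int B) CopyThenChange()
    {
        int a = 13;
        int b = a;   // b receives its own copy of the value 13
        b = 20;      // only b changes; a keeps its own value
        return (a, b);
    }

    public static void Main()
    {
        var (a, b) = CopyThenChange();
        Console.WriteLine($"a = {a}, b = {b}"); // prints a = 13, b = 20
    }
}
```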

Reference types are types whose value is a reference to the data itself. When I say "a reference to the data itself", I mean a reference to an instance of the type. An example of a reference type is the class `System.String`, which is used to represent strings of characters.

The lifetime of reference type instances is handled by the garbage collector, which is in charge of deallocating the memory occupied by those instances. This happens once an instance is no longer referenced from anywhere, so it can safely be removed from memory.

Consider the following line of code:

    string name = "Enrico";

Here the variable `name` does not contain the string "Enrico"; instead it contains a reference to the string "Enrico". This means that somewhere in the computer memory there is a memory location containing the actual data (the sequence of characters composing the string "Enrico"), and that the memory location you access via the `name` identifier contains a reference to it. You can imagine the thing I'm calling a reference as a fictitious arrow (a pointer) which points to another memory location, which actually contains the sequence of characters composing the string "Enrico".

Consider the following code:

    string a = "Hello";
    string b = a;

This is what happens here:

  1. some memory is allocated to contain the sequence of characters composing the string "Hello". At this memory location there is the real data, the string itself, the actual instance of the System.String type.
  2. the variable a contains a pointer which points to the real data, that is a pointer to the memory location described at step 1.
  3. the variable b contains a copy of the content of variable a. This means that the variable b contains a pointer pointing to the memory location described at step 1. There are now 2 independent pointers to the same memory location, which contains the actual data, that is the sequence of characters composing the string "Hello".

Notice that, at this point, you can access the same instance (the string "Hello", which is an instance of the System.String type), by using two different pointers: both the a and b variables are referencing the string data stored somewhere in computer memory. The very important part here is that there is only one string instance in memory.
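The following sketch checks both claims with `object.ReferenceEquals`: after the copy, `a` and `b` refer to the same instance, and reassigning `b` later does not affect `a`. The class and method names are hypothetical, used only for this illustration.

```csharp
using System;

public static class ReferenceCopyDemo
{
    // Returns true when copying a reference yields two variables
    // that refer to one and the same string instance.
    public static bool CopySharesInstance()
    {
        string a = "Hello";
        string b = a; // copies the reference, not the characters
        return object.ReferenceEquals(a, b);
    }

    // Reassigns the copy and returns both variables.
    public static (string A, string B) ReassignCopy()
    {
        string a = "Hello";
        string b = a;
        b = "World"; // b now refers to another string; a is untouched
        return (a, b);
    }

    public static void Main()
    {
        Console.WriteLine(CopySharesInstance()); // prints True

        var (a, b) = ReassignCopy();
        Console.WriteLine(a); // prints Hello
        Console.WriteLine(b); // prints World
    }
}
```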

We can now talk about pass by value and pass by reference. Simply put, by default in C# all method arguments are passed by value. This is the most important part of my answer, and I have often seen confusion about it. But it's really that simple: unless you specify that you want pass by reference behavior, the default behavior of C# is passing method arguments by value. You can opt out of this default behavior by using the `ref` or the `out` keywords: this way you decide that you do want pass by reference behavior when you pass arguments to a method.

Passing by value means passing a copy of the value to the method as an argument.

What is really important to understand is what "a copy of the value" actually means. But you already know the answer:

  1. a copy of the value for a value type, means a copy of the actual data representing the instance of the type
  2. a copy of the value for a reference type, means a copy of a reference to the actual data. You are not creating a copy of the actual object stored in memory, you are creating a copy of a pointer to that object.
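The two cases above can be sketched side by side. `PointValue` and `PointReference` are hypothetical types invented for this illustration: one is a struct, the other a class, and both are passed by value to a method that sets a field.

```csharp
using System;

public struct PointValue { public int X; }     // value type
public class PointReference { public int X; }  // reference type

public static class PassByValueDemo
{
    // p is a copy of the caller's data: the change stays inside this method.
    public static void SetX(PointValue p)
    {
        p.X = 99;
    }

    // p is a copy of the caller's reference: both refer to the same object,
    // so the change to the object is visible to the caller.
    public static void SetX(PointReference p)
    {
        p.X = 99;
    }

    public static void Main()
    {
        var v = new PointValue { X = 1 };
        var r = new PointReference { X = 1 };

        SetX(v);
        SetX(r);

        Console.WriteLine(v.X); // prints 1
        Console.WriteLine(r.X); // prints 99
    }
}
```

In both calls the argument is copied; what differs is whether the thing being copied is the data or a reference to it.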

Now we can consider the final example, that I hope will clarify your doubt. I need a class (which is a reference type) and a couple of methods.

    public class Person
    {
        public string Name { get; set; }
    }

    public static void DoSomething(Person person)
    {
        // Reassigns only the local parameter, not the caller's variable.
        person = new Person
        {
            Name = "Bob"
        };

        Console.WriteLine(person.Name); // this prints Bob
    }

    public static void Main(string[] args)
    {
        Person alice = new Person
        {
            Name = "Alice"
        };

        DoSomething(alice);

        Console.WriteLine(alice.Name); // this prints Alice
    }

Here is what happens:

  1. the Main method creates an instance of the Person class in the computer memory, whose name is "Alice". A variable named alice is assigned to that instance, so the variable alice contains a pointer to the Person class instance. There is 1 variable and 1 object in memory. The variable points to the object.
  2. the DoSomething method is invoked and the variable alice is passed by value as the argument for the person parameter. The variable person is a copy of the variable alice: these two copies are independent and both of them point to the same memory location, which contains the object created at point 1 (the Person class instance whose name is "Alice").
  3. inside the method DoSomething a new object is created in memory; this object is an instance of the Person class having the name "Bob". The method parameter, person, is assigned the newly created object. There are now two objects in memory, both of them instances of the Person class. The parameter person contains a reference to one of these objects (the one having name "Bob"), while the variable alice of the Main method contains a reference to the other object (the one having name "Alice"). This is perfectly fine because there is no link between the parameter person and the variable alice: they are totally independent and they are free to reference different objects.
  4. when the execution of DoSomething ends, the method parameter person goes out of scope and is no longer accessible via code. We are back in the Main method, and the variable alice is still in scope and accessible via code. This variable has not been modified by the execution of DoSomething and keeps pointing to the instance of the Person class created at point 1 (the one having name "Alice").
Enrico Massone
  • Reference types are not necessarily stored on the heap. – Enigmativity May 30 '22 at 09:07
  • The variable for a value type is typically a reference to a position on the stack or a position in a class/struct that contains the value type. To say a value type is the value itself is fine, but a reference-type variable also holds a value that typically sits on the stack, and that value is effectively a pointer to the heap. – Enigmativity May 30 '22 at 09:13
  • Hello @Enigmativity do you have any reference / article / docs about the fact that reference types are not necessarily in the heap ? Actually this is an implementation detail not fundamental in order to understand the topic of the question. That said I do not want to spread false claims, so thanks for the comment. – Enrico Massone May 30 '22 at 09:29
  • https://ericlippert.com/2009/04/27/the-stack-is-an-implementation-detail-part-one/ – Enigmativity May 30 '22 at 10:20
  • https://ericlippert.com/2009/05/04/the-stack-is-an-implementation-detail-part-two/ – Enigmativity May 30 '22 at 10:25
  • @Enigmativity thanks for letting me reflect deeply on the subject. As pointed out by Eric Lippert, where the runtime actually decides to allocate instances of types is an implementation detail. This implementation detail can change over time. The important point is actually the different semantics of value types and reference types in terms of variable assignment and copying. I'll remove any reference to the managed heap from my answer to avoid spreading misconceptions to present and future readers – Enrico Massone May 30 '22 at 10:53
  • @EnricoMassone thank you for the long and detailed answer. I get your point and understand the concept, but I still don't understand one thing in my code. I would appreciate it if you could help me understand. In the `AddNodeToEnd` method, I call `head = head.Next;`. This operates on the local variable (`head`), which is passed by value. That call (`head = head.Next;`) does not make any change to the `head` variable in my `Main` method (as expected). But the next line, `head.Next = node;`, is also making a change through the same local variable, and this change is reflected in the `Main` method's variable `head`. Why? – WAQ May 30 '22 at 12:51
  • In the instruction `head.Next = node;` you are not changing the value of a local variable. You are instead modifying the object which is referenced by your local variable. The object stored in your computer memory is being modified. The local variable in the `Main` method is referencing the very same object, so you can see this change from the `Main` method as well. – Enrico Massone May 30 '22 at 12:59
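
The two statements discussed in the last comments can be seen side by side in the sketch below. It reuses the `LinkedList` node class from the question; the `AppendDemo` class and its `Append` method are hypothetical names for a simplified variant of `AddNodeToEnd`, and setting `Previous` is an addition prompted by the comment about the missing back link, not something the original method does.

```csharp
using System;

public class LinkedList
{
    public string Data { get; set; }
    public LinkedList Next { get; set; }
    public LinkedList Previous { get; set; }
}

public static class AppendDemo
{
    public static void Append(LinkedList head, string data)
    {
        // Reassigning the parameter only moves this method's own copy
        // of the reference; the caller's variable is not affected.
        while (head.Next != null)
        {
            head = head.Next;
        }

        // Assigning to a property changes the node object itself. The caller
        // reaches that same object by following Next from its own head.
        head.Next = new LinkedList { Data = data, Previous = head };
    }

    public static void Main()
    {
        var head = new LinkedList { Data = "1" };
        Append(head, "2");
        Append(head, "3");

        Console.WriteLine(head.Data);           // prints 1
        Console.WriteLine(head.Next.Next.Data); // prints 3
    }
}
```

The caller's `head` stays on the first node because only the method's copy was advanced, while the appended nodes are visible because they were attached to objects the caller can still reach.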