It's probably easier to understand if you forget all that "pass by reference" vs "pass by value" business. They're poor turns of phrase, as you're finding out, because they tend to make you think that the Book data in memory is either copied when it is passed to the method, or that the original is passed. "Pass by copy of reference" and "pass by original reference" might be better names: a class instance is always reached through a reference, and the data itself is never copied.
I find it more helpful to think of nearly every variable in a program as being a reference in its own right, and "pass by value/reference" refers to whether a new reference is created or not when calling a method.
So you have your line:
Book book1 = new Book("Fight club");
Immediately after we execute this line, there is one variable name in your program, book1, and it refers to some block of data at memory address 0x1234 that contains "Fight club".
ChangeBookName(book1, "The Wolfman");
We call the ChangeBookName method and C# establishes another reference, called book, because that is what it says in the method signature, also pointing to address 0x1234.
You have two references, one block of data. The book reference will be lost when the method ends; its lifetime is only between the { } of the method.
If you use this additional book reference to change something about the data:
book.Name = "The wolfman";
Then the first reference, book1, will see the change: it points to the same data, and that data has changed.
If you point this additional book reference to a whole new block of data elsewhere in memory:
book = new Book("The wolfman");
You now have two references, two blocks of data: book1 points to "Fight club" at 0x1234, and book points to "The wolfman" at 0x2345. The wolfman data and the book reference will be lost when the method ends.
The crucial point here about having two references to one block of data is that you can change some property of the data and both references see it. But if you point one of the references to a new block of data, the original reference remains pointing to the original data.
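The two cases above can be put side by side in a small runnable sketch. The Book class here is an assumption on my part (a Name property and a constructor taking the name, which is all the question's code implies), and PointAtNewBook is a made-up method name used purely for illustration.

```csharp
using System;

// Assumed shape of Book: just a name, settable after construction.
class Book
{
    public string Name { get; set; }
    public Book(string name) { Name = name; }
}

class Program
{
    // book is a second reference to the same block of data as the caller's variable.
    static void ChangeBookName(Book book, string newName)
    {
        book.Name = newName; // changes the shared data
    }

    // Hypothetical method: repoints only the local reference.
    static void PointAtNewBook(Book book)
    {
        book = new Book("The wolfman"); // the caller's reference is untouched
    }

    static void Main()
    {
        Book book1 = new Book("Fight club");

        PointAtNewBook(book1);
        Console.WriteLine(book1.Name); // Fight club

        ChangeBookName(book1, "The wolfman");
        Console.WriteLine(book1.Name); // The wolfman
    }
}
```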
If you want a method to be able to swap out the block of data for a whole new block of data and also have the original reference experience the change, you use the ref keyword. Conceptually this causes C# not to make a copy of the reference at all, but to reuse the same reference (albeit under a different name).
void ChangeBookForANewOne(ref Book tochange){
    tochange = new Book("Needful things");
}
Book b = new Book("Fight club");
ChangeBookForANewOne(ref b); // ref is required at the call site too
All through this code there is only one reference to one block of data. Changing the block of data for a new one inside the method causes the change to be remembered when the method exits
We seldom use ref; if you want to change your book for a new one, you should really return it from the method and assign the returned book to b. People use ref when they want to return more than one thing from a method, but really that's an indicator that you should be returning a class that holds all of those things.
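As a rough sketch of the "return it instead" approach: the method hands back the new book and the caller decides what to do with it. GetReplacementBook is a hypothetical name, and the Book class is the same assumed shape as in the question (a Name and a constructor taking it).

```csharp
using System;

// Assumed shape of Book: just a name.
class Book
{
    public string Name { get; set; }
    public Book(string name) { Name = name; }
}

class Program
{
    // Hypothetical method: returns the new book rather than overwriting a ref parameter.
    static Book GetReplacementBook()
    {
        return new Book("Needful things");
    }

    static void Main()
    {
        Book b = new Book("Fight club");

        // The reassignment is visible right here in the calling code.
        b = GetReplacementBook();

        Console.WriteLine(b.Name); // Needful things
    }
}
```

The upside is that anyone reading the calling code can see b being replaced, rather than having to know what the method does to its parameter.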
The same notions are true for value types (usually primitive things like int), but the slight difference is that if you pass an int to a method then you end up with two variable names and also two ints in memory. If you increment the int inside the method, the original int doesn't change, because the additional variable established for the lifetime of the method call is a different piece of data in memory: the data really is copied, and you have two variables and two numbers in memory. Ref style behaviour is more useful and more common for things like this, with things like int.TryParse. It returns true or false to indicate whether parsing succeeded, but in order to give the parsed value back to you it needs to use the original variable you passed in, not a copy of it.
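A quick way to convince yourself of the int behaviour is a sketch like this one. Increment and IncrementByRef are made-up method names for illustration only.

```csharp
using System;

class Program
{
    // number is a separate int; incrementing it does not touch the caller's variable.
    static void Increment(int number)
    {
        number++;
    }

    // With ref, number is the caller's variable under another name.
    static void IncrementByRef(ref int number)
    {
        number++;
    }

    static void Main()
    {
        int count = 5;

        Increment(count);
        Console.WriteLine(count); // 5

        IncrementByRef(ref count);
        Console.WriteLine(count); // 6
    }
}
```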
To do this, TryParse uses a variation of ref called out: a marker on a method parameter that indicates "this method will definitely assign a value to the variable you pass in; if you were to give it a variable already initialized to a value, that value would definitely be overwritten". In contrast, ref indicates "you must pass in a variable that is already initialized to a value; I might use the value and I might overwrite it or point it to new data in memory". If you have a method that doesn't need to take a value in but definitely overwrites it, like my ChangeBookForANewOne before, you should really use out: in 100% of cases ChangeBookForANewOne overwrites what was passed in, which could cause unintended data loss for a developer. Marking it as out means the caller has to write out at the call site, which makes the overwrite obvious, and the caller can pass a variable that has not been assigned yet. The compiler also stops the method from reading the incoming value before assigning it, and insists that it assigns one before returning. All of this helps prevent unintended data loss.
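Here is roughly what the out version of ChangeBookForANewOne might look like, next to int.TryParse for comparison. The Book class is again the assumed shape from the question (a Name and a constructor taking it), and the inline out int declaration needs C# 7 or later.

```csharp
using System;

// Assumed shape of Book: just a name.
class Book
{
    public string Name { get; set; }
    public Book(string name) { Name = name; }
}

class Program
{
    // out: the method must assign toChange before it returns,
    // and it cannot read whatever the caller passed in before doing so.
    static void ChangeBookForANewOne(out Book toChange)
    {
        toChange = new Book("Needful things");
    }

    static void Main()
    {
        Book b; // no need to assign anything first
        ChangeBookForANewOne(out b);
        Console.WriteLine(b.Name); // Needful things

        // int.TryParse hands the parsed value back through an out parameter.
        if (int.TryParse("42", out int parsed))
        {
            Console.WriteLine(parsed + 1); // 43
        }
    }
}
```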