PowerShell supports passing arguments by reference in two different ways. Any variable can be passed by reference by using [Ref] as the parameter type, which effectively casts it to the special [Ref] type. Alternatively, complex data types like [Hashtable] are always passed by reference. The former approach requires using .Value whenever you change the value or pass the variable on to another class method or function. In simplified form, this is what that looks like in practice.

# implicit By Reference
class One_i {
    static [Void] Act ([System.Collections.Generic.List[string]] $data) {
        $data.Add("One_i::Act(): $(Get-Date)")
        [Two_i]::Act($data)
    }
}

class Two_i {
    static [Void] Act ([System.Collections.Generic.List[string]] $data) {
        $data.Add("Two_i::Act(): $(Get-Date)")
    }
}

# explicit By Reference
class One_e {
    static [Void] Act ([Ref] $data) {
        $data.value.Add("One_e::Act(): $(Get-Date)")
        [Two_e]::Act($data.value)
    }
}

class Two_e {
    static [Void] Act ([Ref] $data) {
        $data.value.Add("Two_e::Act(): $(Get-Date)")
    }
}



CLS
$data_i = [System.Collections.Generic.List[string]]::new()
$data_i.Add("Init_i: $(Get-Date)")
[One_i]::Act($data_i)
$data_i.Add("Finalize_i: $(Get-Date)")
foreach ($item in $data_i) {
    Write-Host "$item"
}
Write-Host

$data_e = [System.Collections.Generic.List[string]]::new()
$data_e.Add("Init_e: $(Get-Date)")
[One_e]::Act($data_e)
$data_e.Add("Finalize_e: $(Get-Date)")
foreach ($item in $data_e) {
    Write-Host "$item"
}
Write-Host

I am conflicted as to which approach I prefer. On the one hand [Ref] makes it clear which variables are By Ref, which is useful. However, the need for .value makes the code a bit harder to read, adds work, and creates an opportunity to forget and induce a bug. I tested performance like this...

(measure-Command {
    $data_i = [System.Collections.Generic.List[string]]::new()
    $data_i.Add("Init_i: $(Get-Date)")
    foreach ($i in 1..1000) {
        [One_i]::Act($data_i)
    }
    $data_i.Add("Finalize_i: $(Get-Date)")
}).TotalSeconds

(measure-Command {
    $data_e = [System.Collections.Generic.List[string]]::new()
    $data_e.Add("Init_e: $(Get-Date)")
    foreach ($e in 1..1000) {
        [One_e]::Act($data_e)
    }
    $data_e.Add("Finalize_e: $(Get-Date)")
}).TotalSeconds

and implicit seems to be VERY slightly faster, but not enough to make an argument for implicit based on performance alone. This leads me to a number of questions as I try to decide which approach to adopt, or perhaps the realization that each approach is the better choice in certain situations and using both in a single program actually has merit.

1: Why are complex types handled differently than simple types? The inconsistent behavior seems odd to me, and I suspect I will learn something fundamental if I understand why the behavior is different.

2: Beyond needing to use .value or not, are there other differences between implicit and explicit By Reference? Especially are there any potential problems with one approach or the other?

3: Is there a mechanism to force a complex type to be By Value? And what would be the use case for that?

4: I found this thread, with a comment saying it's best to avoid By Reference (but without explanation). However, I have also found references to Dependency Injection being superior to Global variables and/or the Singleton pattern, and By Reference is fundamental to DI. Is there anything important I need to be aware of here, ESPECIALLY as it relates to PowerShell/Classes/Dependency Injection, given that PowerShell classes are not as fully implemented as in other languages? For what it's worth I am working in PS 5.1 and will not be moving to Core any time soon, but it would be great to know if there are fundamental differences between Core and pre-Core implementations.

EDIT: Based on @mclayton's answer and my understanding of "Changing value" vs "mutation" I tried "breaking" the reference like this

# implicit By Reference
class One_i {
    static [Void] Act ([System.Collections.Generic.List[string]] $data) {
        $data.Add("One_i::Act(): $(Get-Date)")
        [Two_i]::Act($data)
        $data = [System.Collections.Generic.List[string]]::new()
    }
}
# explicit By Reference
class One_e {
    static [Void] Act ([Ref] $data) {
        $data.value.Add("One_e::Act(): $(Get-Date)")
        [Two_e]::Act($data.value)
        $data.value = [System.Collections.Generic.List[string]]::new()
    }
}

I tried both $data.value = [System.Collections.Generic.List[string]]::new() and $data = [System.Collections.Generic.List[string]]::new() in the "explicit" example, and neither of them actually seemed to cause an issue. The console output was "correct" in all three cases, which suggests that I DON'T understand the difference between changing value and mutation after all.

That said, as it relates to my initial reason for asking the question, namely deciding which of the two approaches I want to use, I am leaning towards what I call "implicit", since it does make the code a bit simpler by eliminating the need for .value, and not having the by reference nature of the variable be obvious doesn't change the fact that I have to understand that it is by reference, no matter what I do. I really wish I could find a thread that discusses the relative merits of the two approaches, but thus far my searches have failed to turn up anything useful.

Edit2: So, I found this where mklement0 says there really is no functional difference between the two approaches. And it occurred to me that I could get the best of both worlds by using [Ref] and then assigning $ref.value to a local variable, to simplify things within the method or function. Tentatively feeling like that is the best answer. It adds one line at the top of each function or method, but makes the calls more obvious in behavior as well as simplifying use in the body of the method or function.

# explicit By Reference with sugar
class One_x {
    static [Void] Act ([Ref] $data) {
        $localData = $data.Value
        $localData.Add("One_x::Act(): $(Get-Date)")
        [Two_x]::Act(([Ref]$localData))
    }
}

class Two_x {
    static [Void] Act ([Ref] $data) {
        $localData = $data.Value
        $localData.Add("Two_x::Act(): $(Get-Date)")
    }
}
Gordon

1 Answer

tl;dr

Here are the key differences between value types, reference types, passing by value and passing by reference:

| | Pass by value | Pass by reference |
|---|---|---|
| Value type | Assignments to parameters inside a function are not persisted to outer variables | Assignments to parameters inside a function are persisted to outer variables |
| Reference type | Assignments to parameters inside a function are not persisted to outer variables; mutations to object inside function are visible outside function | Assignments to parameters inside a function are persisted to outer variables; mutations to object inside function are visible outside function |

Loooooooong version

First off, I think the documentation for [ref] is a bit misleading, despite improvements to it in the past (see about_Ref and Improve the about_Ref topic to clarify its primary purpose).

The main problem is there's a subtle difference between "Value types vs Reference types" and "Passing by value vs Passing by reference" - the about_Ref documentation touches on it, but sort of blurs it all into one big single concept.

Value Types vs Reference Types

Ignoring [ref] for a moment, there are two fundamental kinds of type in the .NET world, and there are some similarities and differences in the way they're passed as parameters:

A variable of a value type contains an instance of the type. This differs from a variable of a reference type, which contains a reference to an instance of the type. By default, on assignment, passing an argument to a method, and returning a method result, variable values are copied. In the case of value-type variables, the corresponding type instances are copied.

With reference types, two variables can reference the same object; therefore, operations on one variable can affect the object referenced by the other variable.

The default behaviour for both value types and reference types is to pass "by value" - that is, parameters passed into a method are a copy of the variable value from the parent scope. For value types this means a full copy of the entire value (including for struct types), and for reference types it means passing in a duplicate reference to the same underlying object. Assigning a new value or object reference to the parameter within the method won't persist beyond the scope of the method because it only affects the parameter, which is a copy of the outer value-type or reference-type variable.
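The same copy semantics apply to plain assignment, which makes them easy to see without any function at all. The following is only a minimal sketch; the variable names are made up for illustration:

```powershell
# Value type: assignment copies the value itself.
$a = 10
$b = $a
$b = 20                            # only $b changes
$a                                 # 10

# Reference type: assignment copies the reference, not the object.
$h1 = @{ Name = 'original' }
$h2 = $h1
$h2['Name'] = 'mutated'            # mutates the one hashtable both variables refer to
$h1['Name']                        # mutated

# Rebinding $h2 to a new object leaves $h1 pointing at the old one.
$h2 = @{ Name = 'replacement' }
$h1['Name']                        # mutated
```

Passing an argument by value behaves like the `$b = $a` and `$h2 = $h1` lines: the parameter starts out as a copy of whatever the outer variable holds.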

To use examples from about_Ref:

In the following example, the function changes the value of the variable passed to it. In PowerShell, integers are value types so they are passed by value. Therefore, the value of $var is unchanged outside the scope of the function.

(Note - this should probably say "the function changes the value of the parameter passed to it", since it doesn't actually change the value of the outer variable)

Function Test($data)
{
    $data = 3
}

$var = 10
Test -data $var
$var
# 10 # no change

When $var is a value-type, assigning a new value to $data inside the function makes no difference to the value of $var outside the function because the function is passed a "copy" of the value-type variable and any changes only affect the copy.

A similar example for reference types that shows the $var variable is also unchanged after the function finishes:

Function Test($data)
{
    $data = @{ "Test" = "New Text" }
}

$var = @{ "xxx" = "yyy"}
Test -data $var
$var
# @{ "xxx" = "yyy" } # no change

Again, $data is a duplicate reference to the same reference-type object as in the $var variable - they're independent references even though they point to the same object, so assigning a new value to $data doesn't affect $var.

Note however, that about_Ref gives this example as well:

In the following example, a variable containing a Hashtable is passed to a function. Hashtable is an object type so by default it is passed to the function by reference.

When passing a variable by reference, the function can change the data and that change persists after the function executes.

This is a bit misleading though because:

  • A hashtable is not an "object type", it's a "reference type"
  • The parameter isn't being "passed to the function by reference" - it's actually a reference-type variable being passed into the function by value!

Function Test($data)
{
    $data.Test = "New Text"
}

$var = @{}
Test -data $var
$var
# @{ "Test" = "New Text" }

To clarify, this example isn't changing the value of the $data parameter (i.e. which object it points to) - it's simply calling an instance method on the reference-type object the parameter points to, and that method mutates the object. Since the $var variable in the outer scope is pointing to the same reference-type object, changes to $data in the function are "visible" to $var outside the function because they're both referring to the same object in memory.

It's basically equivalent to this, once PowerShell has translated the dot-notation assignment into a call on the hashtable:

Function Test($data)
{
    $data.Add("Test", "New Text")
}

$var = @{}
Test -data $var
$var
# @{ "Test" = "New Text" }

And that's the important difference between value types and reference types: we can mutate reference types inside a function, and the changes will be visible through the outer variable when the function exits. Even though the parameter and the variable are independent references, they refer to the same object, so mutations made via the function parameter are visible from the outer variable.

Summary: By default, assigning a value to a parameter inside a function doesn't affect what is assigned to the variable in the parent scope, regardless of whether the parent variable is a value type or reference type. Reference type objects can be mutated within the function by calling methods on the object referred to by the parameter, and the effect is visible in the variable outside the function because they're pointing to the same reference-type object.
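One way to check the summary above for yourself is to compare object identity inside the function. This is just a sketch: `Test-Sharing` and its `-original` parameter are hypothetical names that exist only so the function can compare its parameter against the caller's object.

```powershell
function Test-Sharing {
    param($data, $original)

    # On entry the parameter holds a copy of the reference, so it is the same object.
    $sameOnEntry = [object]::ReferenceEquals($data, $original)

    # Mutation: goes through the shared object, so the caller sees it.
    $data.Add('added inside')

    # Assignment: rebinds the local parameter only.
    $data = [System.Collections.Generic.List[string]]::new()
    $sameAfterAssign = [object]::ReferenceEquals($data, $original)

    "$sameOnEntry / $sameAfterAssign"
}

$list = [System.Collections.Generic.List[string]]::new()
$result = Test-Sharing -data $list -original $list
$result        # True / False
$list.Count    # 1
```

The caller's list contains the added item (the mutation was visible), but `$list` still refers to the original object (the assignment was not).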

Passing by value vs passing by reference

When you pass a variable by value you get the behaviour described above in "Value Types vs Reference Types" - that is, assignments to a parameter inside a function do not persist back into the variable in the parent scope.

By contrast, when you pass a variable by reference you can assign a new value-type or reference-type value to the variable in the parent scope:

Passing by reference enables function members, methods, properties, indexers, operators, and constructors to change the value of the parameters and have that change persist in the calling environment.

And about_Ref says:

You can code your functions to take a parameter as a reference, regardless of the type of data passed.

Although it should probably say "You can code your functions to pass a parameter by reference" to avoid confusion with "reference-type parameters".

The example it gives is a value-type parameter passed by reference:

Function Test([ref]$data)
{
    $data.Value = 3
}

$var = 10
Test -data ([ref]$var)
$var
# 3

Note that here, changes to $data.Value inside Test persist into $var after the function call because the variable is passed by reference.

Similarly for reference types passed by reference:

Function Test([ref]$data)
{
    $data.Value = @{ "Test" = "New Text" }
}

$var = @{ "xxx" = "yyy" }
Test -data ([ref]$var)
$var
# @{ "Test" = "New Text" } # new value

In this case, we've not mutated the original hashtable - we've simply assigned a whole different hashtable to $var. We can prove this with a couple more lines of code:

$new = @{ "xxx" = "yyy" }
$old = $new
Test -data ([ref]$new)

$new
# @{ "Test" = "New Text" } # new value

$old
# @{ "xxx" = "yyy" } # old value is still there unchanged

$new and $old both pointed to the same hashtable before the function call, but afterwards $new points to a completely different object.

Summary: When passing parameters by reference, assigning a new value to a parameter inside a function also assigns the new value to the variable in the parent scope, regardless of whether the parent variable is a value type or reference type.
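A swap helper is a compact way to see that this works the same way for value types and reference types. The sketch below uses a hypothetical function name, `Switch-Value`; it is not from about_Ref:

```powershell
function Switch-Value([ref]$first, [ref]$second) {
    # Assigning to .Value rebinds the caller's variables.
    $temp = $first.Value
    $first.Value = $second.Value
    $second.Value = $temp
}

$x = 1                   # value type
$y = @{ k = 'v' }        # reference type

Switch-Value ([ref]$x) ([ref]$y)

$x -is [hashtable]       # True
$x['k']                  # v
$y                       # 1
```

Neither object was mutated here; the two outer variables were simply made to refer to each other's values, which is something a by-value parameter cannot do.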

Grand Summary

Bringing it right back to your original question, you can hopefully see now that your two examples aren't really equivalent:

class One_i {
    static [Void] Act ([System.Collections.Generic.List[string]] $data) {
        $data.Add("One_i::Act(): $(Get-Date)")
        [Two_i]::Act($data)
    }
}

class One_e {
    static [Void] Act ([Ref] $data) {
        $data.value.Add("One_e::Act(): $(Get-Date)")
        [Two_e]::Act($data.value)
    }
}

There's no "implicit" and "explicit" [ref] as such - both approaches simply mutate the contents of the $data reference-type parameter in place. The difference is that One_e could (if it wanted to) change which object $data.Value refers to, and that change would be persisted into $data_e after the method call ends, provided the caller passes it as [ref]$data_e. To prove this, try assigning a new list at the end of your two methods ($data = [System.Collections.Generic.List[string]]::new() in One_i, and $data.Value = [System.Collections.Generic.List[string]]::new() in One_e) and see which version persists the change back up to $data_i or $data_e (you can probably guess One_e does :-).
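Here is a cut-down sketch of that experiment. The class names `Demo_i` and `Demo_e` are made up so they don't collide with the classes in the question, and the call site passes `[ref]$data_e` explicitly, which is an assumption about how the method should be called rather than what the question's code does:

```powershell
class Demo_i {
    static [void] Act([System.Collections.Generic.List[string]] $data) {
        $data.Add('Demo_i::Act()')                                    # mutation
        $data = [System.Collections.Generic.List[string]]::new()      # rebinds the local parameter only
    }
}

class Demo_e {
    static [void] Act([ref] $data) {
        $data.Value.Add('Demo_e::Act()')                                  # mutation
        $data.Value = [System.Collections.Generic.List[string]]::new()    # rebinds the caller's variable
    }
}

$data_i = [System.Collections.Generic.List[string]]::new()
[Demo_i]::Act($data_i)
$data_i.Count    # 1

$data_e = [System.Collections.Generic.List[string]]::new()
[Demo_e]::Act([ref]$data_e)
$data_e.Count    # 0
```

With the by-value parameter the caller keeps its original list, including the item added by the method. With `[ref]` the caller's variable ends up pointing at the new, empty list.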

mclayton
  • @mklement0 - apologies for the @, but if anyone can help correct or tighten up my answer I know it'd be you :-). If it's unsalvageable feel free to say so as well and I'll just delete it! – mclayton Jul 26 '21 at 14:38
  • It seems like part of my problem is one of vocabulary. In my mind "change the value" means change the data contained in the variable, and I should really be thinking "mutate" in that case. And while both approaches are functionally identical from a mutation standpoint, `[Ref]` opens up the possibility of redefining the variable, which would be a bug, so what I call "implicit" might be safer, since I ONLY want to change the data contained in the variable. Is that getting closer to an accurate understanding (as well as a reasoned decision)? Or am I still misunderstanding value? – Gordon Jul 27 '21 at 07:36
  • So, as seen in my edit of the original question, I DON'T seem to understand the nuance. And I also want to add here, something I would love to see answered, if anyone has insight into what the PowerShell Dev Team was thinking, is the reason why Arrays and Hashtables default to By Ref while simple Strings and Integers don't. That feels odd to me, but I am sure there is a really valid reason that simply escapes me at the moment. – Gordon Jul 27 '21 at 08:19
  • @mclayton one aspect missing from your answer is that the runtime (.NET) distinguishes between value types (structs and integral types, including enum values) and reference types ("complex types"), and this has a fundamental influence on how we pass arguments in PowerShell – Mathias R. Jessen Jul 27 '21 at 13:57
  • Indeed, it is this complex/reference type behavior that I am most confused by. Does it suggest that there is no good reason to use `[Ref]` when passing reference types? The result seems to be the same, but as I am beginning to understand it the underlying mechanism is different, so I wonder what the implications, if any, are for choosing one over the other when there is the option? – Gordon Jul 28 '21 at 10:46
  • @Gordon - I'll try to update this answer later, but you might want to have a read of this: http://www.leerichardson.com/2007/01/parameter-passing-in-c.html. It *might* help understand the two independent concepts of "Value Types vs Reference Types" and "Passing by Value vs Passing by Reference". The `[Ref]` feature is for when you want to "Pass by Reference" - it's basically like the `out` keyword in C#. – mclayton Jul 28 '21 at 13:32
  • That these are INDEPENDENT ideas, with identical names, is exactly what I needed. I will have to read the original post mentioned at that link, because I am still struggling with exactly what the difference would be between a reference type passed by value and a reference type passed by reference. I THINK what's happening in my "implied" approach is the argument is an independent variable, that happens to point to the same memory location, and my "explicit" example, using [Ref] actually constructs a reference back to the variable being passed, and THAT is what points at the memory location. – Gordon Jul 28 '21 at 15:59
  • @MathiasR.Jessen - I've reworked it to make it clearer there are two concepts at play, and added links to more pages. It's getting quite long now, but hopefully it's an improvement :-) – mclayton Jul 28 '21 at 21:52