PowerShell supports passing arguments by reference in two different ways. Any variable can be passed by reference by using [Ref] as the parameter type, which effectively casts the argument to the special [Ref] type. Alternatively, complex data types like [Hashtable] are always passed by reference. The former approach then requires using .value when either changing the value or passing the variable on to another class method or function. In simplified form, this is what that looks like in practice.
# implicit By Reference
class One_i {
    static [Void] Act ([System.Collections.Generic.List[string]] $data) {
        $data.Add("One_i::Act(): $(Get-Date)")
        [Two_i]::Act($data)
    }
}
class Two_i {
    static [Void] Act ([System.Collections.Generic.List[string]] $data) {
        $data.Add("Two_i::Act(): $(Get-Date)")
    }
}
# explicit By Reference
class One_e {
    static [Void] Act ([Ref] $data) {
        $data.value.Add("One_e::Act(): $(Get-Date)")
        [Two_e]::Act($data.value)
    }
}
class Two_e {
    static [Void] Act ([Ref] $data) {
        $data.value.Add("Two_e::Act(): $(Get-Date)")
    }
}
CLS
$data_i = [System.Collections.Generic.List[string]]::new()
$data_i.Add("Init_i: $(Get-Date)")
[One_i]::Act($data_i)
$data_i.Add("Finalize_i: $(Get-Date)")
foreach ($item in $data_i) {
    Write-Host "$item"
}
Write-Host
$data_e = [System.Collections.Generic.List[string]]::new()
$data_e.Add("Init_e: $(Get-Date)")
[One_e]::Act($data_e)
$data_e.Add("Finalize_e: $(Get-Date)")
foreach ($item in $data_e) {
    Write-Host "$item"
}
Write-Host
I am conflicted as to which approach I prefer. On the one hand, [Ref] makes it clear which variables are By Ref, which is useful. However, the need for .value makes the code a bit harder to read, adds work, and creates an opportunity to forget it and introduce a bug. I tested performance like this:
(Measure-Command {
    $data_i = [System.Collections.Generic.List[string]]::new()
    $data_i.Add("Init_i: $(Get-Date)")
    foreach ($i in 1..1000) {
        [One_i]::Act($data_i)
    }
    $data_i.Add("Finalize_i: $(Get-Date)")
}).TotalSeconds
(Measure-Command {
    $data_e = [System.Collections.Generic.List[string]]::new()
    $data_e.Add("Init_e: $(Get-Date)")
    foreach ($e in 1..1000) {
        [One_e]::Act($data_e)
    }
    $data_e.Add("Finalize_e: $(Get-Date)")
}).TotalSeconds
Implicit seems to be VERY slightly faster, but not by enough to argue for implicit on performance alone. This leads me to a number of questions as I try to decide which approach to adopt, or perhaps to the realization that each approach is the better choice in certain situations and that using both in a single program actually has merit.
1: Why are complex types handled differently than simple types? The inconsistent behavior seems odd to me, and I suspect I will learn something fundamental if I understand why the behavior is different.
2: Beyond needing to use .value or not, are there other differences between implicit and explicit By Reference? In particular, are there any potential problems with one approach or the other?
3: Is there a mechanism to force a complex type to be By Value? And what would be the use case for that?
4: I found this thread, with a comment saying it's best to avoid By Reference (but without explanation). However, I have also found references to Dependency Injection being superior to Global variables and/or the Singleton pattern, and By Reference is fundamental to DI. Is there anything important I need to be aware of here, ESPECIALLY as it relates to PowerShell/Classes/Dependency Injection, given that PowerShell classes are not as fully implemented as in other languages? For what it's worth, I am working in PS 5.1 and will not be moving to Core any time soon, but it would be great to know if there are fundamental differences between the Core and pre-Core implementations.
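A minimal sketch that may help frame questions 1 and 3. It assumes the real split is between .NET value types (such as [int]) and reference types (such as List[string]) rather than "simple" versus "complex", and that copying the collection is an acceptable stand-in for "By Value". The class Demo and its method names are hypothetical, made up purely for illustration.

```powershell
# Hypothetical class for illustration only; none of these names come from the code above.
class Demo {
    # [int] is a value type: the method works on its own copy.
    static [void] Bump([int] $n) {
        $n = $n + 1
    }

    # With [ref], the method can write back to the caller's variable through .Value.
    static [void] BumpRef([ref] $n) {
        $n.Value = $n.Value + 1
    }

    # List[string] is a reference type: the method receives a reference to the same object.
    static [void] Append([System.Collections.Generic.List[string]] $list) {
        $list.Add('appended')
    }
}

$count = 1
[Demo]::Bump($count)            # caller's $count is not changed
[Demo]::BumpRef([ref]$count)    # caller's $count is incremented

$original = [System.Collections.Generic.List[string]]::new()
$original.Add('first')

# The List[T] copy constructor builds a new list holding the same elements (a shallow copy).
$copy = [System.Collections.Generic.List[string]]::new($original)
[Demo]::Append($copy)           # only the copy grows

Write-Host "count: $count"
Write-Host "original: $($original.Count) item(s), copy: $($copy.Count) item(s)"
```

Copying before the call is one possible answer to question 3: the callee can mutate its list freely without touching the caller's. Note that the copy is shallow, so for a list of mutable objects the elements themselves would still be shared.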
EDIT: Based on @mclayton's answer and my understanding of "Changing value" vs "mutation" I tried "breaking" the reference like this
# implicit By Reference
class One_i {
    static [Void] Act ([System.Collections.Generic.List[string]] $data) {
        $data.Add("One_i::Act(): $(Get-Date)")
        [Two_i]::Act($data)
        $data = [System.Collections.Generic.List[string]]::new()
    }
}
# explicit By Reference
class One_e {
    static [Void] Act ([Ref] $data) {
        $data.value.Add("One_e::Act(): $(Get-Date)")
        [Two_e]::Act($data.value)
        $data.value = [System.Collections.Generic.List[string]]::new()
    }
}
I tried both $data.value = [System.Collections.Generic.List[string]]::new() and $data = [System.Collections.Generic.List[string]]::new() in the "explicit" example, and none of them actually seemed to cause an issue. The console output was "correct" in all three cases, which suggests that I DON'T understand the difference between changing value and mutation after all.
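One way to make the difference between mutating and reassigning observable, offered as a hedged sketch rather than a definitive explanation. The class Probe and its method names are hypothetical. The sketch assumes the caller passes [ref]$list explicitly at the call site; the calls shown earlier pass the list itself, which may be why replacing $data.value appeared to have no effect.

```powershell
# Hypothetical class for illustration only.
class Probe {
    # Reassigning the parameter only rebinds the method's own variable.
    static [void] Reassign([System.Collections.Generic.List[string]] $data) {
        $data.Add('added before reassign')   # mutation: the caller's list sees this
        $data = [System.Collections.Generic.List[string]]::new()
        $data.Add('added after reassign')    # lands in the new list, not the caller's
    }

    # Assigning to .Value on a [ref] created from a variable replaces that variable's value.
    static [void] Replace([ref] $data) {
        $data.Value = [System.Collections.Generic.List[string]]::new()
    }
}

$list = [System.Collections.Generic.List[string]]::new()
$list.Add('init')

[Probe]::Reassign($list)
$afterReassign = $list -join ', '
Write-Host "after Reassign: $afterReassign"

[Probe]::Replace([ref]$list)
Write-Host "after Replace: $($list.Count) item(s)"
```

In this sketch, Reassign leaves the caller holding the original list plus the one item added before the reassignment, while Replace swaps the caller's variable to a new, empty list.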
That said, as it relates to my initial reason for asking the question, namely deciding which of the two approaches I want to use, I am leaning towards what I call "implicit", since it makes the code a bit simpler by eliminating the need for .value, and since hiding the by-reference nature of the variable doesn't change the fact that I have to understand that it is by reference, no matter what I do. I really wish I could find a thread that discusses the relative merits of the two approaches, but thus far my searches have failed to turn up anything useful.
Edit2: So, I found this where mklement0 says there really is no functional difference between the two approaches. And it occurred to me that I could get the best of both worlds by using [Ref] and then assigning $ref.value to a local variable, to simplify things within the method or function. I am tentatively feeling like that is the best answer. It adds one line at the top of each function or method, but makes the behavior of the calls more obvious and simplifies use in the body of the method or function.
# explicit By Reference with sugar
class One_x {
    static [Void] Act ([Ref] $data) {
        $localData = $data.Value
        $localData.Add("One_x::Act(): $(Get-Date)")
        [Two_x]::Act(([Ref]$localData))
    }
}
class Two_x {
    static [Void] Act ([Ref] $data) {
        $localData = $data.Value
        $localData.Add("Two_x::Act(): $(Get-Date)")
    }
}
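One thing that may be worth checking with the "sugar" pattern, shown here as a sketch under stated assumptions: [Ref]$localData wraps the local variable, not the caller's variable, so a callee that replaces .Value would only affect the local. The classes Inner_s and Outer_s and their method names are hypothetical, and the sketch assumes the outermost caller passes [ref]$list explicitly.

```powershell
# Hypothetical classes for illustration; names are not from the original code.
class Inner_s {
    # Replaces whatever variable the incoming reference points at.
    static [void] Replace([ref] $data) {
        $data.Value = [System.Collections.Generic.List[string]]::new()
    }
}
class Outer_s {
    # Wraps the local variable in a new reference before forwarding it.
    static [void] ViaLocal([ref] $data) {
        $localData = $data.Value
        [Inner_s]::Replace([ref]$localData)
    }

    # Forwards the reference it was given, unchanged.
    static [void] ViaOriginal([ref] $data) {
        [Inner_s]::Replace($data)
    }
}

$list = [System.Collections.Generic.List[string]]::new()
$list.Add('init')

[Outer_s]::ViaLocal([ref]$list)
$afterLocal = $list.Count
Write-Host "after ViaLocal: $afterLocal item(s)"

[Outer_s]::ViaOriginal([ref]$list)
$afterOriginal = $list.Count
Write-Host "after ViaOriginal: $afterOriginal item(s)"
```

As long as the methods only mutate the list (Add, Remove and so on), both forwarding styles should behave the same; the distinction only matters when a callee assigns to .Value.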