
While doing some testing with Add-Member, I created 'copies' of my actual data to test with, but I quickly found that Add-Member was adding the new properties to the original object as well, without being asked to. Is this the expected behavior? If so, how do by-reference variables work in PowerShell?

Example:

# Create a CSV file
@'
Name,Title
Bob,President
Todd,Secretary
'@ > test.csv

# Load the CSV into an object
$Data = Import-Csv test.csv

# Create a duplicate of $Data (this must be the issue)
$NewData = $Data

# Add a new property to the $NewData object
$NewData | Add-Member ScriptProperty "First Initial" {$this.Name[0]}

# Check the original Object ($Data) and see the madness
$Data

# Viewing $Data as a table
Name Title     First Initial
---- -----     -------------
Bob  President             B
Todd Secretary             T

$Data | Get-Member

# Just confirming that there is indeed a MemberType of ScriptProperty that was added to $Data

   TypeName: System.Management.Automation.PSCustomObject

Name          MemberType     Definition
----          ----------     ----------
Equals        Method         bool Equals(System.Object obj)
GetHashCode   Method         int GetHashCode()
GetType       Method         type GetType()
ToString      Method         string ToString()
Name          NoteProperty   string Name=Bob
Title         NoteProperty   string Title=President
First Initial ScriptProperty System.Object First Initial {get=$this.Name[0];}
immobile2
  • 3
    `PSObject`s are all reference types. Doing deep copies is actually [quite involved](https://stackoverflow.com/q/9581568/4137916) and to be avoided if there's an alternative (in this case, if your CSV file is not too large, reading it again would do it). – Jeroen Mostert Nov 03 '21 at 13:59
  • 3
    `$NewData = $Data` does _not_ create a duplicate - both variables refer to the exact same array – Mathias R. Jessen Nov 03 '21 at 14:02

1 Answer


To flesh out the helpful comments on the question:

$NewData = $Data does not copy data, because $Data contains a reference to an array, which is an instance of a .NET reference type. Instead, it is the reference that is copied, so that both variables end up referencing the very same array.
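The shared reference can be checked directly. The following is a minimal sketch, assuming the question's data is built in memory with ConvertFrom-Csv (a stand-in for Import-Csv so that no file is needed):

```powershell
# Same rows as the question's test.csv, parsed from a here-string.
$Data = @'
Name,Title
Bob,President
Todd,Secretary
'@ | ConvertFrom-Csv

# Copies the reference to the array, not the array itself.
$NewData = $Data

# True: both variables refer to the very same array instance.
[object]::ReferenceEquals($Data, $NewData)

# Replacing an element through one variable is visible through the other.
$NewData[0] = 'replaced'
$Data[0]   # replaced
```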

An easy (albeit inefficient) way to create a copy of an array in PowerShell is to enclose it in @(...), the array-subexpression operator; calling the array's .Clone() method is the faster alternative:

# Convenient, but slow with large arrays.
$NewData = @($Data)

# Faster.
$NewData = $Data.Clone()

However, if the array's elements too contain references to .NET reference-type instances, the copied array's elements still reference the very same instances - and the [pscustomobject] instances that Import-Csv outputs are indeed reference-type instances.
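As a rough illustration of that limitation, the sketch below (again using ConvertFrom-Csv in place of Import-Csv, which is an assumption made for self-containment) shows that the cloned array is a distinct array whose elements are still the original objects:

```powershell
$Data = @'
Name,Title
Bob,President
Todd,Secretary
'@ | ConvertFrom-Csv

# Copy the array only; the elements themselves are not copied.
$NewData = $Data.Clone()

[object]::ReferenceEquals($Data, $NewData)         # False: two different arrays
[object]::ReferenceEquals($Data[0], $NewData[0])   # True: same element object

# Changing a property via the copied array's element also changes the original's.
$NewData[0].Title = 'CEO'
$Data[0].Title   # CEO
```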

You can create a copy of the $Data array and its elements with the following, using the intrinsic .psobject member to create (shallow) copies of [pscustomobject] instances:

$NewData = $Data | ForEach-Object { $_.psobject.Copy() }
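Applied to the scenario from the question, this might look as follows. It is a sketch only, with ConvertFrom-Csv standing in for Import-Csv, and it checks that the added property ends up on the copies but not on the originals:

```powershell
$Data = @'
Name,Title
Bob,President
Todd,Secretary
'@ | ConvertFrom-Csv

# Shallow-copy each element, then decorate only the copies.
$NewData = $Data | ForEach-Object { $_.psobject.Copy() }
$NewData | Add-Member ScriptProperty 'First Initial' { $this.Name[0] }

# The originals keep their two properties; the copies have three.
$Data[0].psobject.Properties.Name      # Name, Title
$NewData[0].psobject.Properties.Name   # Name, Title, First Initial
```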

However, as noted, .psobject.Copy() creates a shallow copy of each [pscustomobject] input object. That is:

  • properties that happen to contain instances of .NET value types result in true copies in the copied object; similarly, string values are in effect copies.[1]

  • by contrast, properties containing references to instances of .NET reference types result in those references being copied, so the original object's property value and the copied object's both point to the very same instance; if that instance is mutable, changes to it are reflected in both containing objects.

That said, with objects created via Import-Csv, whose properties by definition are all strings, that is not a concern.
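For objects that do hold mutable reference-type values, the difference can be seen in a small sketch. The object below is hypothetical (it does not come from Import-Csv), and the `::new()` syntax assumes PowerShell 5 or later:

```powershell
# A hypothetical object with one string property and one mutable list property.
$original = [pscustomobject]@{
    Name = 'Bob'
    Tags = [System.Collections.Generic.List[string]]::new()
}

$copy = $original.psobject.Copy()   # shallow copy

$copy.Name = 'Robert'     # assigns a new string to the copy's property only
$copy.Tags.Add('admin')   # mutates the one list that both objects reference

$original.Name         # Bob
$original.Tags.Count   # 1
```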

See this answer for more information.


[1] [string] is technically a reference type, but is usually treated like a value type in .NET. Since a string instance is by definition immutable, you can rely on any copy of it to remain unchanged. A string can never be modified; it can only be replaced by a _new_ string.

mklement0
  • Stumbled upon a _slightly_ [related question/answer](https://stackoverflow.com/q/13416651/15243610), but don't see a lot of answers. Based on my newfound understanding of reference types - is it safe to say that [this author's use of `$Script:Result` is unnecessary](https://thesurlyadmin.com/2013/02/21/objects-and-hashtables/amp/)? Or are there memory concerns when it comes to passing mutable objects back and forth between functions? At the very end, circa 2013, he notes a concern of passing the hash table itself to functions that update it. That concern doesn't seem valid to me right now – immobile2 Nov 17 '21 at 07:05
  • @immobile2, indeed the use of `$script:` is unnecessary in this case - though you could argue it's better to signal the fact that you're operating on a variable in the script scope explicitly. However, the memory-concern argument suggests a misconception about reference-type instances: It is only the _reference_ to an instance that is passed as an argument, which has a fixed, small size (loosely speaking, a pointer), irrespective of the size of the object it points to. – mklement0 Nov 17 '21 at 15:18
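
The point made in the comments above, that only a reference is passed to a function, can be sketched as follows. The function name Add-Entry and its parameters are hypothetical and chosen purely for illustration:

```powershell
# Hypothetical helper: receives a reference to the caller's hashtable and updates it in place.
function Add-Entry {
    param(
        [hashtable] $Table,
        [string] $Key,
        $Value
    )
    $Table[$Key] = $Value
}

$result = @{}
Add-Entry -Table $result -Key 'Status' -Value 'OK'

# The caller's hashtable was modified; no $script: scope modifier was needed.
$result['Status']   # OK
```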