5

I am trying to gather data from several servers using ForEach-Object -Parallel. The variable I use is being populated within the loop, but when the loop finishes the variable is empty.

$DBDetails = "SELECT @@VERSION"

$VMs = ("vm1", "vm2", "vm3", "vm4", "vm5", "vm6", "vm7")
$DBInventory = @()

$scriptBlock = {
    $vm = $_
    $result = Invoke-Sqlcmd -ServerInstance $vm -Query $using:DBDetails
    $DBInventory += $result
    Write-Host "Added $($result.Count) rows from $($vm)"
}

$VMs | ForEach-Object -Parallel $scriptBlock
Write-Host "Number of elements in DBInventory: $($DBInventory.Count)"

I expect the last line to return the number of elements gathered within the loop that executed on the previous line. There should be a total of 7 elements, but I am left with none.

My result looks like this:

Added 1 rows from vm1
Added 1 rows from vm2
Added 1 rows from vm3
Added 1 rows from vm4
Added 1 rows from vm5
Added 1 rows from vm6
Added 1 rows from vm7
Number of elements in DBInventory: 0

2 Answers2

3

ForEach-Object -Parallel causes execution of the loop body in a separate runspace, meaning you don't have direct access to the variables defined in the calling scope.

To work around this, make two changes to your code:

  • Use a collection type other than a resizable array (below I've use a generic [List[psobject]])
  • Reference the variable from the caller's scope with the using: scope modifier and assign to a local inside the block

The resulting local variable will then reference the same list-object in memory, and changes made to that list via its methods (Add(), Remove(), AddRange(), etc.) will be reflected anywhere else its referenced (including the original $DBInventory variable from your calling scope).

$DBDetails = "SELECT @@VERSION"

$VMs = ("vm1", "vm2", "vm3", "vm4", "vm5", "vm6", "vm7")
$DBInventory = [System.Collections.Generic.List[psobject]]::new()

$scriptBlock = {
    $vm = $_
    $inventory = $using:DBInventory
    
    $result = Invoke-Sqlcmd -ServerInstance $vm -Query $using:DBDetails
    $inventory.AddRange([psobject[]]$result)
    Write-Host "Added $($result.Count) rows from $($vm)"
}

$VMs | ForEach-Object -Parallel $scriptBlock
Write-Host "Number of elements in DBInventory: $($DBInventory.Count)"

As mklement0 notes, [List[psobject]] is not thread-safe - for production code you'll definitely want to pick a collection type that is, like for example a [System.Collections.Concurrent.ConcurrenBag[psobject]] - essentially an unordered list:

$DBInventory = [System.Collections.Concurrent.ConcurrentBag[psobject]]::new()

Beware that the ConcurrentBag type, as the name might suggest, does not preserve insertion order. If this is a problem, you may want to consider using a [ConcurrentDictionary[string,psobject[]]] - this way you can tie the query output back to the orignal input string:

$DBInventory = [System.Collections.Concurrent.ConcurrentDictionary[string,psobject[]]]::new()

Since another thread may (hypothetically) have added an entry for the same key since you dispatched your call to Add(), the ConcurrentDictionary type requires us to use it slightly differently than a regular dictionary or hashtable:

$scriptBlock = {
    $vm = $_
    $inventory = $using:DBInventory
    
    $result = Invoke-Sqlcmd -ServerInstance $vm -Query $using:DBDetails
    $adder = $updater = { return Write-Output $result -NoEnumerate }
    $inventory.AddOrUpdate($vm, $adder, $updater)
    Write-Host "Added $($result.Count) rows from $($vm)"
}

Here, the concurrent dictionary will execute the $adder function on our behalf if the key doesn't already exist (otherwise it'll run the $updater), and the result will be assigned as the entry value.

You can subsequently access the entry values the same way you would a hashtable:

$DBInventory[$vms[-1]] # returns array containing the query results from the last VM in the list
Mathias R. Jessen
  • 157,619
  • 12
  • 148
  • 206
2

tl;dr

  • Use the $using: scope to refer to the value of variables defined in the caller's scope, as you're partially already doing.

  • You cannot directly modify variables in the caller's scope ($using:DBInventory += $result would not work), but you don't need to: let PowerShell collect the output objects in an array for you:

$DBDetails = "SELECT @@VERSION"

$VMs = ("vm1", "vm2", "vm3", "vm4", "vm5", "vm6", "vm7")
 = @()

$scriptBlock = {
    $vm = $_
    $result = Invoke-Sqlcmd -ServerInstance $vm -Query $using:DBDetails
    Write-Host "Outputting $($result.Count) rows from $($vm)"
    $result # Simply output the objects 
}

# Let PowerShell collect all output objects from the ForEach-Object -Parallel call 
# in an array.
# Note: The [array] type constraint ensures that $DBInventory is an array
#       even if there happens to be only *one* output object.
[array] $DBInventory = $VMs | ForEach-Object -Parallel $scriptBlock

Write-Host "Number of elements in DBInventory: $($DBInventory.Count)"

$DBInventory will contain a regular PowerShell array ([object[]]).


Background information:

  • Your code already partially shows awareness that you need the $using: scope inside a script block that runs in a different runspace (such as those in the threads that ForEach-Object -Parallel creates) in order to refer to variable values from the caller's scope.

    • This therefore in principle applies to your caller-side $DBInventory variable as well, however:
      • A $using: reference is a reference to a variable value, not to a variable itself, so you cannot assign to $using: references.
      • That is, $using:DBInventory += $result would not work, leaving aside the general point that using += to "grow" arrays is best avoided due to its inefficiency - see this answer.
  • While you could initialize $DBInventory to an efficiently extensible list type, you'd have to ensure that it is grown in thread-safe manner, given that you're using thread-based parallelism via ForEach-Object -Parallel:

    • Notably, the commonly used list types [System.Collections.Generic.List[object]] and System.Collections.ArrayList are not thread-safe.

    • You'd either have to:

      • add manual synchronization code to your script block, using .NET APIs, which is nontrivial.
      • pick a different, concurrent (thread-safe) list type (there is none built in for generic lists)
      • use a thread-safe wrapper, e.g. $DBInventory = [System.Collections.ArrayList]::Synchronized([System.Collections.Generic.List[object]] @()), which returns a non-generic [System.Collections.IList] implementation.
        Note, however, that this be inefficient with generic lists with value-type elements, doesn't expose an .AddRange() method for efficient appending of multiple elements, and its .Add() method returns a (usually unwanted) value, which you'll have to discard with $null = ($using:DBInventory).Add(...)
    • Note that the reason that growing lists via $using: does work - as opposed to via += - is that you're adding elements via methods (.Add(), .AddRange()) of the object that is the value of the variable being referenced with $using:. That is, you're directly modifying the variable value, not the variable itself (which isn't supported).

  • Fortunately, there's a simpler solution: Rely on PowerShell's ability to automatically collect all output objects emitted by a pipeline in an array, which is both more concise and more efficient than manual growth of a list, and also works with ForEach-Object -Parallel, as shown at the top - again, see this answer for background information.

mklement0
  • 382,024
  • 64
  • 607
  • 775