
What's the fastest way to merge two arrays using a common property?

Users | Select *
Username : Joe.Doe
Office   : Chicago
Email    :

Username : Mike.Smith
Office   : New York
Email    :
...
UserEmails | Select *
AccountEmail  : Mike.Smith
EmailAddress  : mike-smith@example.com

AccountEmail  : Joe.Doe
EmailAddress  : jsmith12@example.com
...

The merge should result in:

UsersCompleteList | Select *
Username : Joe.Doe
Office   : Chicago
Email    : jsmith12@example.com

Username : Mike.Smith
Office   : New York
Email    : mike-smith@example.com
...

Something like `foreach ($user in $users) { $user.Email = ($userEmails | ? { $_.AccountEmail -eq $user.Username }).EmailAddress }` takes ages on large datasets, since it rescans all of $userEmails for every user.

  • Loop through one collection and store in a hash. Then loop through the other. Something like: `$hash=@{}; $userEmails|%{$hash[$_.AccountEmail]=$_.EmailAddress} ; $users|%{$_.Email = $hash[$_.Username] }` – pinkfloydx33 May 08 '20 at 13:02
  • Does this answer your question? [In Powershell, what's the best way to join two tables into one?](https://stackoverflow.com/questions/1848821/in-powershell-whats-the-best-way-to-join-two-tables-into-one). Using the [Join-Object](https://www.powershellgallery.com/packages/Join) mentioned in the answer: `$Users | Join-Object $UserEmails -On UserName -Eq AccountEmail` – iRon May 08 '20 at 17:15

1 Answer

Loop through one collection and store its values in a hashtable keyed on the common property. Then loop through the other collection and pull each value back out of the hashtable by key. Something like:

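$hash = @{}
# Index the email records by account name (one pass over $userEmails).
$userEmails | %{ $hash[$_.AccountEmail] = $_.EmailAddress }
# Look each user up by key instead of rescanning $userEmails.
$users | %{ $_.Email = $hash[$_.Username] }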

If you need other properties as well, you can store the whole original object:

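$hash = @{}
# Store the whole record so any of its properties can be copied later.
$userEmails | %{ $hash[$_.AccountEmail] = $_ }
$users | %{
   $item = $hash[$_.Username]
   $_.Email = $item.EmailAddress
   # 'Other' and 'SomethingElse' are placeholders for whatever extra properties you need.
   $_.Other = $item.SomethingElse
}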

Or, using foreach loops instead of ForEach-Object and including a null check:

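$hash = @{}
# Build the lookup table from the email records.
foreach($e in $userEmails) {
  $hash[$e.AccountEmail] = $e
}
foreach($u in $users) {
  $item = $hash[$u.UserName]
  # Skip users with no matching email record ($null on the left is the idiomatic form).
  if ($null -ne $item) {
    $u.Email = $item.EmailAddress
  }
}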
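
A minimal self-contained sketch of the same hashtable technique, assuming the sample values shown in the question. Instead of setting Email on the existing objects, it builds a new list (the variable name `$usersCompleteList` is hypothetical, mirroring "UsersCompleteList" above), so the user objects do not need an Email property in advance:

```powershell
# Sample data copied from the question's example output.
$users = @(
    [pscustomobject]@{ Username = 'Joe.Doe';    Office = 'Chicago' }
    [pscustomobject]@{ Username = 'Mike.Smith'; Office = 'New York' }
)
$userEmails = @(
    [pscustomobject]@{ AccountEmail = 'Mike.Smith'; EmailAddress = 'mike-smith@example.com' }
    [pscustomobject]@{ AccountEmail = 'Joe.Doe';    EmailAddress = 'jsmith12@example.com' }
)

# One pass over $userEmails to build the lookup table.
# @{} uses case-insensitive string keys by default, so 'joe.doe' would also match.
$hash = @{}
foreach ($e in $userEmails) {
    $hash[$e.AccountEmail] = $e.EmailAddress
}

# One pass over $users; each lookup is a hashtable probe rather than a rescan
# of $userEmails, so total work grows roughly linearly with the input sizes.
$usersCompleteList = foreach ($u in $users) {
    [pscustomobject]@{
        Username = $u.Username
        Office   = $u.Office
        Email    = $hash[$u.Username]   # $null when no matching AccountEmail exists
    }
}

$usersCompleteList | Format-List
```

Users without an email record simply get a $null Email; add a `$hash.ContainsKey(...)` check if that case needs different handling.
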
pinkfloydx33
  • Assuming he's not fetching the two arrays with a cmdlet, you might get slightly better performance from using a loop statement like `foreach($email in $userEmails){...}` than by using `ForEach-Object` – Mathias R. Jessen May 08 '20 at 13:43
  • ... unless you have slow input and/or output (e.g. a disk and/or very large files). In that case you are probably better off using the pipeline from beginning to end: `Import-Csv .\Users | Foreach-Object { ... } | Export-Csv .\Output.Csv` (and you should then compare/measure both solutions from beginning to end). – iRon May 08 '20 at 17:30
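
A minimal sketch of the streaming shape suggested in the last comment, assuming two CSV files whose columns match the question (Username/Office and AccountEmail/EmailAddress). The file names and the temp directory are hypothetical, and the sample files are written first only so the sketch runs on its own. Only the users side streams; the email side is still loaded fully into the hashtable, so it has to fit in memory.

```powershell
# Hypothetical paths in a fresh temp directory.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ('merge-demo-' + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
$usersCsv  = Join-Path $dir 'Users.csv'
$emailsCsv = Join-Path $dir 'UserEmails.csv'
$outputCsv = Join-Path $dir 'Output.csv'

# Write sample input files (stand-ins for real data).
@(
    [pscustomobject]@{ Username = 'Joe.Doe';    Office = 'Chicago' }
    [pscustomobject]@{ Username = 'Mike.Smith'; Office = 'New York' }
) | Export-Csv -Path $usersCsv -NoTypeInformation
@(
    [pscustomobject]@{ AccountEmail = 'Mike.Smith'; EmailAddress = 'mike-smith@example.com' }
    [pscustomobject]@{ AccountEmail = 'Joe.Doe';    EmailAddress = 'jsmith12@example.com' }
) | Export-Csv -Path $emailsCsv -NoTypeInformation

# The lookup side is read completely into the hashtable.
$hash = @{}
Import-Csv -Path $emailsCsv | ForEach-Object { $hash[$_.AccountEmail] = $_.EmailAddress }

# The users side streams: each row is read, enriched and written
# before the next row is read, so the whole file is never held in memory.
Import-Csv -Path $usersCsv |
    ForEach-Object {
        [pscustomobject]@{
            Username = $_.Username
            Office   = $_.Office
            Email    = $hash[$_.Username]
        }
    } |
    Export-Csv -Path $outputCsv -NoTypeInformation

# Remove-Item -Path $dir -Recurse -Force   # clean up when done
```

As the comment says, wrap each variant in `Measure-Command { ... }` and time it end to end on your real data before choosing one.
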