I have two large .csv files that I've imported using the Import-Csv cmdlet. I've done a lot of searching and experimenting, and I'm finally posting to ask for some help making this faster.

I need to iterate through the first array, which will have anywhere from 80k to 500k rows. Each object in these arrays has multiple properties, and for each one I need to find the corresponding entry in a second array of the same size by matching on one of those properties.

I'm importing them as [System.Collections.ArrayList] and I've also tried placing them in hashtables (a rough sketch of that attempt is shown after the code below). I have even tried LINQ, which was mentioned in several other posts.

Can anyone offer advice or insight on how to make this run faster? It feels like I'm looking in one haystack for matching hay from a different stack.

$ImportTime1 = Measure-Command {
    [System.Collections.ArrayList]$fileList1 = Import-Csv file1.csv
    [System.Collections.ArrayList]$fileSorted1 = ($fileList1 | Sort-Object -Property 'Property1' -Unique -Descending)
    Remove-Variable fileList1
}

$ImportTime2 = Measure-Command {
    [System.Collections.ArrayList]$fileList2 = Import-Csv file2.csv
    [System.Collections.ArrayList]$fileSorted2 = ($fileList2 | Sort-Object -Property 'Property1' -Unique -Descending)
    Remove-Variable fileList2
}

$fileSorted1.foreach({
     $variable1 = $_
     # Linear scan of the second array for the row whose key property matches
     $target = $fileSorted2.where({$_.Property1 -eq $variable1.Property1})
     ### do some other stuff
})
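
A rough sketch of the hashtable approach I tried, assuming 'Property1' is the key column and that it is unique in the second file:

# Sketch of the hashtable attempt: index the second file by the key column once,
# then do constant-time lookups instead of scanning the whole array per row.
$index2 = @{}
foreach ($row in $fileSorted2) { $index2[$row.Property1] = $row }

$fileSorted1.foreach({
    $variable1 = $_
    $target = $index2[$variable1.Property1]
    if ($null -ne $target) {
        ### do some other stuff
    }
})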
  • Does this answer your question? [Which operator provides quicker output -match -contains or Where-Object for large CSV files](https://stackoverflow.com/a/58474740/1701026) and [Fastest Way to get a uniquely index item from the property of an array](https://stackoverflow.com/a/59437162/1701026) and the links mentioned in: [In Powershell, what's the best way to join two tables into one?](https://stackoverflow.com/a/45483110/1701026). If you're looking for a ready-made LINQ solution, try `Join-Object` from [ili101](https://github.com/ili101/Join-Object). – iRon Jan 25 '20 at 11:09
  • Generally, performance results might depend on your input - and even on your (expected) output object (see: [Timing a command's execution in PowerShell](https://stackoverflow.com/a/59908820/1701026)) meaning that besides a more specific example of what you tried, it is good to add some sample input data and the expected output (see also: [How to Ask](https://stackoverflow.com/help/how-to-ask)). – iRon Jan 27 '20 at 10:15
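
For reference, the `Join-Object` module linked in the comment above can be called roughly like this (a sketch; it assumes the module is installed from the PowerShell Gallery and that both files share a key column named 'Property1'):

# Sketch only: assumes Install-Module -Name Join-Object (ili101) has been run
# and that 'Property1' is the common key column in both files.
$joined = Join-Object -Left $fileList1 -Right $fileList2 `
    -LeftJoinProperty 'Property1' -RightJoinProperty 'Property1' -Type OnlyIfInBoth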

1 Answer

This may be of use: https://powershell.org/forums/topic/comparing-two-multi-dimensional-arrays/

The code below is the updated solution from comment #27359, with the change suggested by Max Kozlov in comment #27380 applied.

function RJ-CombinedCompare {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$True)]$List1,
        [Parameter(Mandatory=$True)]$L1Match,
        [Parameter(Mandatory=$True)]$List2,
        [Parameter(Mandatory=$True)]$L2Match
    )
    # Index both lists into a single hashtable keyed on the matching property.
    # Each key accumulates wrapper objects tagged with the list they came from.
    $hash = @{}
    foreach ($data in $List1) {$hash[$data.$L1Match] += ,[pscustomobject]@{Owner=1;Value=$data}}
    foreach ($data in $List2) {$hash[$data.$L2Match] += ,[pscustomobject]@{Owner=2;Value=$data}}
    # Emit one result object per key, splitting its entries back into List1 and List2 matches.
    foreach ($kv in $hash.GetEnumerator()) {
        $m1, $m2 = $kv.Value.where({$_.Owner -eq 1}, 'Split')
        [PSCustomObject]@{
            MatchValue = $kv.Key
            L1Matches = $m1.Count
            L2Matches = $m2.Count
            L1MatchObject = $L1Match
            L2MatchObject = $L2Match
            List1 = $m1.Value
            List2 = $m2.Value
        }
    }
}

$fileList1 = Import-Csv file1.csv
$fileList2 = Import-Csv file2.csv

$newList = RJ-CombinedCompare -List1 $fileList1 -L1Match 'YourColumnHere' -List2 $fileList2 -L2Match 'YourOtherColumnHere'

foreach ($item in $newList) {
    # your logic here
}

Passing the lists into this hashtable should be fast, and iterating over the result is fast as well.
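
For example, a minimal sketch of that loop, assuming you only care about keys that appear in both files:

foreach ($item in $newList) {
    # Skip keys that appear in only one of the two files
    if ($item.L1Matches -eq 0 -or $item.L2Matches -eq 0) { continue }

    # $item.List1 and $item.List2 hold the original CSV rows for this key
    foreach ($row1 in $item.List1) {
        foreach ($row2 in $item.List2) {
            # do some other stuff with the matched pair ($row1, $row2)
        }
    }
}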