
Good day.

I wrote a script that imports two Excel files and compares their rows. Each file contains about 13K rows, and processing takes about 3 hours, which seems too long. The bottleneck is that for each row in filea I loop through all 13K rows of fileb.

Is there a more efficient way to do this?

Here is sample code:

#Import rows as customObject
$rowsa = Import-Excel $filea
$rowsb = Import-Excel $fileb

#Loop through each filea row
foreach ($rowa in $rowsa)
{
    #Loop through each fileb row. If the UPC code matches rowa, check if other fields match
    foreach ($rowb in $rowsb)
    {
        $rowb | Where-Object -Property "UPC Code" -Like $rowa.upc |
            ForEach-Object {
                if (($rowa.uom2 -eq 'INP') -and ($rowb.'Split Quantity' -ne $rowa.qty1in2))
                {
                    #Do Something
                }
            }
    }
}
MMariani
  • Generally: use a hash table or linq. Using this [`Join-Object script`](https://www.powershellgallery.com/packages/Join)/[`Join-Object Module`](https://www.powershellgallery.com/packages/JoinModule) (see also: [In Powershell, what's the best way to join two tables into one?](https://stackoverflow.com/a/45483110/1701026)): `Import-Excel $filea |Join (Import-Excel $fileb) -on upc -eq "UPC Code"` ... – iRon Sep 18 '22 at 08:15

2 Answers


It looks like you can leverage `Group-Object -AsHashTable` for this. See the about_Hash_Tables help topic for more info on why hashtable lookups are faster.

$mapB = Import-Excel $fileb | Group-Object 'UPC Code' -AsHashTable -AsString
foreach($row in Import-Excel $filea) {
    if($mapB.ContainsKey($row.upc)) {
        $value = $mapB[$row.upc]
        if($row.uom2 -eq 'INP' -and $row.qty1in2 -ne $value.'Split Quantity') {
            $value # => has the row matching on UPC (FileA) / UPC Code (FileB)
            $row   # => current row in FileA
        }
    }
}
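One caveat worth knowing: `Group-Object -AsHashTable` stores a *collection* of rows under each key, so if FileB can contain the same UPC more than once you may want an inner loop over the matches. A minimal sketch with made-up sample rows (the property names follow the question):

```powershell
# Hypothetical sample rows standing in for Import-Excel output.
$rowsB = @(
    [pscustomobject]@{ 'UPC Code' = '111'; 'Split Quantity' = 5 }
    [pscustomobject]@{ 'UPC Code' = '111'; 'Split Quantity' = 7 }
)
$mapB = $rowsB | Group-Object 'UPC Code' -AsHashTable -AsString

$rowA = [pscustomobject]@{ upc = '111'; uom2 = 'INP'; qty1in2 = 5 }

# Each hashtable value is a collection of rows, so loop over it.
$mismatches = foreach ($match in $mapB[$rowA.upc]) {
    if ($rowA.uom2 -eq 'INP' -and $rowA.qty1in2 -ne $match.'Split Quantity') {
        $match
    }
}
```

Here `$mismatches` ends up holding only the second sample row, since its `Split Quantity` of 7 differs from `qty1in2`.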
Santiago Squarzon

A few tricks:

  1. The Object Pipeline may be easy, but it's not as fast as a statement

Try changing your code to use foreach statements instead of Where-Object and ForEach-Object.
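For example, here is the same filter written both ways, over hypothetical rows (replace them with your Import-Excel output):

```powershell
# Hypothetical rows standing in for the imported spreadsheet.
$rows = 1..10 | ForEach-Object { [pscustomobject]@{ upc = "$_"; qty = $_ } }

# Pipeline version: convenient, but binds parameters per object.
$viaPipeline = $rows | Where-Object { $_.qty -gt 5 }

# Statement version: same result with less per-object overhead.
$viaStatement = foreach ($row in $rows) {
    if ($row.qty -gt 5) { $row }
}
```

Both produce the same five rows; the statement version simply skips the per-object cmdlet machinery.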

  2. Use Hashtables to group.

While you can use Group-Object to do this, Group-Object suffers from the same performance problems as anything else in the pipeline.
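A hand-rolled grouping avoids the pipeline entirely. A sketch with made-up rows, keeping a list per key so duplicate UPCs survive:

```powershell
# Hypothetical rows; in practice these come from Import-Excel.
$rows = @(
    [pscustomobject]@{ upc = '111'; qty = 1 }
    [pscustomobject]@{ upc = '111'; qty = 2 }
    [pscustomobject]@{ upc = '222'; qty = 3 }
)

# Group rows by UPC in one pass: hashtable of key -> list of rows.
$byUpc = @{}
foreach ($row in $rows) {
    if (-not $byUpc.ContainsKey($row.upc)) {
        $byUpc[$row.upc] = [System.Collections.Generic.List[object]]::new()
    }
    $byUpc[$row.upc].Add($row)
}
```

Afterwards `$byUpc['111']` holds two rows and `$byUpc['222']` holds one, and each lookup is constant time.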

  3. Try to limit looping within looping.

As a general rule, a loop within a loop is O(n^2). Avoid nested loops where you can: restructuring the code to loop through A once, then loop through B once, is far more efficient, as is exiting loops as early as possible.

  4. Consider using a benchmarking tool.

There's a little module I maintain called Benchpress that can help you test multiple approaches to see which is faster. The Benchpress docs include a number of general PowerShell performance benchmarks to help you determine the fastest way to script a given thing.
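Even without a module, the built-in `Measure-Command` cmdlet gives a quick comparison. A rough sketch (the workload here is arbitrary, and wall-clock timings will vary from run to run):

```powershell
$data = 1..10000

# Time the pipeline version of a simple filter.
$pipelineMs = (Measure-Command {
    $data | Where-Object { $_ % 2 -eq 0 }
}).TotalMilliseconds

# Time the equivalent foreach-statement version.
$statementMs = (Measure-Command {
    foreach ($n in $data) { if ($n % 2 -eq 0) { $n } }
}).TotalMilliseconds

"Pipeline:  $pipelineMs ms"
"Statement: $statementMs ms"
```

A dedicated benchmarking tool repeats the runs and averages them, which makes the comparison more trustworthy than a single `Measure-Command` sample.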

Updated Script Below:

#Import rows as customObject
$rowsa = Import-Excel $filea
$rowsb = Import-Excel $fileb
$rowAByUPC = @{}
foreach ($rowA in $rowsa) {
   # This assumes there will only be one row per UPC.
   # If there is more than one, you may need to make a list here instead
   $rowAByUPC[$rowA.UPC] = $rowA 
}

foreach ($rowB in $rowsB) {
   # Skip any rows in B that don't have a UPC code.
   $rowBUPC = $rowB."UPC Code"
   if (-not $rowBUPC) { continue }
   $rowA = $rowAByUPC[$rowBUPC]
   # Skip rows in B with no matching UPC in A.
   if (-not $rowA) { continue }
   # It seems only rows that have 'INP' in uom2 are important,
   # so continue if it's missing.
   if ($rowA.uom2 -ne 'INP') { continue }
   if ($rowA.qty1in2 -ne $rowB.'Split Quantity') {
      # Do what you came here to do.
   }
}

Please note that since you have not shared the code inside the matching condition, you may need to apply the advice in this answer to that inner code as well.

Start-Automating
  • I tried the hash table, but does not work. Please see [screenshot](https://www.screencast.com/users/michael.mariani/folders/Snagit/media/ac2abbc1-76fb-4ce7-a019-44e127a28d68/). Any ideas why? Thanks. – MMariani Sep 24 '22 at 19:55