
I am using a CSV file to load a table-like object and then search it for existing items. With only 2000 items, a computer with an i5 CPU takes 4 seconds to perform the search using `where` on 2 columns. I'm wondering what I am doing wrong:

$uploadedRecordings = Import-Csv -Path $ArchiveUploadedFilesInGoogleDrive
...[some other code]
if ($uploadedRecordings | where { $_.Name -eq $filename -and $_.Size -eq $item.file_size }) {
    Write-Host "[Already downloaded] Skipping..."
}

Where $item (sample):

id              : f5b693
meeting_id      : uT4dfhghd==
recording_start : 2020-03-25T16:01:31Z
recording_end   : 2020-03-25T18:14:36Z
file_type       : M4A
file_size       : 54332420
play_url        : https://myurl
download_url    : https://otherurl
status          : completed
recording_type  : audio_only

and $filename = "Meeting - 2020-04-20 -- 09.29.59.mp4"

Riccardo

1 Answer


PowerShell offers great features, far beyond what traditional shells offer, but one thing it is not: a speed demon.

PowerShell's object-oriented pipeline is a wonderful tool, but it can be slow.

This answer summarizes performance recommendations; in the case at hand, you can speed up your command by avoiding the pipeline in favor of the .Where() array method:

if ($uploadedRecordings.Where({ 
  $_.Size -eq $item.file_size -and $_.Name -eq $filename 
})) {
  Write-Host "[Already downloaded] Skipping..."
}
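As a rough sanity check, you can compare the two approaches with `Measure-Command`. This is only a sketch, assuming `$uploadedRecordings`, `$filename`, and `$item` are defined as in the question; absolute timings will vary by machine:

```powershell
# Pipeline version: Where-Object streams each object through the pipeline.
$pipeline = Measure-Command {
  $uploadedRecordings | Where-Object { $_.Name -eq $filename -and $_.Size -eq $item.file_size }
}

# Method version: .Where() iterates in-process, without pipeline overhead.
$method = Measure-Command {
  $uploadedRecordings.Where({ $_.Size -eq $item.file_size -and $_.Name -eq $filename })
}

"Pipeline: $($pipeline.TotalMilliseconds) ms; .Where(): $($method.TotalMilliseconds) ms"
```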

Also note how I've swapped the -and operands in favor of comparing file sizes first, to take advantage of short-circuiting; after all, files being exactly identical in size is less common than their having the same name.

You may be able to speed things up further a bit by caching $item.file_size in an auxiliary variable, though my hunch is that that won't make much of a difference in practice.
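If that still isn't fast enough, a different data structure can help: build a hashtable once, keyed on the name-size combination, and probe it with a constant-time lookup inside the loop. The sketch below assumes the `Name` and `Size` columns from the question's CSV; the composite-key format (a `|`-separated string) is an arbitrary choice of mine:

```powershell
# Build the lookup table once, before the loop.
$lookup = @{}
foreach ($rec in $uploadedRecordings) {
  # Composite key; any separator that cannot occur in a file name works.
  $lookup["$($rec.Name)|$($rec.Size)"] = $true
}

# Inside the loop, a single hashtable probe replaces the linear scan.
if ($lookup.ContainsKey("$filename|$($item.file_size)")) {
  Write-Host "[Already downloaded] Skipping..."
}
```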

mklement0
  • Thanks a lot. There's a slight gain with your solution (in your sample it misses the second closing parenthesis for the `if` statement). Does it make any sense to bother changing the CSV to a hashtable? Searches seem to be much faster: https://www.get-blog.com/?p=122, but how should I change the code, given that the CSV has 3 columns: filename, date, size – Riccardo May 10 '20 at 22:09
  • @Riccardo `Import-Csv` doesn't directly allow you to import the file as a hashtable. How, exactly, are you processing the rows? One way to speed things up would be to use plain-text processing via a `switch` statement, but that is more cumbersome. – mklement0 May 10 '20 at 22:16
  • 1
    Further investigation shows that your suggested code runs much better than it was! Thanks – Riccardo May 11 '20 at 11:36