
I am a novice in PowerShell and use it only rarely for small tasks. I am using this one-liner to extract emails recursively:

(Get-ChildItem -Include *.txt -Recurse | Get-Content | Select-String -Pattern "(?:[a-zA-Z0-9_\-\.]+)@(?:[a-zA-Z0-9_\-\.]+)\.(?:[a-zA-Z]{2,5})").Matches | Select-Object -ExpandProperty Value -Unique

In order to access the Matches property, I added parentheses. Later I arrived at this version:

Get-ChildItem -Include *.txt -Recurse | Get-Content | Select-String -Pattern "(?:[a-zA-Z0-9_\-\.]+)@(?:[a-zA-Z0-9_\-\.]+)\.(?:[a-zA-Z]{2,5})" | Select-Object -ExpandProperty Matches -Unique | Select-Object -ExpandProperty Value

I want to ask what exactly the parentheses do in the first version.
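For reference, here is a minimal repro of both shapes on in-memory strings (the sample data and the simplified pattern are made up for illustration):

```powershell
# Made-up sample input; Select-String accepts strings from the pipeline
$sample = 'a@b.com text', 'c@d.org more', 'a@b.com again'

# Version 1: parentheses run the whole pipeline first, producing an
# array of MatchInfo objects; .Matches is then read across that array
($sample | Select-String -Pattern '\w+@\w+\.\w+').Matches |
    Select-Object -ExpandProperty Value -Unique

# Version 2: no parentheses; -ExpandProperty unwraps Matches
# from each MatchInfo object as it streams through the pipeline
$sample | Select-String -Pattern '\w+@\w+\.\w+' |
    Select-Object -ExpandProperty Matches |
    Select-Object -ExpandProperty Value -Unique
```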

Georgi Naumov
    if you are dealing with MANY objects, then the process of building/sending/un-building things to shove across the pipeline can be slow. however, the definition of "many" is likely in the thousands ... and by that time, you may need the pipeline for the RAM savings. so ... use the one that makes the most sense to you. ///// one thing that i would change is the use of `Get-Content` - the `Select-String` cmdlet will load a file directly, so you can save a step AND a pipeline stage. [*grin*] – Lee_Dailey Apr 20 '20 at 12:35
    Please don't underestimate the speed of the PowerShell pipeline. The point here is that you unfortunately can't really leverage it, as the `-Unique` switch requires you to stall the pipeline. (Meaning, **with** the `-Unique` parameter, the **first** command is generally faster; **without** the `-Unique` parameter, the **latter** command is generally faster.) – iRon Apr 20 '20 at 15:29
    If you mean "better" in terms of performance, I would recommend looking at your whole solution and not just a single statement, as **the performance of a complete (PowerShell) solution is supposed to be better than the sum of its parts**; see also: [Fastest Way to get a uniquely index item from the property of an array](https://stackoverflow.com/a/59437162/1701026) – iRon Apr 20 '20 at 15:40
    See also: [Select-Object -Unique is unnecessary slow and exhaustive #11221](https://github.com/PowerShell/PowerShell/issues/11221) – iRon Apr 21 '20 at 18:29
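Lee_Dailey's suggestion above, that `Select-String` can read files directly so `Get-Content` can be dropped, would look roughly like this (a sketch; the `-AllMatches` switch is an addition not in the original one-liner, included so multiple addresses per line are caught):

```powershell
# File paths from Get-ChildItem are piped straight into Select-String,
# saving both a step and a pipeline stage compared to using Get-Content
Get-ChildItem -Include *.txt -Recurse |
    Select-String -Pattern "(?:[a-zA-Z0-9_\-\.]+)@(?:[a-zA-Z0-9_\-\.]+)\.(?:[a-zA-Z]{2,5})" -AllMatches |
    ForEach-Object { $_.Matches.Value } |
    Sort-Object -Unique
```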

1 Answer


Say you have some $output from a command (gci in your case) and you are interested in the property $output.Matches.

  • If you run $output | select Matches (example 1), PowerShell effectively runs a Foreach-Object over every object in your array. The pipeline uses very little RAM, because the objects are processed serially, one after the other.

  • If you run $output.Matches (example 2), you use member enumeration on the array. This holds the whole array in RAM at once, but the property is collected in one pass over one big object instead of many little ones.
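A sketch of the two access styles described above (the search pattern here is a simplified placeholder):

```powershell
# $output is an array of MatchInfo objects from Select-String
$output = Get-ChildItem -Include *.txt -Recurse |
    Select-String -Pattern '\w+@\w+\.\w+'

# Example 1: pipeline - each object streams through Select-Object
# one at a time, keeping memory use low
$output | Select-Object -ExpandProperty Matches

# Example 2: member enumeration - the array is fully materialized,
# then .Matches is gathered across all elements in one operation
$output.Matches
```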

As for performance: as always, note that PowerShell is not the way to go if you need high performance. It was never designed to be a fast programming language.

When you're working with small result sets (like gci $env:userprofile\Desktop), the performance difference will be small. With large result sets, or many nested pipelines, it will be large.

I've just tested this with gci Z:\ -Recurse, where Z:\ is a network drive. Performance dropped by a factor of 20 in this specific case. (Use Measure-Command to test this yourself.)
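Measuring the two variants could look like this (a sketch; absolute timings depend heavily on the machine and the data, and the simplified pattern is a placeholder):

```powershell
# Time the grouped (member enumeration) variant
Measure-Command {
    (Get-ChildItem -Include *.txt -Recurse |
        Select-String -Pattern '\w+@\w+\.\w+').Matches
}

# Time the fully pipelined variant
Measure-Command {
    Get-ChildItem -Include *.txt -Recurse |
        Select-String -Pattern '\w+@\w+\.\w+' |
        Select-Object -ExpandProperty Matches
}
```

Comparing the TotalMilliseconds property of the two results shows the relative cost on your own data.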

IT M