
I have a folder with a bunch of files in the format xxxxxx_MMddyyHHmmss.json. I want to find the unique xxxxxx components because some are duplicates just created at different times.

I referenced this question, but as someone who rarely ventures into PowerShell, I can't get it working.

$result = Get-ChildItem  C:\Users\<me>\Desktop\FileData\* -recurse -name -include *.json|%{$_.split("_")[0]|sort-object -unique

But I can't get it to actually execute. It just goes to the next line when hitting return.

Ideally, I could pipe it to a file to review.

Killnine

1 Answer


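There's a syntax error in your command: the % (ForEach-Object) call's script block ({ ... }) is missing its closing }. That is why PowerShell just moves to the next line when you hit return: it considers the command incomplete and waits for more input. With the } added, your command should work in principle.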
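To see the effect of the fix without touching the file system, here is a minimal sketch that runs the same pipeline shape over a few made-up file names (the names in `$names` are hypothetical, not taken from the question):

```powershell
# Made-up names in the xxxxxx_MMddyyHHmmss.json format (not real files).
$names = 'abcdef_041919101500.json',
         'abcdef_041919113000.json',
         'xyz123_041919101500.json'

# Same pipeline shape as the original command, with the script block
# properly closed by }. ForEach-Object is the full name of the % alias.
$result = $names | ForEach-Object { $_.Split('_')[0] } | Sort-Object -Unique

$result   # the two distinct prefixes: abcdef, xyz123
```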

A streamlined, faster, and more readable version is this:

Get-ChildItem $HOME\Desktop\FileData -Recurse -Filter *.json | 
  ForEach-Object { ($_.Name -split '_')[0] } |  
    Sort-Object -Unique
  • Get-ChildItem $HOME\Desktop\FileData -Recurse -Filter *.json returns file-info objects for all *.json files in the directory subtree of $HOME\Desktop\FileData

  • ForEach-Object { ($_.Name -split '_')[0] } transforms each file-info input object into the first _-separated token of its name.

    • Note that I've switched from using the .NET [string] type's .Split() method ($_.Split("_")[0]) to using PowerShell's -split operator (($_ -split "_")[0]), because -split is generally preferable for its flexibility: it is regex-based, case-insensitive by default, and can limit the number of tokens returned.

    • That said, when performance matters, .Split() is noticeably faster.

  • Sort-Object -Unique then sorts the resulting tokens and returns only unique ones (omits duplicates).

  • Using -Filter *.json instead of -Include *.json speeds up the file retrieval, because -Filter is applied at the source (by the file-system provider), whereas -Include filters after the fact, in PowerShell.
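The difference between the -split operator and the .Split() method mentioned above can be illustrated with a short sketch. The file name in `$name` is made up for the example:

```powershell
$name = 'abcdef_041919101500.json'   # made-up file name

# With a literal underscore as the separator, both forms return the same prefix.
$viaOperator = ($name -split '_')[0]
$viaMethod   = $name.Split('_')[0]

# -split accepts an optional maximum number of tokens:
# split at the first _ only, and capture both halves.
$prefix, $rest = $name -split '_', 2

# -split treats its separator as a regex, so a metacharacter such as .
# must be escaped; .Split() always treats the separator literally.
$extViaOperator = ($name -split '\.')[-1]
$extViaMethod   = $name.Split('.')[-1]

$viaOperator, $viaMethod, $prefix, $rest, $extViaOperator, $extViaMethod
```

The regex behavior is where most of the extra flexibility comes from, and it is also the likely reason the method is faster for simple literal separators.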


Superior alternative solution:

TheMadTechnician suggests using Group-Object, which allows you to retain information about the individual input files that share a given prefix:

Get-ChildItem $HOME\Desktop\FileData -Recurse -Filter *.json |
  Group-Object { ($_.Name -split '_')[0] } |
    Sort-Object Name

Note: If you don't need the unique prefixes to be sorted, you can omit the Sort-Object call, in which case they will appear in the order in which they're encountered during file traversal.

This results in output objects that contain the unique prefixes in property .Name, as well as all the files that have that prefix in the Group property, which prints something like the following:

Count Name                      Group
----- ----                      -----
    2 abcdef                    {/tmp/abcdef_file1, /tmp/abcdef_file2}
...

To get just the unique prefixes, as in the first command, wrap the entire command in (...).Name.
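Here is a minimal sketch of that wrapping, which also writes the prefixes to a file for review, as the question asks. It builds a throwaway folder of empty sample files so it can be run anywhere; the folder, the file names, and the output file name unique-prefixes.txt are all assumptions made for the example, and in practice you would point Get-ChildItem at $HOME\Desktop\FileData instead.

```powershell
# Hypothetical sample data in a temporary folder (stand-in for the real folder).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ('FileData_' + [guid]::NewGuid().ToString('N'))
$null = New-Item -ItemType Directory -Path $dir
'abcdef_041919101500.json', 'abcdef_041919113000.json', 'xyz123_041919101500.json' |
  ForEach-Object { $null = New-Item -ItemType File -Path (Join-Path $dir $_) }

# Wrap the whole pipeline in (...) and read .Name to get just the prefixes.
$prefixes = (Get-ChildItem $dir -Recurse -Filter *.json |
  Group-Object { ($_.Name -split '_')[0] } |
    Sort-Object Name).Name

# Save the prefixes to a text file for review (made-up file name).
$outFile = Join-Path $dir 'unique-prefixes.txt'
$prefixes | Set-Content -Path $outFile

Get-Content $outFile
```

Set-Content writes one prefix per line; piping to Out-File would work as well.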

TheMadTechnician also suggests a slightly faster, though potentially a little more obscure, alternative for extracting the prefix: $_.Name -replace '_.*' uses the -replace operator to remove everything from the first _ in a name onward.
However, the .Split() method ($_.Name.Split('_')[0] when working with file-info objects) is still the fastest solution overall.
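If you want to check the relative speed of the three approaches on your own machine, a rough sketch with Measure-Command could look like the following. The input names are generated and purely hypothetical, the count of 100,000 is arbitrary, and absolute timings will vary by machine and PowerShell version, so treat the numbers as indicative only.

```powershell
# 100,000 made-up names in the xxxxxx_MMddyyHHmmss.json format.
$names = foreach ($i in 1..100000) { 'p{0:d6}_041919101500.json' -f $i }

# Time each prefix-extraction technique over the same input.
$timings = [ordered]@{
  '-split'   = (Measure-Command { foreach ($n in $names) { $null = ($n -split '_')[0] } }).TotalMilliseconds
  '-replace' = (Measure-Command { foreach ($n in $names) { $null = $n -replace '_.*' } }).TotalMilliseconds
  '.Split()' = (Measure-Command { foreach ($n in $names) { $null = $n.Split('_')[0] } }).TotalMilliseconds
}

$timings   # elapsed milliseconds per technique
```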

mklement0
  • @TheMadTechnician: I looked at performance a little more: while `-replace '_.*'` is faster than `($_.Name -split '_')[0]`, `$_.Split("_")[0]` seems to be the fastest by far. – mklement0 Apr 19 '19 at 15:40