Is something like the below example code possible?

    Get-ChildItem $copyFilePath -Filter $_.Basename.Length -ne 22 | forEach-Object{
        
        Copy-Item -path "$($copyFilePath)\$($_.Fullname)"
    }

I am trying to find a way to exclude files with a base name of that length without having to loop through the entire file list. I expect this would reduce my code's run time by 80-85%.

  • No, you can't. `-Filter` only supports what the [underlying API supports](https://learn.microsoft.com/en-us/dotnet/api/system.io.directoryinfo.enumeratefilesysteminfos?view=net-7.0#system-io-directoryinfo-enumeratefilesysteminfos(system-string-system-io-enumerationoptions)). – Santiago Squarzon Jun 14 '23 at 15:53
  • @SantiagoSquarzon will `Where-Object` work at a similar pace, or will it still iterate through the whole list of files? – Ralph Warner Jun 14 '23 at 15:56
  • You still need to specify a `-Destination`: `Get-ChildItem -File -Path $copyFilePath | Where-Object {$_.Length -ne 22} | ForEach-Object {Copy-Item -Path $_.Fullname -Destination ???}` – lit Jun 14 '23 at 15:57
  • What Mathias posted is "the PowerShell way to do it"; if you need something faster, then you need to rely on direct .NET API calls. – Santiago Squarzon Jun 14 '23 at 15:57
  • Actually, a `-Destination` is not required. The default is the current working directory. That seems somewhat risky to me. Better to explicitly specify `-Destination '.'` to be clear. – lit Jun 14 '23 at 16:07
  • @lit the destination is specified; this was more of a theoretical question to see whether the filter could work that way. I didn't put much effort into that line when describing the problem since it was unrelated. – Ralph Warner Jun 14 '23 at 16:12
  • @SantiagoSquarzon is there an already present cmdlet to do what I am describing? or would I have to write my own in C#(Editing the -Filter API)? – Ralph Warner Jun 14 '23 at 16:14
  • Just using `.EnumerateFiles` would already make a difference when compared with `Get-ChildItem`. If you need to traverse all directories within the path, then it becomes trickier if you're using PowerShell 5.1. If you're using PowerShell 7+, then it's very easy because you have `EnumerationOptions` available. – Santiago Squarzon Jun 14 '23 at 16:18
  • @SantiagoSquarzon yeah with how I have to traverse it will take more time, I want to upgrade to 7 but, because of the industry we can't guarantee our clients will have PS 7 running on their systems and unfortunately can't upgrade(as far as I am aware). – Ralph Warner Jun 14 '23 at 16:19
  • If you have 5.1 and need to traverse all child directories, then you can use a `Queue` and `.EnumerateFileSystemInfos` – Santiago Squarzon Jun 14 '23 at 16:22
  • @SantiagoSquarzon I just realized to filter for string length couldn't I just check that the basename follows the format of '[ a-z, A-Z, 0-9, _ ]' for 22 chars? So: -Filter '[ a-z, A-Z, 0-9, _ ][ a-z, A-Z, 0-9, _ ][ a-z, A-Z, 0-9, _ ]...x22'? – Ralph Warner Jun 14 '23 at 16:28
  • no, `-Filter` doesn't support character ranges, only `*` and `?`. that might work using `-Include` – Santiago Squarzon Jun 14 '23 at 16:35
  • The problem with that is that `-Include` will still check every file instead of targeting files at the source like `-Filter` does. Seems like I'll have to break out C# if I want to handle it. Thank you though @SantiagoSquarzon – Ralph Warner Jun 14 '23 at 16:39
  • Sure, np. You don't need C#, but you can use it if you want. PowerShell can use the .NET APIs directly, and that has proved to be faster. C# will be even faster for sure, though. – Santiago Squarzon Jun 14 '23 at 16:40
  • @RalphWarner, are you wanting to filter on the length of the filename or the size of the file? If filename length, then with or without extension? – lit Jun 14 '23 at 17:36

3 Answers

No, the `-Filter` parameter is only for file system names; you'll need the `Where-Object` cmdlet to filter on anything else:

    Get-ChildItem $copyFilePath -File | Where-Object { $_.BaseName.Length -ne 22 } | ForEach-Object {
        # ...
    }

Beware that the expression `"$($copyFilePath)\$($_.Fullname)"` inside the `ForEach-Object` body will result in an invalid file path, as the `FullName` property already contains a rooted path.
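
As a minimal sketch of the complete pipeline: the folder names below are hypothetical stand-ins for `$copyFilePath` and a `$destination` of your own, and the two sample files exist only to make the snippet self-contained. `FullName` is passed as is, since it is already a rooted path.

```powershell
# Hypothetical folders under the temp directory; replace with your own paths.
$copyFilePath = Join-Path ([System.IO.Path]::GetTempPath()) 'filter-demo-source'
$destination  = Join-Path ([System.IO.Path]::GetTempPath()) 'filter-demo-target'
$null = New-Item -ItemType Directory -Force -Path $copyFilePath, $destination

# Sample files: one base name of exactly 22 characters, one shorter.
$null = New-Item -ItemType File -Force -Path (Join-Path $copyFilePath (('a' * 22) + '.txt'))
$null = New-Item -ItemType File -Force -Path (Join-Path $copyFilePath 'short.txt')

Get-ChildItem -LiteralPath $copyFilePath -File |
    Where-Object { $_.BaseName.Length -ne 22 } |
    ForEach-Object {
        # FullName is already absolute, so it must not be prefixed with $copyFilePath.
        Copy-Item -LiteralPath $_.FullName -Destination $destination
    }
```

Only `short.txt` should end up in the target folder; every file is still enumerated, the filtering simply happens after the fact.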

Mathias R. Jessen

Mathias's answer is the correct PowerShell way to do what you're looking for, but it is definitely not the fastest. If you want something faster, you can call the .NET APIs directly instead of relying on PowerShell cmdlets.

    $queue = [System.Collections.Generic.Queue[System.IO.DirectoryInfo]]::new()
    $copyFilePath = Get-Item 'absolute\path\to\initialDir'
    $queue.Enqueue($copyFilePath)

    while($queue.Count) {
        $dir = $queue.Dequeue()
        try {
            $enum = $dir.EnumerateFileSystemInfos()
        }
        catch {
            # Use `$_` here for error handling if needed
            # if we can't enumerate this directory (permissions, etc.), go to the next one
            continue
        }

        foreach($item in $enum) {
            if($item -is [System.IO.DirectoryInfo]) {
                # queue subdirectories so they are traversed later
                $queue.Enqueue($item)
                continue
            }

            # `$item` is a `FileInfo` here, check the length of its base name
            if($item.BaseName.Length -eq 22) {
                # skip this file if condition is met
                continue
            }

            # here you can use `.CopyTo` instead of `Copy-Item`:
            # public FileInfo CopyTo(string destFileName);
            # public FileInfo CopyTo(string destFileName, bool overwrite);

            try {
                # `$destination` must be the absolute path of the destination *file*
                # (`CopyTo` does not accept a folder), so it has to be determined per item
                # if the folder structure needs to be preserved you also need to handle the folder creation here
                $item.CopyTo($destination)
            }
            catch {
                # error handling here
            }
        }
    }
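
For PowerShell 7+, where `EnumerationOptions` is available (as mentioned in the comments), the recursion can be left to .NET. The following is only a sketch under assumptions: `$root` and the sample files are hypothetical, and the snippet collects the matching files rather than copying them.

```powershell
# Requires PowerShell 7+; System.IO.EnumerationOptions is not available in Windows PowerShell 5.1.
# Hypothetical sample tree under the temp directory; replace $root with your own absolute path.
$root = Join-Path ([System.IO.Path]::GetTempPath()) 'enum-demo'
$sub  = Join-Path $root 'sub'
$null = New-Item -ItemType Directory -Force -Path $sub
$null = New-Item -ItemType File -Force -Path (Join-Path $sub (('b' * 22) + '.log'))
$null = New-Item -ItemType File -Force -Path (Join-Path $sub 'keep.txt')

$options = [System.IO.EnumerationOptions]@{
    RecurseSubdirectories = $true   # traverse all subfolders
    IgnoreInaccessible    = $true   # skip entries that cannot be read (the default)
}
# Note: by default, EnumerationOptions also skips hidden and system entries (AttributesToSkip).

$files = foreach ($file in [System.IO.DirectoryInfo]::new($root).EnumerateFiles('*', $options)) {
    # Keep only files whose base name is not 22 characters long.
    if ($file.BaseName.Length -ne 22) { $file }
}
```

Each element of `$files` is a `FileInfo`, so `.CopyTo()` can be called on it exactly as in the queue-based version above.
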
Santiago Squarzon

To complement Mathias' helpful answer and Santiago's helpful answer with some background information and performance considerations:

  • It is indeed advisable to use a -Filter argument if feasible, as it filters at the source and returns only the objects of interest, which is much faster than returning all objects and performing filtering after the fact, in PowerShell.

    • Each PowerShell provider determines what kind of filters, if any, `-Filter` supports; any such filter is invariably a string (though it is up to the provider how to interpret the string).

    • In the case of the FileSystem provider, -Filter supports only a single, wildcard-based name pattern (e.g., '*.txt'), which, via .NET APIs, is ultimately passed through to platform-native APIs.

    • Notably, the wildcard "language" supported by these APIs is (a) less powerful than PowerShell's own wildcards available via the -Include parameter, for instance (they lack [...] to express character ranges and sets) and (b), on Windows, riddled with legacy quirks - see this answer for the gory details.

  • The .NET APIs that PowerShell uses under the covers also do not support open-ended filters such as by file size; that is, performing your desired filtering at the source is fundamentally unsupported.

    • Still, direct .NET API calls do offer a performance benefit:

      • PowerShell's cmdlets and pipeline generally incur overhead compared to direct .NET API calls, with a notable slowdown in the case of provider cmdlets coming from each output object getting decorated with instance-level ETS properties such as .PSPath, containing provider metadata.

      • Potentially speeding this up (and reducing memory load) in the future, by defining these properties at the type level via CodeProperty members rather than per-instance NoteProperty members, is the subject of GitHub issue #7501.

    • Alternatively, there are things you can do on the PowerShell side to improve performance as well, as discussed next.
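
To make the difference between the three filtering approaches concrete, here is a small sketch. The folder and file names are hypothetical and exist only so that the snippet is self-contained.

```powershell
# Hypothetical sample folder under the temp directory.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'wildcard-demo'
$null = New-Item -ItemType Directory -Force -Path $dir
foreach ($name in 'alpha.txt', 'beta.txt', 'zeta.txt', 'alpha.log') {
    $null = New-Item -ItemType File -Force -Path (Join-Path $dir $name)
}

# -Filter: a single simple name pattern (* and ?), applied at the source.
$byFilter = Get-ChildItem -LiteralPath $dir -File -Filter '*.txt'

# -Include: PowerShell wildcards, including [...] ranges, applied by PowerShell after enumeration.
# Without -Recurse, -Include only takes effect if the path targets the folder's contents (\*).
$byInclude = Get-ChildItem -Path (Join-Path $dir '*') -Include '[a-b]*.txt'

# Anything else, such as a base-name length, requires after-the-fact filtering.
$byLength = Get-ChildItem -LiteralPath $dir -File | Where-Object { $_.BaseName.Length -ne 5 }
```

Only the first call narrows the results before PowerShell sees them; the other two still enumerate every file in the folder.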


Improving the performance of your PowerShell code:

  • Avoiding the pipeline and per-input-object cmdlet calls is key.

  • E.g., replacing a `Where-Object` call in a pipeline with the intrinsic `.Where()` method speeds up processing, albeit at the expense of memory consumption.

  • Instead of calling Copy-Item once for each input object, pipe directly to it; if you need to determine the destination location on a per-input-object basis, you can use a delay-bind script block:

    (Get-ChildItem $copyFilePath).Where({ $_.BaseName.Length -ne 22 }) |
      Copy-Item -Destination $destination
    
  • If no per-input-object variation in the destination path is needed, you can further speed up processing by taking advantage of the fact that many file-processing cmdlets accept an array of input paths:

    Copy-Item `
      -LiteralPath (Get-ChildItem $copyFilePath).Where({ $_.BaseName.Length -ne 22 }).FullName `
      -Destination $destination
    
    • Note the use of member-access enumeration to directly obtain the .FullName property values from the individual elements of the collection returned by .Where().

    • In PowerShell (Core) 7+, you wouldn't even need ((...).FullName) anymore, because [System.IO.FileInfo] and [System.IO.DirectoryInfo] instances are now consistently represented by their .FullName property when stringified (see this answer for background information).
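
The delay-bind script block mentioned above could look like the following sketch. The folder layout is hypothetical (`$source`, `$destRoot` and the per-extension subfolders are assumptions made for illustration); the point is only that the script block passed to `-Destination` is evaluated once per input object, with `$_` referring to the current file.

```powershell
# Hypothetical folders under the temp directory: files are sorted into subfolders by extension.
$source   = Join-Path ([System.IO.Path]::GetTempPath()) 'delaybind-source'
$destRoot = Join-Path ([System.IO.Path]::GetTempPath()) 'delaybind-target'
$null = New-Item -ItemType Directory -Force -Path $source, (Join-Path $destRoot 'txt'), (Join-Path $destRoot 'log')
$null = New-Item -ItemType File -Force -Path (Join-Path $source 'report.txt'), (Join-Path $source 'trace.log')

(Get-ChildItem -LiteralPath $source -File).Where({ $_.BaseName.Length -ne 22 }) |
    Copy-Item -Destination {
        # Evaluated per input object: pick the target folder from the file's extension.
        Join-Path $destRoot $_.Extension.TrimStart('.')
    }
```

`Copy-Item` is still invoked only once for the whole pipeline, so this keeps the benefit of avoiding a per-file cmdlet call while allowing the destination to vary.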

mklement0