First time asker here. Please be kind :)

I'm trying to list all directories recursively and in parallel, hoping to cut the time it takes to traverse a drive. Below is the code I've tried. Essentially, I want to pass in a folder and process its subfolders, their subfolders, and so on in parallel, but the function is not recognized inside the parallel block.

function New-RecursiveDirectoryList {
    [CmdletBinding()]
    param (
        # Specifies a path to one or more locations.
        [Parameter(Mandatory = $true,
            Position = 0,
            ValueFromPipeline = $true,
            ValueFromPipelineByPropertyName = $true,
            HelpMessage = 'Path to one or more locations.')]
        [Alias('PSPath')]
        [ValidateNotNullOrEmpty()]
        [string[]]
        $Path
    )
    process {
        foreach ($aPath in $Path) {
            Get-Item $aPath

            Get-ChildItem -Path $aPath -Directory |
                # Recursively call itself in Parallel block not working
                # Getting error "The term 'New-RecursiveDirectoryList' is not recognized as a name of a cmdlet"
                # Without -Parallel switch this works as expected
                ForEach-Object -Parallel {
                    $_ | New-RecursiveDirectoryList
                }
        }
    }
}

Error:

New-RecursiveDirectoryList: 
Line |
   2 |                      $_ | New-RecursiveDirectoryList
     |                           ~~~~~~~~~~~~~~~~~~~~~~~~~~
     | The term 'New-RecursiveDirectoryList' is not recognized as a name of a cmdlet, function, script file, or executable program.
Check the spelling of the name, or if a path was included, verify that the path is correct and try again.

I've also tried the solution provided by mklement0 here, but no luck. Below is my attempt:

function CustomFunction {
    [CmdletBinding()]
    param (
        # Specifies a path to one or more locations.
        [Parameter(Mandatory = $true,
            Position = 0,
            ValueFromPipeline = $true,
            ValueFromPipelineByPropertyName = $true,
            HelpMessage = 'Path to one or more locations.')]
        [Alias('PSPath')]
        [ValidateNotNullOrEmpty()]
        [string[]]
        $Path
    )

    begin {
        # Get the function's definition *as a string*
        $funcDef = $function:CustomFunction.ToString()
    }

    process {
        foreach ($aPath in $Path) {
            Get-Item $aPath

            Get-ChildItem -Path $aPath -Directory |
                # Recursively call itself in Parallel block not working
                # Getting error "The term 'New-RecursiveDirectoryList' is not recognized as a name of a cmdlet"
                # Without -Parallel switch this works as expected
                ForEach-Object -Parallel {
                    $function:CustomFunction = $using:funcDef
                    $_ | CustomFuction
                }
        }
    }
}

Error:

CustomFuction: 
Line |
   3 |                      $_ | CustomFuction
     |                           ~~~~~~~~~~~~~
     | The term 'CustomFuction' is not recognized as a name of a cmdlet, function, script file, or executable program.
Check the spelling of the name, or if a path was included, verify that the path is correct and try again.

Does anybody know how this may be accomplished or a different way of doing this?

mklement0
Daniel
  • Hey Daniel, hope all is well. I still remember this question and had planned to develop something similar, but without multi-threading or classic recursion. In case you're interested, this is a [tree-like cmdlet for PowerShell](https://github.com/santysq/PSTree). – Santiago Squarzon Dec 23 '21 at 23:53
  • Just to spell out the root cause of your problem: When you use `ForEach-Object -Parallel` (PS v7+), each parallel thread is its own PowerShell runspace, which doesn't share any state with the caller's runspace. Therefore, your `CustomFunction` isn't available inside the `ForEach-Object -Parallel` script block. [GitHub issue #12240](https://github.com/PowerShell/PowerShell/issues/12240) proposes a future option to copy the caller's state to the parallel runspaces. – mklement0 Jul 18 '23 at 13:11
  • The only reason your second attempt didn't work was a _typo_: `CustomFuction` -> `CustomFunction` (missing `n`). Santiago's answer in essence employs the same technique, except by using anonymous script blocks instead of named functions. – mklement0 Jul 18 '23 at 13:16
  • @mklement0 I can't believe it was due to a typo; I didn't even see it at the time! Anyway, I have updated my answer with something much better. – Santiago Squarzon Jul 18 '23 at 14:11
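
To illustrate the two comments above, here is a minimal sketch of the second attempt with the typo fixed. It assumes PowerShell 7+ and uses a hypothetical function name, `Get-DirectoryTree`, which is not from the question. Because each parallel runspace starts without the caller's functions, the definition is captured as a string and recreated inside the script block through `$using:`.

```powershell
#requires -Version 7.0

# Hypothetical name, chosen for illustration only.
function Get-DirectoryTree {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory, Position = 0, ValueFromPipeline)]
        [string[]] $Path
    )

    begin {
        # Capture this function's body (including its param block) as a string.
        $funcDef = ${function:Get-DirectoryTree}.ToString()
    }

    process {
        foreach ($aPath in $Path) {
            Get-ChildItem -LiteralPath $aPath -Directory | ForEach-Object -Parallel {
                # Recreate the function in this runspace; it is not inherited from the caller.
                ${function:Get-DirectoryTree} = $using:funcDef
                $_                                  # emit the directory itself
                $_.FullName | Get-DirectoryTree     # then recurse into it
            }
        }
    }
}
```

Calling `Get-DirectoryTree -Path C:\SomeFolder` (a placeholder path) should then emit every subdirectory below the starting folder, in no guaranteed order. Each nesting level starts its own set of runspaces, so this is unlikely to be faster than a plain `Get-ChildItem -Recurse -Directory`.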

3 Answers

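So, this worked for me, though it obviously doesn't look pretty. One thing to note: the `foreach ($aPath in $Path) {...}` loop in your script is unnecessary for pipeline input, because the `process {...}` block runs once for each path you pipe in. (The loop only matters when an array is passed directly as an argument, e.g. `-Path $a, $b`.)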

Code:

function Test {
    [CmdletBinding()]
    param (
        # Specifies a path to one or more locations.
        [Parameter(
            Mandatory,
            ParameterSetName = 'LiteralPath',
            ValueFromPipelineByPropertyName,
            Position = 0)]
        [Alias('PSPath')]
        [string[]] $LiteralPath
    )

    begin {
        $scriptblock = $MyInvocation.MyCommand.ScriptBlock.ToString()
    }

    process {
        # Get-Item $Path <= This will slow down the script
        $LiteralPath | Get-ChildItem -Directory | ForEach-Object -Parallel {
            $sb = $using:scriptblock
            $def = [scriptblock]::Create($sb)
            $_ # You can do this instead
            $_ | & $def
        }
    }
}

Looking back at this answer, what I would recommend today is to avoid recursion and use a `ConcurrentStack<T>` instead, which should be far more efficient and consume less memory. Also worth noting, as mklement0 pointed out in his comment: your second attempt was correct to begin with, and the issue was a typo: `$_ | CustomFuction` -> `$_ | CustomFunction`.

function Test {
    [CmdletBinding()]
    param (
        [Parameter(
            Mandatory,
            ParameterSetName = 'LiteralPath',
            ValueFromPipelineByPropertyName,
            Position = 0)]
        [Alias('PSPath')]
        [string[]] $LiteralPath,

        [Parameter()]
        [ValidateRange(1, 64)]
        [int] $ThrottleLimit = 5
    )

    begin {
        $stack = [System.Collections.Concurrent.ConcurrentStack[System.IO.DirectoryInfo]]::new()
        $dir = $null
    }

    process {
        $stack.PushRange($LiteralPath)
        while ($stack.TryPop([ref] $dir)) {
            $dir | Get-ChildItem -Directory | ForEach-Object -Parallel {
                $stack = $using:stack
                $stack.Push($_)
                $_
            } -ThrottleLimit $ThrottleLimit
        }
    }
}
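
For comparison, the same stack-based idea can be sketched without any parallelism. This is not part of the answer above: the function name `Get-DirectoryStack` is hypothetical, it swaps in a plain `Stack<T>` and `DirectoryInfo.EnumerateDirectories()` in place of `ConcurrentStack<T>` and `Get-ChildItem`, and it assumes absolute input paths. It can serve as a baseline when measuring whether `-Parallel` actually helps on a given drive.

```powershell
# Hypothetical name; a single-threaded sketch, not the answer's implementation.
function Get-DirectoryStack {
    [CmdletBinding()]
    param (
        # Assumed to be absolute paths: .NET resolves relative paths against the
        # process working directory, not the current PowerShell location.
        [Parameter(Mandatory, Position = 0)]
        [string[]] $LiteralPath
    )

    $stack = [System.Collections.Generic.Stack[System.IO.DirectoryInfo]]::new()
    foreach ($p in $LiteralPath) {
        $stack.Push([System.IO.DirectoryInfo]::new($p))
    }

    # Pop a directory, emit its children, and push them for later processing.
    while ($stack.Count -gt 0) {
        $dir = $stack.Pop()
        try {
            foreach ($child in $dir.EnumerateDirectories()) {
                $child
                $stack.Push($child)
            }
        }
        catch {
            # Typically an access-denied folder; warn and keep going.
            Write-Warning "Skipping '$($dir.FullName)': $($_.Exception.Message)"
        }
    }
}
```

Because nothing is shared between threads here, an ordinary stack is enough; the concurrent collection is only needed once several runspaces push to it at the same time.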
Santiago Squarzon
  • Ah, this works perfectly! ..and also terribly :) To no fault of yours though. Spawning off all these processes for every folder just isn't very efficient I guess. I do really like the 2nd version with the level of nesting. Very nice touch. I was also playing with scriptblock myself but couldn't figure it out. `$scriptblock = $MyInvocation.MyCommand.ScriptBlock.ToString()` was the missing piece for me. Thank you for sharing this and solving my problem. It's good to know how to bring the calling function into a parallel block for future reference. – Daniel May 05 '21 at 05:07
  • @Daniel Check out my last edit, your question was an awesome exercise so thank you :) By the way, I was gonna add, if you're gonna run this recursively against a very big share / drive you probably will want to add a `-ThrottleLimit` to your `foreach-object` – Santiago Squarzon May 05 '21 at 05:11
  • @Daniel Hierarchy doesn't look very good on Linux tho, but if you want to see how the real function for AD Groups works checkout my [GitHub](https://github.com/santysq/Get-Hierarchy) – Santiago Squarzon May 05 '21 at 05:14

I did something similar a while ago, using a non-recursive function with .NET runspaces. For this, you need to install the PoshRSJob module and create a list of the subfolders to scan in dir.txt. Then run this:

Install-Module PoshRsJob -Scope CurrentUser
function ParallelDir {
    param (
        $Folders,
        $Throttle = 8
    )
    $batch = 'ParallelDir'
    $jobs = Get-RSJob -Batch $batch
    if ($jobs | Where-Object State -eq 'Running') {
        Write-Warning ("Some jobs are still running. Stop them before running this job.
        > Stop-RSJob -Batch $batch")
        return
    }

    $Folders | Start-RSJob -Throttle $Throttle -Batch $batch -ScriptBlock {
        Param ($fullname)
        $name = Split-Path -Path $fullname -Leaf
        Get-ChildItem $fullname -Recurse | Select-Object * | Export-Clixml ('c:\temp\{0}.xml' -f $name)
    } | Wait-RSJob -ShowProgress | Receive-RSJob

    if (!(Get-RSJob -Batch $batch | Where-Object {$_.HasErrors -and $_.Completed})) {
        Remove-RSJob -Batch $batch
    } else {
        Write-Warning ("The listing jobs finished with errors. You can check:
        > Get-RSJob -Batch $batch
        To consolidate the results from each run:
        > Get-ChildItem 'c:\temp\*.xml' | Import-Clixml")
    }
}
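# Read the list of folders, scan them in parallel, then merge the exported results
$dir = Get-Content .\dir.txt
ParallelDir -Folders $dir
Get-ChildItem c:\temp\*.xml | Import-Clixml | Select-Object Name, Length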
PollusB

From what I can see, you declared a mandatory parameter, which means you need to supply it when you run your function. For example, in your case, you can load the function into memory manually: open a PowerShell session and copy/paste the code you posted here. Once the code is loaded, you can call the function:

CustomFunction -Path "TheTargetPathYouWant"
  • Thanks for the answer raDiaSm0 although I don't think that's the problem. The mandatory path argument is being passed in from the pipeline. The problem is that the function itself cannot be called from within the parallel scriptblock as displayed in the error. I'm wondering if there is a way to pass the function definition into the parallel block so that it can be used. – Daniel May 05 '21 at 02:41