
I am writing a PowerShell program to analyse the content of 1,900+ big XML configuration files (50,000+ lines, 1.5 MB each). Just for testing, I moved 36 of the files to my PC (Win 10; PS 5.1; 32 GB RAM) and wrote a quick script to test the speed of execution.

$TestDir = "E:\Powershell\Test"
$TestXMLs = Get-ChildItem $TestDir -Recurse -Include *.xml

foreach ($TestXML in $TestXMLs)
{
    [xml]$XML = Get-Content $TestXML
    (($XML.root.servers.server).Where{$_.name -eq "Server1"}).serverid
}

That completes in 36 to 40 seconds. I ran several tests with Measure-Command.

Then I tried a workflow with foreach -parallel, assuming that loading several files in parallel would give me a faster process.

Workflow Test-WF
{
    $TestDir = "E:\Powershell\Test"
    $TestXMLs = Get-ChildItem $TestDir -Recurse -Include *.xml

    foreach -parallel -throttlelimit 10 ($TestXML in $TestXMLs)
    {
        [xml]$XML = Get-Content $TestXML
        (($XML.root.servers.server).Where{$_.name -eq "Server1"}).serverid
    }
}

Test-WF #execute workflow

The script with the workflow needs between 118 and 132 seconds.

Now I am just wondering what could be the reason that the workflow works so much slower? Maybe the recompilation to XAML, or a slower algorithm for loading XML files in WWF?

  • BTW you can make the actual loading of XML several times faster: `$xml = [xml]''; $xml.Load($TestXML)` – wOxxOm Jan 22 '17 at 23:07
  • Wow ... just tested it: between 818 and 889 ms against 33 s. I think this just replaces the need for the workflow. Thanks a lot. – autosvet Jan 22 '17 at 23:29

1 Answer

foreach -parallel is by far the slowest parallelization option you have with PowerShell, since Workflows are not designed for speed, but for long-running operations that can be safely interrupted and resumed.

The implementation of these safety mechanisms introduces some overhead, which is why your script is slower when run as a workflow.
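To see where that overhead comes from, here is a minimal sketch (not from the original post) of the persistence feature workflows are designed around: each Checkpoint-Workflow call serializes the current workflow state to disk, so a suspended or interrupted job can resume from that point instead of starting over.

Workflow Test-Resumable
{
    foreach ($i in 1..3)
    {
        "Completed step $i"
        # Serialize the workflow state to disk; a suspended or
        # interrupted job can later resume from this checkpoint
        Checkpoint-Workflow
    }
}

# Run as a job so it can be suspended and resumed
Test-Resumable -AsJob

A plain foreach loop never pays for any of this machinery, which is the overhead referred to above.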

If you want to optimize for execution speed, use runspaces instead:

$TestDir = "E:\Powershell\Test"
$TestXMLs = Get-ChildItem $TestDir -Recurse -Include *.xml

# Set up runspace pool
$RunspacePool = [runspacefactory]::CreateRunspacePool(1,10)
$RunspacePool.Open()

# Assign new jobs/runspaces to a variable
$Runspaces = foreach ($TestXML in $TestXMLs)
{
    # Create new PowerShell instance to hold the code to execute, add arguments
    $PSInstance = [powershell]::Create().AddScript({
        param($XMLPath)

        [xml]$XML = Get-Content $XMLPath
        (($XML.root.servers.server).Where{$_.name -eq "Server1"}).serverid
    }).AddParameter('XMLPath', $TestXML.FullName)

    # Assign PowerShell instance to RunspacePool
    $PSInstance.RunspacePool = $RunspacePool

    # Start executing asynchronously, keep instance + IAsyncResult objects
    New-Object psobject -Property @{
        Instance = $PSInstance
        IAResult = $PSInstance.BeginInvoke()
        Argument = $TestXML
    }
}

# Wait for the runspace jobs to complete
while($Runspaces |Where-Object{-not $_.IAResult.IsCompleted})
{
    Start-Sleep -Milliseconds 500
}

# Collect the results
$Results = $Runspaces |ForEach-Object {
    $Output = $_.Instance.EndInvoke($_.IAResult)
    New-Object psobject -Property @{
        File = $_.Argument # the file stored with each runspace; $TestXML would only hold the last loop value here
        ServerID = $Output
    }
}
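Not part of the timing-critical path, but worth doing once the results are collected (a small housekeeping sketch): dispose of the PowerShell instances and close the pool so the runspaces are released.

# Clean up: release the PowerShell instances and the runspace pool
$Runspaces | ForEach-Object { $_.Instance.Dispose() }
$RunspacePool.Close()
$RunspacePool.Dispose()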

Fast XML processing bonus tips:

As wOxxOm suggests, using Xml.Load() is way faster than using Get-Content to read in the XML document.
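If you want to reproduce that comparison yourself, a quick sketch with Measure-Command (the file path is a placeholder; note that XmlDocument.Load() resolves relative paths against the process working directory, so an absolute path is safest):

$file = 'E:\Powershell\Test\sample.xml'  # placeholder; point this at one of your files

# Approach 1: read the file with Get-Content, then cast to [xml]
Measure-Command { [xml]$doc1 = Get-Content $file }

# Approach 2: let System.Xml.XmlDocument read the file directly
Measure-Command {
    $doc2 = New-Object xml
    $doc2.Load($file)
}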

Furthermore, using dot notation ($xml.root.servers.server) and the Where({}) extension method is also going to be painfully slow if there are many servers or server nodes. Use the SelectNodes() method with an XPath expression to search for "Server1" instead (be aware that XPath is case-sensitive):

$PSInstance = [powershell]::Create().AddScript({
    param($XMLPath)

    $XML = New-Object Xml
    $XML.Load($XMLPath)
    $Server1Node = $XML.SelectNodes('/root/servers/server[@name = "Server1"]')
    return $Server1Node.serverid
}).AddParameter('XMLPath', $TestXML.FullName)
  • Actually I need to get 15 or more element values from every XML, plus the file path etc., and export them into CSV for analysis and records. But I will test your solution as well. Thank you very much. – autosvet Jan 22 '17 at 23:35
  • @autosvet You can still do that using the above approach, just edit the scriptblock inside `AddScript()` accordingly and export the `$Results` variable to CSV when done (see the sketch after these comments) – Mathias R. Jessen Jan 22 '17 at 23:36
  • Yep. I will try it, because I am curious to see the difference in times. Thanks again. – autosvet Jan 22 '17 at 23:39
  • @MathiasR.Jessen: I tried to generalize your code, see https://stackoverflow.com/questions/52975186/how-to-pass-psitem-in-a-scriptblock . – fjf2002 Oct 26 '18 at 11:11
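For reference, a sketch of the edit suggested in the comments above (the port and site element names are hypothetical stand-ins for the 15+ values; adjust the XPath and property names to the actual schema):

# Sketch: emit one object per file with several values, then export to CSV.
# 'port' and 'site' are hypothetical element names; replace with your own.
$PSInstance = [powershell]::Create().AddScript({
    param($XMLPath)

    $XML = New-Object Xml
    $XML.Load($XMLPath)
    $Server = $XML.SelectSingleNode('/root/servers/server[@name = "Server1"]')

    [pscustomobject]@{
        File     = $XMLPath
        ServerID = $Server.serverid
        Port     = $Server.port
        Site     = $Server.site
    }
}).AddParameter('XMLPath', $TestXML.FullName)

# ...start the runspaces and collect with EndInvoke() as above, then:
$Results | Export-Csv 'E:\Powershell\Test\results.csv' -NoTypeInformation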