11

I created a PowerShell script which loops over a large number of XML Schema (.xsd) files, and for each creates a .NET XmlSchemaSet object, calls Add() and Compile() to add a schema to it, and prints out all validation errors.

This script works correctly, but there is a memory leak somewhere, causing it to consume gigabytes of memory when run on hundreds of files.

What I essentially do in a loop is the following:

$schemaSet = new-object -typename System.Xml.Schema.XmlSchemaSet
register-objectevent $schemaSet ValidationEventHandler -Action {
    # ...write-host the event details...
}
$reader = [System.Xml.XmlReader]::Create($schemaFileName)
[void] $schemaSet.Add($null_for_dotnet_string, $reader)
$reader.Close()
$schemaSet.Compile()

(A full script to reproduce this problem can be found in this gist: https://gist.github.com/3002649. Just run it, and watch the memory usage increase in Task Manager or Process Explorer.)

Inspired by some blog posts, I tried adding

remove-variable reader, schemaSet

I also tried picking up the $schema from Add() and doing

[void] $schemaSet.RemoveRecursive($schema)

These seem to have some effect, but still there is a leak. I'm presuming that older instances of XmlSchemaSet are still using memory without being garbage collected.

The question: How do I properly teach the garbage collector that it can reclaim all memory used in the code above? Or more generally: how can I achieve my goal with a bounded amount of memory?

2 Answers

9

Microsoft has confirmed that this is a bug in PowerShell 2.0, and they state that this has been resolved in PowerShell 3.0.

The problem is that an event handler registered using Register-ObjectEvent is not garbage collected. In response to a support call, Microsoft said:

"we’re dealing with a bug in PowerShell v.2. The issue is caused actually by the fact that the .NET object instances are no longer released due to the event handlers not being released themselves. The issue is no longer reproducible with PowerShell v.3".

The best solution, as far as I can see, is to interface between PowerShell and .NET at a different level: do the validation completely in C# code (embedded in the PowerShell script), and just pass back a list of ValidationEventArgs objects. See the fixed reproduction script at https://gist.github.com/3697081: that script is functionally correct and leaks no memory.
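
As a rough illustration of this approach (a sketch, not the contents of the gist), the validation could be embedded along the following lines. The type name SchemaChecker and the method name Check are made up for this example, and the C# deliberately uses an anonymous delegate instead of a lambda so that it does not depend on a newer compiler version. It assumes Windows PowerShell, where System.Xml can be referenced by name.

```powershell
# Hypothetical helper type: SchemaChecker and Check are illustrative names only.
Add-Type -ReferencedAssemblies System.Xml -TypeDefinition @"
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;

public static class SchemaChecker
{
    // Compiles one schema file and returns every validation event raised.
    public static List<ValidationEventArgs> Check(string path)
    {
        List<ValidationEventArgs> events = new List<ValidationEventArgs>();
        XmlSchemaSet schemaSet = new XmlSchemaSet();

        // A plain .NET delegate: no PowerShell event subscription is created.
        schemaSet.ValidationEventHandler +=
            delegate(object sender, ValidationEventArgs e) { events.Add(e); };

        using (XmlReader reader = XmlReader.Create(path))
        {
            schemaSet.Add(null, reader);
        }
        schemaSet.Compile();
        return events;
    }
}
"@

# Validate every .xsd file in the current directory and print the events.
foreach ($file in Get-ChildItem -Filter *.xsd) {
    foreach ($e in [SchemaChecker]::Check($file.FullName)) {
        Write-Host ("{0}: {1}: {2}" -f $file.Name, $e.Severity, $e.Message)
    }
}
```

Because the handler collects events synchronously inside the C# method, all events for a file are in the returned list by the time Check returns, and the XmlSchemaSet goes out of scope with no PowerShell-side subscription holding on to it.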

(Thanks to Microsoft Support for helping me find this solution.)


Initially Microsoft offered another workaround, which is to use $xyzzy = Register-ObjectEvent -SourceIdentifier XYZZY, and then at the end do the following:

Unregister-Event XYZZY
Remove-Job $xyzzy -Force

However, this workaround is functionally incorrect. Any events that are still 'in flight' are lost at the time these two additional statements are executed. In my case, that means that I miss validation errors, so the output of my script is incomplete.

4

After the remove-variable, you can try forcing a garbage collection:

[GC]::Collect()
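
To check whether this helps in a given case, something like the following sketch could be placed at the end of each loop iteration. The stand-in objects and the variable names $managedBytes and $workingSet are assumptions for illustration; the extra WaitForPendingFinalizers call and second Collect are a common pattern for also reclaiming finalizable objects, not something the original script is known to use.

```powershell
# Stand-in objects so the snippet runs on its own; in the real loop these
# would be the reader and schema set from the question.
$schemaSet = New-Object System.Xml.Schema.XmlSchemaSet
$reader = [System.Xml.XmlReader]::Create((New-Object System.IO.StringReader '<x/>'))
$reader.Close()

# Drop the PowerShell references, then ask the runtime to collect.
Remove-Variable reader, schemaSet
[GC]::Collect()
[GC]::WaitForPendingFinalizers()
[GC]::Collect()

# Report the managed heap size and the process working set after collection.
$managedBytes = [GC]::GetTotalMemory($false)
$workingSet = (Get-Process -Id $PID).WorkingSet64
Write-Host ("managed heap: {0:N0} bytes, working set: {1:N0} bytes" -f $managedBytes, $workingSet)
```

If the numbers keep climbing from one iteration to the next even after a forced collection, something is still holding references to the old objects, and collecting more often will only slow the growth.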
  • That does make the increase less, so this may help in practice; but the amount of memory used is still gradually increasing. – MarnixKlooster ReinstateMonica Jun 27 '12 at 09:45
  • As described in [another StackOverflow answer](http://stackoverflow.com/a/745965/223837) this is not directly possible in PowerShell, and it is not necessary because `Close()` implies a `Dispose()` as per recommended Microsoft convention. – MarnixKlooster ReinstateMonica Jun 27 '12 at 12:00
  • @MarnixKlooster I use a powershell.exe.config to load the .NET 4.0 framework, and I can do $reader.dispose(). Sorry for the bad information. – CB. Jun 27 '12 at 12:35
  • No problem. Could you please post a brief description of how exactly you "use a powershell.exe.config to load .NET 4.0 framework" as an answer to this question: http://stackoverflow.com/q/745956/223837? Thanks! – MarnixKlooster ReinstateMonica Jun 27 '12 at 15:42
  • Read here: http://stackoverflow.com/questions/2094694/how-can-i-run-powershell-with-the-net-4-runtime – CB. Jun 27 '12 at 19:10