I'm working with some multi-gigabyte text files and want to do some stream processing on them using PowerShell. It's simple stuff, just parsing each line and pulling out some data, then storing it in a database.
Unfortunately, `get-content | %{ whatever($_) }` appears to keep the entire set of lines at that stage of the pipeline in memory. It's also surprisingly slow, taking a very long time just to read everything in.
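To make the setup concrete, here is a minimal sketch of what I'm running (`whatever` is a placeholder for my actual parse-and-insert logic, and the file path is made up):

```powershell
# Placeholder for the real per-line parsing / database-insert work
function whatever($line) {
    # ... parse $line and store the extracted fields ...
}

# On a multi-gigabyte file this buffers far more than I expected,
# and is very slow compared to an equivalent C# loop
Get-Content .\huge-file.txt | ForEach-Object { whatever($_) }
```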
So my question is two parts:
- How can I make it process the stream line by line and not keep the entire thing buffered in memory? I would like to avoid using up several gigs of RAM for this purpose.
- How can I make it run faster? PowerShell iterating over `get-content` appears to be about 100x slower than a C# script doing the same thing.
I'm hoping there's something dumb I'm doing here, like missing a `-LineBufferSize` parameter or something...
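For reference, the C#-style approach I'm comparing against just reads the file line by line with a `StreamReader`, which runs in roughly constant memory. A rough PowerShell equivalent (again a sketch, with `whatever` and the path as placeholders; the `::new()` syntax needs PowerShell 5 or later) would be:

```powershell
$reader = [System.IO.StreamReader]::new('.\huge-file.txt')
try {
    # ReadLine returns $null at end of file, so this streams
    # one line at a time without buffering the whole file
    while ($null -ne ($line = $reader.ReadLine())) {
        whatever($line)   # placeholder per-line processing
    }
}
finally {
    $reader.Dispose()
}
```

I assume something like this is the fast path, but I'd prefer an idiomatic pipeline solution if one exists.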