The way you have this structured, the entire file contents have to be read into memory. Note that reading a file into memory uses 3-4x the file size in RAM, which is documented here.
Without getting into .NET classes, particularly [System.IO.StreamReader], Get-Content is actually very memory efficient; you just have to leverage the pipeline so you don't build up the data in memory.
Note: if you do decide to try StreamReader, the article will give you some syntax clues. Moreover, that topic has been covered by many others on the web.
Get-ChildItem -Path "C:\temp" -Depth 2 -Filter *.csv |
    ForEach-Object {
        $CurrentFile = $_
        $TmpFilePath = Join-Path $CurrentFile.Directory.FullName ($CurrentFile.BaseName + "_New" + $CurrentFile.Extension)

        # Stream line by line: strip NUL characters and write to the temp file.
        # Set-Content overwrites any stale temp file left by an earlier run.
        Get-Content $CurrentFile.FullName |
            ForEach-Object { $_ -replace "`0","" } |
            Set-Content $TmpFilePath

        # Now that you've got the new file you can delete the original & rename the new one:
        Remove-Item -Path $CurrentFile.FullName
        Rename-Item -Path $TmpFilePath -NewName $CurrentFile.Name
    }
This is a streaming model: Get-Content is streaming inside the outer ForEach-Object loop. There may be other ways to do it, but I chose this so I could keep track of the names and do the file swap at the end.
Note: Per the same article, in terms of speed Get-Content is quite slow. However, your original code was likely already suffering that burden. Moreover, you can speed it up a bit using the -ReadCount XXXX parameter. That will send some number of lines down the pipe at a time. That of course does use more memory, so you'd have to find a level that helps you stay within the boundaries of your available RAM. Performance improvement with -ReadCount is mentioned in this answer's comments.
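As a rough sketch of the -ReadCount idea, the snippet below builds a small sample file in the temp folder and cleans it in batches. The file names and the batch size of 1000 are made up for illustration; tune the batch size to your own data and RAM.

```powershell
# Hypothetical sample files in the temp folder, for illustration only.
$Source = Join-Path ([System.IO.Path]::GetTempPath()) "readcount_demo.csv"
$Target = Join-Path ([System.IO.Path]::GetTempPath()) "readcount_demo_New.csv"

# Two sample lines, each containing a stray NUL character.
Set-Content -Path $Source -Value "a`0,b", "c,d`0"

# -ReadCount 1000 sends arrays of up to 1000 lines down the pipeline at a time.
# -replace applied to an array works on every element, so the body stays the same.
Get-Content -Path $Source -ReadCount 1000 |
    ForEach-Object { $_ -replace "`0","" } |
    Set-Content -Path $Target
```

Larger batches mean fewer pipeline iterations but more lines held in memory at once, so it is worth timing a couple of values against a representative file.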
Update Based on Comments:
Here's an example of using StreamReader/Writer to perform the same operations from the previous example. This should be just as memory efficient as Get-Content, but should be much faster.
Get-ChildItem -Path "C:\temp" -Depth 2 -Filter *.csv |
    ForEach-Object {
        $CurrentFile = $_.FullName
        $CurrentName = $_.Name
        $TmpFilePath = Join-Path $_.Directory.FullName ($_.BaseName + "_New" + $_.Extension)

        $StreamReader = [System.IO.StreamReader]::new( $CurrentFile )
        $StreamWriter = [System.IO.StreamWriter]::new( $TmpFilePath )

        # Read, clean, and write one line at a time so only a single line is held in memory.
        While( !$StreamReader.EndOfStream )
        {
            $StreamWriter.WriteLine( ($StreamReader.ReadLine() -replace "`0","") )
        }

        # Close both streams to flush the output and release the file handles.
        $StreamReader.Close()
        $StreamWriter.Close()

        # Now that you've got the new file you can delete the original & rename the new one:
        Remove-Item -Path $CurrentFile
        Rename-Item -Path $TmpFilePath -NewName $CurrentName
    }
Note: I have some sense this issue is rooted in encoding. The stream constructors do accept a [System.Text.Encoding] object as an argument.
Available Encodings:
[System.Text.Encoding]::BigEndianUnicode
[System.Text.Encoding]::Default
[System.Text.Encoding]::Unicode
[System.Text.Encoding]::UTF32
[System.Text.Encoding]::UTF7
[System.Text.Encoding]::UTF8
So if you wanted to instantiate the streams with, for example, UTF8:
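$StreamReader = [System.IO.StreamReader]::new( $CurrentFile, [System.Text.Encoding]::UTF8 )
# StreamWriter has no (path, encoding) overload; pass $false for "append" so the file is overwritten.
$StreamWriter = [System.IO.StreamWriter]::new( $TmpFilePath, $false, [System.Text.Encoding]::UTF8 )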
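The streams do default to UTF8. I think the system default ([System.Text.Encoding]::Default) in Windows PowerShell is typically code page Windows-1252 on Western-locale systems; in PowerShell 7 it is UTF8.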
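If the NUL characters do come from UTF-16 files being read with the wrong encoding, then reading with the right encoding should make the -replace step unnecessary. Here's a minimal sketch under that assumption; the sample file, its name, and the guess that the source is UTF-16 LE are all mine, so check your actual files first. It uses the three-argument StreamWriter constructor (path, append, encoding).

```powershell
# Hypothetical sample files in the temp folder, for illustration only.
$Source = Join-Path ([System.IO.Path]::GetTempPath()) "encoding_demo.csv"
$Target = Join-Path ([System.IO.Path]::GetTempPath()) "encoding_demo_New.csv"

# Create a small UTF-16 LE file to stand in for the real CSV (assumed encoding).
[System.IO.File]::WriteAllText( $Source, "a,b`r`nc,d`r`n", [System.Text.Encoding]::Unicode )

# Read as UTF-16 LE and write as UTF8; $false means overwrite rather than append.
$StreamReader = [System.IO.StreamReader]::new( $Source, [System.Text.Encoding]::Unicode )
$StreamWriter = [System.IO.StreamWriter]::new( $Target, $false, [System.Text.Encoding]::UTF8 )
While( !$StreamReader.EndOfStream )
{
    # No -replace needed: decoded correctly, the text contains no NUL characters.
    $StreamWriter.WriteLine( $StreamReader.ReadLine() )
}
$StreamReader.Close()
$StreamWriter.Close()
```

Note that [System.Text.Encoding]::UTF8 writes a byte order mark at the start of the file, whereas the StreamWriter default is UTF8 without one.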