
I currently have the following code.

(Get-Content 'file.txt') |
  ForEach-Object {$_ -replace '"', ''} |
  Set-Content 'file.txt'

This worked when testing, but now that I am trying to use it on real data files (13 GB), this use of Get-Content causes PowerShell to consume a large amount of RAM and ultimately all of the available RAM on the machine.

Is there a better way to achieve the same result without that amount of overhead?

It seems I am doing the opposite of best practice, but I am not sure what else would be cleaner or less RAM-intensive than the above.

Ansgar Wiechers
ricky89
  • The title wouldn't make it obvious unless you already knew the method for solving this problem, but it is a duplicate. http://stackoverflow.com/questions/4192072/how-to-process-a-file-in-powershell-line-by-line-as-a-stream – EBGreen Sep 01 '15 at 16:47
    possible duplicate of [How to process a file in PowerShell line-by-line as a stream](http://stackoverflow.com/questions/4192072/how-to-process-a-file-in-powershell-line-by-line-as-a-stream) – EBGreen Sep 01 '15 at 16:47

3 Answers


Use a stream to read the file so that it is not all loaded into memory, and use another stream to write the output. This should perform well and keep memory usage down:

# Open the input file for reading and the output file for writing as streams
$file = New-Object System.IO.StreamReader -Arg "c:\test\file.txt"
$outstream = [System.IO.StreamWriter] "c:\test\out.txt"

# Process one line at a time instead of loading the whole file
while ($line = $file.ReadLine()) {
  $s = $line -replace '"', ''
  $outstream.WriteLine($s)
}
$file.Close()
$outstream.Close()
campbell.rw

Your problem isn't caused by Get-Content, but by the fact that you're running the statement in an expression (i.e. in parentheses). Running Get-Content like that is a convenient way of allowing a pipeline to write data back to the same file. However, the downside of this approach is that the entire file is read into memory before the data is passed into the pipeline (otherwise the file would still be open for reading when Set-Content tries to write data back to it).

For processing large files you must remove the parentheses and write the output to a temporary file that you rename afterwards.

Get-Content 'C:\path\to\file.txt' |
  ForEach-Object {$_ -replace '"', ''} |
  Set-Content 'C:\path\to\temp.txt'

Remove-Item 'C:\path\to\file.txt'
Rename-Item 'C:\path\to\temp.txt' 'file.txt'

Doing this avoids the memory exhaustion you observed. The processing can be sped up further by increasing the read count as @mjolinor suggested (cut execution time down to approximately 40% in my tests).
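
As a rough sketch (using the same placeholder paths as above and the batch size of 5000 from the answer below), the -ReadCount variant might look like this:

# Read and emit lines in batches of 5000 instead of one object per line
Get-Content 'C:\path\to\file.txt' -ReadCount 5000 |
  ForEach-Object {$_ -replace '"', ''} |
  Set-Content 'C:\path\to\temp.txt'

Remove-Item 'C:\path\to\file.txt'
Rename-Item 'C:\path\to\temp.txt' 'file.txt'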

For even better performance use the approach with a StreamReader and a StreamWriter that @campbell.rw suggested:

# Peek() returns -1 only at the end of the stream, so blank lines are handled correctly
$reader = New-Object IO.StreamReader 'C:\path\to\file.txt'
$writer = New-Object IO.StreamWriter 'C:\path\to\temp.txt'

while ($reader.Peek() -ge 0) {
  $line = $reader.ReadLine().Replace('"', '')
  $writer.WriteLine($line)
}

$reader.Close(); $reader.Dispose()
$writer.Close(); $writer.Dispose()

Remove-Item 'C:\path\to\file.txt'
Rename-Item 'C:\path\to\temp.txt' 'file.txt'
Ansgar Wiechers
  • The .Peek() method for testing whether you're at EOF doesn't have the problem of ending the loop if it hits a blank line (see the sketch below). – mjolinor Sep 01 '15 at 21:37
  • This is the correct answer, as the accepted one does not take into account blank lines. – kuujinbo Feb 04 '17 at 21:29
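
To illustrate the point made in these comments, here is a small self-contained sketch (the demo file path is hypothetical, not taken from the answers above):

# Create a three-line demo file with a blank line in the middle
Set-Content 'C:\temp\demo.txt' -Value 'a', '', 'b'

# Truthiness check: ReadLine() returns '' for the blank line, so the loop stops after 'a'
$r = New-Object IO.StreamReader 'C:\temp\demo.txt'
while ($line = $r.ReadLine()) { Write-Host "got: $line" }
$r.Close()

# Peek() check: returns -1 only at the end of the stream, so all three lines are read
$r = New-Object IO.StreamReader 'C:\temp\demo.txt'
while ($r.Peek() -ge 0) { Write-Host ("got: '{0}'" -f $r.ReadLine()) }
$r.Close()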

This should be faster than line-by-line processing, and still keep your memory consumption under control:

Get-Content 'file.txt' -ReadCount 5000 |
  ForEach-Object { $_ -replace '"', '' | Add-Content 'newfile.txt' }
mjolinor
  • Hi mjolinor, thanks for the suggestion. I tried it with the ReadCount set to 5000 to begin with, and a few more times with it set lower, but PowerShell crashed each time. Get-Content doesn't appear suited to the situation. – ricky89 Sep 01 '15 at 19:41
  • I've used this many times to read large files with good results. Define "crashed". – mjolinor Sep 01 '15 at 19:45
  • The memory usage kept rising (albeit more slowly than with what I initially had) and eventually PowerShell became unresponsive and closed down. – ricky89 Sep 01 '15 at 19:53
  • Sorry about that. I closed the foreach block too early. See if it works better now. – mjolinor Sep 01 '15 at 20:03