
I am trying to search through a number of large files and replace parts of the text, but I keep running into errors.

I tried this, but sometimes I'll get an 'out of memory' error in PowerShell:

#region The Setup
$file = "C:\temp\168MBfile.txt"

$hash = @{
    ham = 'bacon'
    toast = 'pancakes'
}
#endregion The Setup

$obj = [System.IO.StreamReader]$file
$contents = $obj.ReadToEnd()
$obj.Close()

foreach ($key in $hash.Keys) {
    $contents = $contents -replace [regex]::Escape($key), $hash[$key]
}
try {
    $obj = [System.IO.StreamWriter]$file
    $obj.Write($contents)
} finally {
    if ($null -ne $obj) {
        $obj.Close()
    }
}

Then I tried this (in the ISE), but it crashes with a popup message (sorry, I don't have the error on hand) and tries to restart the ISE:

$arraylist = New-Object System.Collections.ArrayList
$obj = [System.IO.StreamReader]$file
while (!$obj.EndOfStream) {
    $line = $obj.ReadLine()
    foreach ($key in $hash.Keys) {
        $line = $line -replace [regex]::Escape($key), $hash[$key]
    }
    [void]$arraylist.Add($line)
}
$obj.Close()
$arraylist

And finally, I came across something like this, but I'm not sure how to use it properly, and I'm not even sure if I'm going about this the right way:

$sourcestream = [System.IO.File]::OpenRead($file)
$newstream = [System.IO.File]::Create("$file.tmp")
$sourcestream.CopyTo($newstream)
$sourcestream.Close()
$newstream.Close()
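For reference, what I think I'm actually after is something like this: stream line by line into a temp file, then swap it in. This is only an untested sketch, using a tiny demo file instead of my real 168MB one so it can run anywhere:

```powershell
# small demo file standing in for the real C:\temp\168MBfile.txt
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'demo.txt'
$temp = "$file.tmp"
Set-Content -Path $file -Value 'ham and toast'

$hash = @{
    ham   = 'bacon'
    toast = 'pancakes'
}

$reader = [System.IO.StreamReader]$file
$writer = [System.IO.StreamWriter]$temp
try {
    # one line at a time keeps memory flat, unlike ReadToEnd()
    while (!$reader.EndOfStream) {
        $line = $reader.ReadLine()
        foreach ($key in $hash.Keys) {
            $line = $line -replace [regex]::Escape($key), $hash[$key]
        }
        $writer.WriteLine($line)
    }
} finally {
    $reader.Close()
    $writer.Close()
}

# swap the rewritten copy in place of the original
Move-Item -Path $temp -Destination $file -Force
```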

Any advice would be greatly appreciated.

Anthony Stringer

1 Answer

You can start with a `-ReadCount` of 1000 and tweak it based on the performance you get:

Get-Content textfile -ReadCount 1000 |
    ForEach-Object { <# do something #> } |
    Set-Content textfile

or

(Get-Content textfile -ReadCount 1000) -replace 'something','withsomething' |
    Set-Content textfile
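If the file is too big to buffer back into memory, a safer variant (just a sketch; the paths and file names here are demo placeholders) streams into a new file and swaps it in afterwards:

```powershell
# demo stand-ins for the real large file
$tmpdir = [System.IO.Path]::GetTempPath()
$src = Join-Path $tmpdir 'textfile'
$dst = Join-Path $tmpdir 'newfile.txt'
Set-Content -Path $src -Value 'something here'

# stream in 1000-line batches; write to a new file instead of the source
Get-Content $src -ReadCount 1000 |
    ForEach-Object { $_ -replace 'something', 'withsomething' } |
    Set-Content $dst

# once happy with the result, swap the files
Remove-Item $src
Rename-Item $dst 'textfile'
```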
Kiran Reddy
  • Okay, so how do I get any lines after I'm done with the first 1000? – Anthony Stringer Apr 06 '16 at 03:04
  • from the help file : `-ReadCount Specifies how many lines of content are sent through the pipeline at a time. The default value is 1. A value of 0 (zero) sends all of the content at one time. This parameter does not change the content displayed, but it does affect the time it takes to display the content. As the value of ReadCount increases, the time it takes to return the first line increases, but the total time for the operation decreases. This can make a perceptible difference in very large items` – Kiran Reddy Apr 06 '16 at 03:10
  • Does your example work if I'm trying to write to the same file I'm reading from? – Anthony Stringer Apr 06 '16 at 11:12
  • No it won't, and neither will streams. To write to the source file you need to first read the whole file into memory. You can, however, write to a temp file and replace the source file when you're done. – Frode F. Apr 06 '16 at 14:32
  • Yup, as @FrodeF. mentioned, writing to the source while it is still being read is not possible. Using a temp file is a much safer option because that way your source is untouched. Once you are satisfied with the results you could use `Remove-Item` to remove the source and perhaps rename the new file. – Kiran Reddy Apr 07 '16 at 02:00
  • I'm still not getting it to work right: out of memory error. The 168MB file has no carriage returns or line feeds - it's all a single line. Is there any way to only read so many characters or bytes at a time? – Anthony Stringer Apr 08 '16 at 18:00
  • None that I am aware of, but take a look at this thread: `http://stackoverflow.com/questions/4192072/how-to-process-a-file-in-powershell-line-by-line-as-a-stream`. Also, perhaps you could explore other programming languages that are tailored more towards text processing. – Kiran Reddy Apr 09 '16 at 08:29
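For the record, .NET can read a fixed number of characters at a time with `StreamReader.Read` and a char buffer, which avoids both `ReadToEnd()` and `ReadLine()` on a single-line file. A sketch only, with a tiny demo file: note that a key straddling a chunk boundary would be missed, so real use needs overlapping reads or a chunk much larger than the longest key.

```powershell
$tmpdir = [System.IO.Path]::GetTempPath()
$src = Join-Path $tmpdir 'bigline.txt'
$dst = Join-Path $tmpdir 'bigline.tmp'

# demo input: one long line with no carriage returns or line feeds
[System.IO.File]::WriteAllText($src, ('ham toast ' * 3).TrimEnd())

$hash = @{ ham = 'bacon'; toast = 'pancakes' }

$reader = [System.IO.StreamReader]$src
$writer = [System.IO.StreamWriter]$dst
try {
    $buffer = New-Object char[] 65536   # read up to 64K characters per pass
    while (($count = $reader.Read($buffer, 0, $buffer.Length)) -gt 0) {
        $chunk = New-Object string ($buffer, 0, $count)
        foreach ($key in $hash.Keys) {
            $chunk = $chunk -replace [regex]::Escape($key), $hash[$key]
        }
        $writer.Write($chunk)
    }
} finally {
    $reader.Close()
    $writer.Close()
}

# swap the rewritten copy in place of the original
Move-Item -Path $dst -Destination $src -Force
```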