
Question, if anyone could help please: The file I am reading from inPath is very large, 300 MB to 1 GB+. I need to load the file into the variable wholeFile, as shown in the program below. Files of approximately 200 MB work fine, but larger files fail with an OutOfMemoryException. The purpose is that, once the file is loaded into the variable, I need to run a regex over it, pick out certain sections of the file, and save them somewhere else. Thanks once again for your kind attention.

Dim inPath As String = "C:\temp\300MB-File.txt"
Dim outPath As String = "C:\temp\myFileNew2.txt"

Dim wholeFile As String = ""

Using sw As StreamWriter = File.CreateText(outPath)
    For Each oneLine As String In File.ReadLines(inPath)
        sw.WriteLine(oneLine)

        wholeFile = wholeFile & vbCrLf & oneLine
    Next
End Using
RDM

1 Answer


The way you're doing that is abominable. Why would you read a file line by line if your purpose is to store the entire contents in a single variable? Why wouldn't you load the whole file in one go?

Dim fileContents = File.ReadAllText(filePath)

That may still have memory issues with large files, but the way you're doing it will allocate vastly more memory over the course of the loop. Each time you concatenate to the String, you create a new String object and copy the previous contents into it along with the new text. That means that, for a file with N lines, you are going to create N Strings. The first will contain the first line, the second will contain the first two lines, the third will contain the first three lines, and so on, so the total amount of text copied grows roughly with the square of the number of lines.

If you really want to read the file line by line then you could use a StringBuilder, which avoids so much memory reallocation. Even better would be to get the size of the file first and then create the StringBuilder with the appropriate capacity from the get go, so no reallocation would be needed at all.
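As a rough sketch of the StringBuilder idea, assuming the file uses a mostly single-byte encoding so that its length in bytes is close to its length in characters: `ReadWholeFile` and `MaxPreallocatedChars` are hypothetical names made up for this illustration, and the cap is an arbitrary assumption to stop the up-front allocation itself from failing. The StringBuilder still grows past its initial capacity if it needs to.

```vb
Imports System.IO
Imports System.Text

Module StringBuilderSketch
    ' Hypothetical cap on the up-front allocation (an assumption, tune as needed).
    Const MaxPreallocatedChars As Integer = 256 * 1024 * 1024

    ' Reads a text file line by line into one String via a pre-sized StringBuilder.
    Function ReadWholeFile(inPath As String) As String
        ' FileInfo.Length is in bytes, which only approximates the character count.
        Dim byteLength As Long = New FileInfo(inPath).Length
        Dim capacity As Integer = CInt(Math.Min(byteLength, CLng(MaxPreallocatedChars)))
        Dim sb As New StringBuilder(capacity)

        For Each oneLine As String In File.ReadLines(inPath)
            ' Appends the line plus Environment.NewLine without copying earlier text.
            sb.AppendLine(oneLine)
        Next

        Return sb.ToString()
    End Function

    Sub Main()
        Dim wholeFile As String = ReadWholeFile("C:\temp\300MB-File.txt")
        Console.WriteLine(wholeFile.Length)
    End Sub
End Module
```

Note that `ToString()` still has to produce one String holding the whole file, and .NET strings store two bytes per character, so this removes the repeated copying but not the underlying size problem.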

When you get right down to it though, files of that size are going to be an issue no matter what. You will either need to ensure that enough memory is allocated to your app to handle it or else you'll have to break the file up into chunks and process each chunk separately. If your regex won't match very large portions of the file then you can simply make each chunk overlap by a line or two and then handle the special cases where you get duplicate matches in the overlapping section.
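A minimal sketch of that chunk-with-overlap approach, under stated assumptions: no match spans more than `OverlapLines` lines, and the pattern does not depend on context outside the match (no lookbehind past the chunk start). `LinesPerChunk`, `OverlapLines`, `ExtractMatches`, `ProcessChunk` and the example pattern are all hypothetical names and values chosen for illustration, not anything from the question.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module ChunkedRegexSketch
    ' Hypothetical sizes: tune to the data and to how long a match can be.
    Const LinesPerChunk As Integer = 100000
    Const OverlapLines As Integer = 2

    ' Streams the input, runs the regex on one chunk at a time, writes each match once.
    Sub ExtractMatches(inPath As String, outPath As String, pattern As Regex)
        Dim overlap As New List(Of String)()  ' tail of the previous chunk
        Dim fresh As New List(Of String)()    ' lines not yet processed

        Using sw As StreamWriter = File.CreateText(outPath)
            For Each oneLine As String In File.ReadLines(inPath)
                fresh.Add(oneLine)
                If fresh.Count >= LinesPerChunk Then
                    ProcessChunk(overlap, fresh, pattern, sw)
                End If
            Next
            ' Process whatever is left after the last full chunk.
            If fresh.Count > 0 Then
                ProcessChunk(overlap, fresh, pattern, sw)
            End If
        End Using
    End Sub

    Private Sub ProcessChunk(overlap As List(Of String), fresh As List(Of String),
                             pattern As Regex, sw As StreamWriter)
        Dim sb As New StringBuilder()
        For Each s As String In overlap
            sb.AppendLine(s)
        Next
        ' Characters before this index were already searched with the previous chunk.
        Dim overlapLength As Integer = sb.Length
        For Each s As String In fresh
            sb.AppendLine(s)
        Next

        For Each m As Match In pattern.Matches(sb.ToString())
            ' A match lying wholly inside the overlap was reported last time: skip it.
            If overlapLength > 0 AndAlso m.Index + m.Length <= overlapLength Then Continue For
            sw.WriteLine(m.Value)
        Next

        ' Carry the last few lines forward as the next chunk's overlap.
        overlap.Clear()
        Dim start As Integer = Math.Max(0, fresh.Count - OverlapLines)
        overlap.AddRange(fresh.GetRange(start, fresh.Count - start))
        fresh.Clear()
    End Sub

    Sub Main()
        ' Hypothetical pattern; substitute the real one.
        ExtractMatches("C:\temp\300MB-File.txt", "C:\temp\myFileNew2.txt", New Regex("ERROR \d+"))
    End Sub
End Module
```

Only one chunk plus a small overlap is held in memory at a time, so peak usage depends on `LinesPerChunk` rather than on the size of the file. A match longer than the overlap would be missed, which is why the overlap has to be sized to the longest match you expect.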

jmcilhinney
  • In fairness to RDM, it's really [Mark's fault](http://stackoverflow.com/a/40515757/832052) – djv Nov 12 '16 at 03:33
  • @Verdolino, the advice provided by Mark in that other thread was sound for when processing a file line by line, as in the example provided when writing the lines to another file. If you're not actually processing the input file line by line then there's no point reading it line by line. That said, the code provided in the question does still write the lines out as they are read but that's absolutely pointless as it is in the code, so obviously the final code would either exclude that or include some additional processing. – jmcilhinney Nov 12 '16 at 04:06
  • @JoelCoehoorn, I have rolled back your edit because, while strictly correct, it made the next sentence incorrect. The N `Strings` I was referring to were the output and they are the ones that get copied and reallocated each time. You are correct that there will indeed be another N `Strings` that contain the input lines as well. While that's bad enough, at least that would only double memory usage rather than make it grow geometrically. – jmcilhinney Nov 12 '16 at 04:11
  • Thank you guys. Originally, I did have ReadToEnd, but when writing to file was going out of memory, the reason "Mark" suggested to go ReadLines option, which actually worked great. But the issue came up as I tried saving it to the variable. I will be rewriting some of the codes as suggested.--Thanks once again. – RDM Nov 14 '16 at 16:10
  • Hello Mark, Below coding to search the START_BLOCK, having trouble to cont. to read until END_BLOCK please help: Dim inPath As String = "C:\temprm\myFile.txt" Dim outPath As String = "C:\temprm\myFileNew1.txt" Using sw As StreamWriter = File.CreateText(outPath) For Each line As String In File.ReadLines(inPath) If line.Contains("START_BLOCK") Then sw.WriteLine(line) '-------HOW DO I CONTINUE UNTIL "END_BLOCK" AND WRITE TO outPath End If Next line End Using – RDM Nov 17 '16 at 19:27
  • Coding jumbled up. Mark please copy and paste into VB.NET – RDM Nov 17 '16 at 19:28
  • There's no Mark here. Don't post large code blocks in comments. If this relates to your current question then edit your original post and add the new code there. If it's a new question then create a new post. – jmcilhinney Nov 17 '16 at 22:07
  • Understood Thank you. – RDM Nov 18 '16 at 13:17