17

I want to read a big TXT file, 500 MB in size. First I used

var file = new StreamReader(_filePath).ReadToEnd();  
var lines = file.Split(new[] { '\n' });

but it threw an OutOfMemoryException. Then I tried to read line by line, but again after reading around 1.5 million lines it threw an OutOfMemoryException:

string line;
using (StreamReader r = new StreamReader(_filePath))
{
    while ((line = r.ReadLine()) != null)
        _lines.Add(line);
}

I also tried

foreach (var l in File.ReadLines(_filePath))
{
    _lines.Add(l);
}

but again I received:

An exception of type 'System.OutOfMemoryException' occurred in mscorlib.dll but was not handled in user code

My machine is a powerful machine with 8 GB of RAM, so it shouldn't be a machine problem.

P.S.: I tried to open this file in Notepad++ and got a "the file is too big to be opened" message.

Ben
  • What is the question? You're only describing things. – Alvin Wong Nov 16 '12 at 11:46
  • What is the point of storing all that in a collection? – CyberDude Nov 16 '12 at 11:46
  • @AlvinWong The problem is why I am receiving an OutOfMemoryException, and how I can solve it. – Ben Nov 16 '12 at 11:47
  • You talk about "500 rows" but how big is the file in terms of *bytes* and *characters*? 500 rows of 80 characters shouldn't be a problem - 500 lines of a billion characters per line clearly is. – Jon Skeet Nov 16 '12 at 11:48
  • @CyberDude I then use the list in another part of the application. – Ben Nov 16 '12 at 11:48
  • @JonSkeet Sorry, the file size is 500 MB. – Ben Nov 16 '12 at 11:53
  • @Behnam - 500 MB of text loaded into memory as .NET strings may well not fit, especially in a 32-bit process. You will need to find some other way of processing the file, one that does not require the whole thing in memory. Process it per line (or maybe per small batch of lines). – Hans Kesting Nov 16 '12 at 11:56
  • You can fix it by **not** loading the entire file in memory. Clearly your design is not suited to the needs of the application. What will you eventually do with that data? Any processing, filtering, etc.? Maybe you need to store it in a database first. – CyberDude Nov 16 '12 at 11:58
  • @CyberDude Even if I remove _lines.Add(line); and just read the file, it creates an OutOfMemoryException, so what's your suggestion for just reading the file? – Ben Nov 16 '12 at 12:01

6 Answers

39

Just use File.ReadLines, which returns an IEnumerable<string> and doesn't load all the lines into memory at once.

foreach (var line in File.ReadLines(_filePath))
{
    // Don't put "line" into a list or collection.
    // Just do your processing on it here.
}
L.B
  • Same problem even if I just use an empty loop: foreach (var line in File.ReadLines(_filePath)) { } – Ben Nov 16 '12 at 12:36
  • @Behnam Are you sure that you are not getting this error from another part of your program? Try this in an empty solution. – L.B Nov 16 '12 at 12:42
  • I just created a console application with just one line of code, foreach (var line in File.ReadLines(_filePath)) { }, but it creates the exception again. – Ben Nov 16 '12 at 12:53
  • @Behnam I just tested it with an 8.7 GB text file (120,000,000 lines) and it worked well. – L.B Nov 16 '12 at 13:05
4

The cause of the exception seems to be the growing _lines collection, not reading the big file. You are reading each line and adding it to the _lines collection, which consumes memory and eventually causes the out-of-memory exception. You can apply a filter so that only the required lines are added to the _lines collection.
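A minimal sketch of that idea (the "ERROR" filter, the file path, and the names here are placeholders, not from the original post): stream with File.ReadLines and keep only the matching lines, so memory grows with the matches rather than with the file.

using System;
using System.Collections.Generic;
using System.IO;

class FilteredReader
{
    static void Main()
    {
        string filePath = @"C:\data\big.txt"; // placeholder path

        // Stream the file; only lines passing the filter are kept,
        // so memory grows with the matches, not with the whole file.
        var lines = new List<string>();
        foreach (var line in File.ReadLines(filePath))
        {
            if (line.Contains("ERROR")) // hypothetical filter condition
                lines.Add(line);
        }

        Console.WriteLine("Kept {0} lines.", lines.Count);
    }
}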

Adil
3

I know this is an old post, but Google sent me here in 2021.

Just to emphasize igrimpe's comments above:

I've recently run into an OutOfMemoryException on StreamReader.ReadLine() while looping through folders of giant text files.

As igrimpe mentioned, you can sometimes encounter this when your input file lacks uniform line breaks. If you are looping through a text file and hit this, double-check the input file for unexpected characters, ASCII-encoded hex, binary strings, etc.

In my case, I split the problematic 60 GB file into 256 MB chunks, had my file iterator stash the problematic files as part of the exception trap, and later fixed them by removing the offending lines.
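A rough sketch of that chunked approach, assuming a plain StreamReader and an arbitrary 256K-character buffer (the path and buffer size are illustrative only); reading fixed-size blocks means a missing or malformed line terminator can never force one enormous string:

using System;
using System.IO;

class ChunkReader
{
    static void Main()
    {
        string path = @"C:\data\problematic.txt"; // placeholder path
        char[] buffer = new char[256 * 1024];     // arbitrary chunk size

        using (var reader = new StreamReader(path))
        {
            int read;
            // Read() fills the buffer regardless of line breaks, so memory
            // use stays bounded even if the file is one giant "line".
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Inspect buffer[0..read) here: scan for '\n', check for
                // unexpected binary data, or write cleaned chunks elsewhere.
            }
        }
    }
}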

BillD
1

Edit:

Loading the whole file into memory makes objects grow very large, and .NET will throw an OutOfMemoryException if it cannot allocate enough contiguous memory for an object.

The answer is still the same: you need to stream the file, not read the entire contents. That may require rearchitecting your application, but using IEnumerable<> methods you can chain processing steps across different areas of the application and defer the work, as in the sketch below.
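For illustration, such a deferred pipeline might look like this; the trimming and filtering stages are hypothetical stand-ins for real business steps, and nothing is read until the final loop enumerates the sequence:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class StreamingPipeline
{
    static void Main()
    {
        string path = @"C:\data\big.txt"; // placeholder path

        // Each stage yields one item at a time; no stage buffers the file.
        IEnumerable<string> lines = File.ReadLines(path);
        IEnumerable<string> trimmed = lines.Select(l => l.Trim());
        IEnumerable<string> nonEmpty = trimmed.Where(l => l.Length > 0);

        // Enumerating here pulls lines through the whole pipeline lazily.
        foreach (var line in nonEmpty)
        {
            Process(line);
        }
    }

    static void Process(string line)
    {
        // Placeholder for the real per-line business logic.
    }
}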


A "powerful" machine with 8GB of RAM isn't going to be able to store a 500GB file in memory, as 500 is bigger than 8. (plus you don't get 8 as the operating system will be holding some, you can't allocate all memory in .Net, 32-bit has a 2GB limit, opening the file and storing the line will hold the data twice, there is an object size overhead....)

You can't load the whole thing into memory to process it; you will have to stream the file through your processing.

cjk
  • In my second approach I tried to use StreamReader, and even after removing the _lines.Add(line); line I am receiving an OutOfMemoryException, so I don't clearly understand what you mean by streaming. – Ben Nov 16 '12 at 12:22
  • Maybe the "line" terminator is not what it should be? If the lines are not terminated by \r AND \n the internal functions probably would still read the complete file into memory, wouldn't they? – igrimpe Nov 16 '12 at 12:48
  • I'm not sure why you received an error in your 2nd code excerpt when not calling `_lines.Add(line)`, maybe you have a problem elsewhere? The line terminator is likely to not be related to the problem - 500MB of contiguous memory is going to be difficult to obtain in any scenario unless you are running 64-bit and have a LOT of memory. – cjk Nov 16 '12 at 12:54
  • Testing if the line terminator is the problem should be easy. Do a console app with a single method `file.readline(path)`. If it still throws an ex, then a single "line" simply is too long. Most likely because internally a stringbuilder is used which permanently has to increase its internal array (i.e. allocate space for a NEW one), without giving the GC time to clean up. – igrimpe Nov 16 '12 at 13:02
  • @igrimpe Good call on that one – cjk Nov 16 '12 at 13:45
0

You can count the lines first and then allocate an exact-size array. It is slower, since it takes two passes over the file, but a string[] can hold up to 2,147,483,647 lines.

int intNoOfLines = 0;
using (StreamReader oReader = new StreamReader(MyFilePath))
{
    while (oReader.ReadLine() != null) intNoOfLines++;
}

string[] strArrLines = new string[intNoOfLines];
int intIndex = 0;
using (StreamReader oReader = new StreamReader(MyFilePath))
{
    string strLine;
    while ((strLine = oReader.ReadLine()) != null)
    {
        strArrLines[intIndex++] = strLine;
    }
}
0

For anyone else having this issue:

If you're running out of memory while using StreamReader.ReadLine(), I'd be willing to bet your file doesn't have multiple lines to begin with. You're just assuming it does. It's an easy mistake to make because you can't just open a 10 GB file with Notepad.

One time I received a 10 GB file from a client that was supposed to be a list of numbers, and instead of using '\n' as a separator he used commas. The whole file was a single line, which obviously caused ReadLine() to blow up.

Try reading a few thousand characters from your stream using StreamReader.Read() and see if you can find a '\n'. Odds are you won't.
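A quick probe along those lines might look like this (the path and the 4096-character buffer are arbitrary choices for illustration):

using System;
using System.IO;

class LineBreakProbe
{
    static void Main()
    {
        string path = @"C:\data\big.txt"; // placeholder path
        char[] buffer = new char[4096];   // arbitrary sample size

        using (var reader = new StreamReader(path))
        {
            int read = reader.Read(buffer, 0, buffer.Length);
            bool found = Array.IndexOf(buffer, '\n', 0, read) >= 0
                      || Array.IndexOf(buffer, '\r', 0, read) >= 0;
            Console.WriteLine(found
                ? "Found a line terminator in the first chunk."
                : "No line terminator found; the file may be one giant line.");
        }
    }
}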

royalstream