20

I need to read a 1 GB raw text file from disk into RAM to do some string manipulation in C#.

string contents = File.ReadAllText(path);

is throwing out-of-memory exceptions (unsurprisingly).

What is the best way to go about this?

Luke Belbina
    What kind of string manipulation? Would it be okay to only read some parts at any given time? – Lasse Espeholt May 09 '11 at 22:10
  • In theory yes, but working w/ legacy code and I know the environment this is going to be used in and it would be easier to read it in one go. – Luke Belbina May 09 '11 at 22:12
  • I assume you actually have enough free RAM on the PC you are attempting this with. I know modifying legacy code can be a pain (and scary as well if it's mission-critical), but you may need to consider just reading a chunk at a time and working with it that way. – Sean Hunter May 09 '11 at 22:19

5 Answers

14

Possibly also look at using a memory-mapped file
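A minimal sketch of that approach, along the lines of the comment below (the file name and buffer size are placeholders, and UTF-8 text is assumed; memory-mapped views are page-aligned, so the stream can include trailing NUL padding, which the sketch trims off):

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

class MmfExample
{
    // Reads a whole file through a memory-mapped view in fixed-size chunks.
    // Views are page-aligned, so the stream may yield trailing '\0' padding,
    // which is trimmed before returning.
    public static string ReadViaMmf(string path)
    {
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewStream())
        using (var reader = new StreamReader(view, Encoding.UTF8))
        {
            var sb = new StringBuilder();
            char[] buffer = new char[64 * 1024];
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
                sb.Append(buffer, 0, read); // or process each chunk here instead
            return sb.ToString().TrimEnd('\0');
        }
    }

    static void Main()
    {
        // Demo on a small temp file; 'path' would really be the 1 GB file.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "some large text");
        Console.WriteLine(ReadViaMmf(path));
        File.Delete(path);
    }
}
```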

Dave
  • From the docs it looks like you'd use a `MemoryMappedViewStream` and then pull in some chunks of bytes. Use `Encoding.GetString` [ http://msdn.microsoft.com/en-us/library/05cts4c3.aspx ] if necessary. – Dave May 13 '11 at 14:25
11

If you REALLY want to do this huge string manipulation in memory, then you are NOT out of luck anymore, provided you can meet the following requirements:

  1. Compile targeting x64
  2. Run on an x64 system
  3. Target .NET 4.5

This will lift the memory limitations you're facing: your process's memory will be limited only by your computer's memory, and starting with .NET 4.5 on x64 even the 2 GiB limit on a single .NET object can be lifted (via the `gcAllowVeryLargeObjects` runtime setting).
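Worth noting: the large-object support in .NET 4.5 is opt-in and applies to arrays, not strings (the maximum length of a single string is unchanged). Enabling it looks like this in app.config:

```xml
<configuration>
  <runtime>
    <!-- Allows arrays larger than 2 GiB on 64-bit; strings are still capped. -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>
```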

Loudenvier
4

Try with System.IO.StreamReader

Any difference between File.ReadAllText() and using a StreamReader to read file contents?
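A rough sketch of the StreamReader approach, reading in fixed-size chunks so nothing forces you to hold the whole file at once (the buffer size and demo file are just placeholders):

```csharp
using System;
using System.IO;
using System.Text;

class StreamReaderChunks
{
    // Reads a file in fixed-size chunks; the callback sees each chunk,
    // so you can process the text without keeping all of it in memory.
    public static void ReadInChunks(string path, Action<char[], int> onChunk)
    {
        using (var reader = new StreamReader(path))
        {
            char[] buffer = new char[128 * 1024]; // 128K chars per read
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
                onChunk(buffer, read);
        }
    }

    static void Main()
    {
        string path = Path.GetTempFileName(); // demo file; would be the 1 GB file
        File.WriteAllText(path, "hello world");

        var sb = new StringBuilder();
        ReadInChunks(path, (chunk, count) => sb.Append(chunk, 0, count));
        File.Delete(path);
        Console.WriteLine(sb.ToString()); // prints the file contents
    }
}
```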

manojlds
2

I was using ReadAllText() for a 109 MB file and was getting an out-of-memory exception, which is really odd at that size. I read the file through a buffered stream for good performance and used a StringBuilder to keep it memory-efficient. Here is my code:

StringBuilder sb = new StringBuilder();

// Open with ReadWrite sharing so the file can be read even while another
// process has it open, then read it line by line into the builder.
using (FileStream fs = File.Open(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
    string line;
    while ((line = sr.ReadLine()) != null)
        sb.AppendLine(line);
}
techExplorer
  • StringBuilder also throws out-of-memory exceptions at that size. See [this question](https://stackoverflow.com/q/1769447/345659) – JumpingJezza Mar 20 '18 at 07:24
  • Don't use StreamReader with BufferedStream! The BufferedStream is redundant, because StreamReader already does its own buffering, so you end up with two buffers. Look at this https://stackoverflow.com/a/2069317/81306 – Kamran Bigdely Mar 29 '18 at 23:05
0

If the other suggested solutions do not work, I suggest setting a limit on the number of characters to read and reading the text in parts. Once a part of the text is cached, you can manipulate it.

If you need to manipulate it in any direction (I mean, not just left to right in one pass), you can always implement a B-Tree and store parts of the text in the nodes :)

Sometimes it is almost impossible to process a text by reading it in sequential parts, and that's where a B-Tree helps. I implemented one about a year ago for academic purposes (a mini database manager), but there should already be implementations of it in C#. Of course, you will have to implement how to load the B-Tree's nodes from the file.
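A very rough sketch of the load-on-demand idea (the type and field names are hypothetical): each node stores only the byte offset and length of its part of the file and loads the text lazily. A real B-Tree would keep many keys and children per node; a plain binary node is enough to show the principle.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical node type: holds only the location of its chunk on disk.
class ChunkNode
{
    public long Offset;
    public int Length;
    public ChunkNode Left, Right; // a real B-Tree would have many children

    // Load this node's chunk from disk on demand. Note: offsets are in
    // bytes, so with UTF-8 you must make sure chunk boundaries don't
    // split a multi-byte character.
    public string Load(string path)
    {
        using (var fs = File.OpenRead(path))
        {
            fs.Seek(Offset, SeekOrigin.Begin);
            var bytes = new byte[Length];
            int total = 0, n;
            while (total < Length && (n = fs.Read(bytes, total, Length - total)) > 0)
                total += n;
            return Encoding.UTF8.GetString(bytes, 0, total);
        }
    }
}

class ChunkDemo
{
    static void Main()
    {
        string path = Path.GetTempFileName(); // demo file; would be the 1 GB file
        File.WriteAllText(path, "abcdef");
        var node = new ChunkNode { Offset = 2, Length = 3 };
        Console.WriteLine(node.Load(path)); // "cde"
        File.Delete(path);
    }
}
```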

Oscar Mederos