I have code that reads text from 10 files, each approximately 80MB in size. However, it never completes successfully: depending on the approach I try, it fails on the 3rd to 7th iteration. The commented-out lines are the other ways I tried reading the files; each of them fails too.

var lines = new List<string>();
var text = string.Empty;
for (int i = 0; i < 10; i++)
{
    try
    {
        //lines.AddRange(File.ReadAllLines(dirPath + string.Format(@"commands{0}.txt", i)));
        //lines.Add(File.ReadAllText(dirPath + string.Format(@"commands{0}.txt", i)));
        //lines.Add(text);

        var bytes = File.ReadAllBytes(dirPath + string.Format(@"commands{0}.txt", i));
        text += Environment.NewLine + System.Text.Encoding.UTF8.GetString(bytes);
    }
    catch (Exception e)
    {
        //OutOfMemory exception
    }
}

What am I doing wrong? What exactly is being capped: the memory allowed for the application, the length of a string, the number of items in a list, or something else?

  • What are you doing with this `text` afterwards? It might be more efficient to store it as a byte array if you're just going to write it later, or to convert the byte array directly to a string all at once, rather than a file at a time. – Heretic Monkey Jan 20 '17 at 15:42
  • You are just plain out of memory. The question is too crude to provide an alternative, nobody can tell why you need to store so much text. But it is certainly a very old-fashioned problem. Project > Properties > Build tab and untick the "Prefer 32-bit" option. You don't prefer it. – Hans Passant Jan 20 '17 at 15:53
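
To see whether the "Prefer 32-bit" setting mentioned in the comment above applies to a given build, one option is to ask the runtime at startup. This is only a small sketch; the class and method names are made up for illustration, while Environment.Is64BitProcess is a standard property (.NET Framework 4 and later).

```csharp
using System;

public static class ProcessBitness
{
    // Reports whether the current process runs as 32-bit or 64-bit.
    // A 32-bit process has a much smaller address space, so large
    // allocations fail far earlier than the machine's RAM would suggest.
    public static string Describe()
    {
        return Environment.Is64BitProcess ? "64-bit process" : "32-bit process";
    }
}
```

Printing ProcessBitness.Describe() before the loop shows which mode the application actually runs in.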

2 Answers

`text` is a string object, and a string has a maximum length; see "What is the maximum possible length of a .NET string?"

You can use a StringBuilder instead, which grows its buffer as you append to it (up to its MaxCapacity): https://msdn.microsoft.com/en-us/library/system.text.stringbuilder.maxcapacity(v=vs.110).aspx

Add using System.Text first.

StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10; i++) // the question reads 10 files
{
    var bytes = File.ReadAllBytes(dirPath + string.Format(@"commands{0}.txt", i));
    // Append the pieces separately so no extra concatenated string is created
    sb.Append(Environment.NewLine);
    sb.Append(System.Text.Encoding.UTF8.GetString(bytes));
}
Souvik Ghosh
  • 10x80MB = 800 MB , still much less than 2GB – H H Jan 23 '17 at 13:56
  • The OP should use a StringBuilder but for different reasons. – H H Jan 23 '17 at 14:07
  • You are right. I was just wondering whether the combined content of the files might somehow exceed the limit. I was waiting to see whether the OP would report back on the posted solutions. – Souvik Ghosh Jan 23 '17 at 15:21

The problem is with the string `text`. A string is immutable, which means that any change to a string after it is created produces a new string object.

Doing this:

text += Environment.NewLine + System.Text.Encoding.UTF8.GetString(bytes);

you create a new object in every iteration (in fact more than one: Environment.NewLine + System.Text.Encoding.UTF8.GetString(bytes) creates one object, and then text += creates another).

Suppose you have read the first file and then append the text from the second: memory now holds both the old string, containing the text of the first file, and the new string, containing the text of both files. The old string is still stored but no longer needed.

A lot of memory is therefore occupied by strings that are no longer needed but have not yet been garbage collected (that is why you sometimes get the exception on the 3rd iteration and sometimes on the 7th: if a GC occurs in time, you get further).
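
To put rough numbers on the explanation above, here is an estimate (not a measurement) of how much memory is live while one iteration performs `text +=`. It assumes the files are mostly ASCII, so one byte on disk becomes one character, and it relies on .NET strings storing two bytes per character. The class and method names are made up for illustration, and the newline and any intermediate concatenation are ignored.

```csharp
using System;

public static class ConcatMemoryEstimate
{
    // Rough live memory in MB during iteration `iteration` (1-based)
    // of the loop in the question, for files of `fileMegabytes` MB each.
    public static long PeakMegabytes(int iteration, long fileMegabytes)
    {
        long bytesArray = fileMegabytes;            // result of File.ReadAllBytes
        long decoded = fileMegabytes * 2;           // UTF-16: 2 bytes per char (ASCII assumption)
        long oldText = (iteration - 1) * decoded;   // `text` before the append
        long newText = iteration * decoded;         // `text` after the append
        return bytesArray + decoded + oldText + newText;
    }
}
```

Under these assumptions, PeakMegabytes(3, 80) is 1040 and PeakMegabytes(7, 80) is 2320, which would be in line with a 32-bit process, typically limited to about 2GB of address space, failing somewhere in that range.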

To avoid this, consider using a byte array or a StringBuilder instead of a string.

Regarding List<string>:

Internally, a list holds an array, and when there is no contiguous region of memory large enough to allocate this array, you will get an OutOfMemoryException too.

You can try using LinkedList<string> instead.

Using StringBuilder:

StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10; i++)
{
    try
    {
        var bytes = File.ReadAllBytes(dirPath + string.Format(@"commands{0}.txt", i));

        sb.Append(Environment.NewLine);
        sb.Append(System.Text.Encoding.UTF8.GetString(bytes));    

        //avoid sb.Append(Environment.NewLine + System.Text.Encoding.UTF8.GetString(bytes)) 
        //because you still create unnecessary object doing concatenation (+)    
    }
    catch (Exception e)
    {
        //OutOfMemory exception
    }
}

//you can convert "sb" to a "string"
string res = sb.ToString();

But you should consider a different approach. Holding 800MB in memory is not the best solution.
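
One possible alternative, assuming the commands can be handled one line at a time (the question does not say), is to stream the files instead of loading them whole. The sketch below uses File.ReadLines, which enumerates lines lazily; the class and method names are hypothetical, and Path.Combine stands in for the dirPath + ... concatenation used above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CommandFiles
{
    // Lazily yields every line of commands0.txt .. commands{fileCount-1}.txt.
    // Only the line currently being read is held in memory by this method.
    public static IEnumerable<string> ReadAllCommandLines(string dirPath, int fileCount)
    {
        for (int i = 0; i < fileCount; i++)
        {
            string path = Path.Combine(dirPath, string.Format("commands{0}.txt", i));
            foreach (string line in File.ReadLines(path))
            {
                yield return line;
            }
        }
    }
}
```

A caller would then write foreach (var line in CommandFiles.ReadAllCommandLines(dirPath, 10)) { ... } and process each command as it arrives, rather than building one 800MB string first.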

Roman