2

In a Windows Forms C# app, I have a textbox where users paste log data, and the app sorts it. I need to check each line individually, so I split the input on newlines, but if there are a lot of lines (more than 100,000 or so), it throws an OutOfMemoryException.

My code looks like this:

StringSplitOptions splitOptions = new StringSplitOptions();
if(removeEmptyLines_CB.Checked)
    splitOptions = StringSplitOptions.RemoveEmptyEntries;
else
    splitOptions = StringSplitOptions.None;

List<string> outputLines = new List<string>();

foreach(string line in input_TB.Text.Split(new string[] { "\r\n", "\n" }, splitOptions))
{
    if(line.Contains(inputCompare_TB.Text))
        outputLines.Add(line);
}
output_TB.Text = string.Join(Environment.NewLine, outputLines);

The problem occurs when I split the textbox text by line, here: input_TB.Text.Split(new string[] { "\r\n", "\n" }, splitOptions)

Is there a better way to do this? I've thought about taking the first X amount of text, truncating at a new line and repeat until everything has been read, but this seems tedious. Or is there a way to allocate more memory for it?

Thanks, Garrett

Update

Thanks to Attila, I came up with this and it seems to work. Thanks

StringReader reader = new StringReader(input_TB.Text);
string line;
while((line = reader.ReadLine()) != null)
{
    if(line.Contains(inputCompare_TB.Text))
        outputLines.Add(line);
}
output_TB.Text = string.Join(Environment.NewLine, outputLines);
Garrett Fogerlie

5 Answers

3

Split has to duplicate the memory needed for the original text, plus the overhead of a string object for each line. If this causes memory issues, a reliable way of processing the input is to parse one line at a time.

Attila
  • Thanks, take a look at my update and let me know if that is what you meant. I will mark this as answered soon, I just want to see a couple other ideas. Thanks again! – Garrett Fogerlie Apr 30 '12 at 12:10
3

The better way to do this would be to extract and process one line at a time, and use a StringBuilder to create the result:

StringBuilder outputTxt = new StringBuilder();
string txt = input_TB.Text;
int txtIndex = 0;
while (txtIndex < txt.Length) {
  int startLineIndex = txtIndex;
GetMore:
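  // Advance to the next '\r' or '\n', or to the end of the text.
  while (txtIndex < txt.Length && txt[txtIndex] != '\r' && txt[txtIndex] != '\n') {
    txtIndex++;
  }
  // A '\r' that is not followed by '\n' is kept as part of the current line.
  if (txtIndex < txt.Length && txt[txtIndex] == '\r' && (txtIndex == txt.Length-1 || txt[txtIndex+1] != '\n')) {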
    txtIndex++;
    goto GetMore; 
  }
  string line = txt.Substring(startLineIndex, txtIndex-startLineIndex);
  if (line.Contains(inputCompare_TB.Text)) {
    if (outputTxt.Length > 0)
      outputTxt.Append(Environment.NewLine);
    outputTxt.Append(line); 
  }
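  // Skip the line terminator: both characters of "\r\n", or the single '\n'.
  if (txtIndex < txt.Length && txt[txtIndex] == '\r')
    txtIndex++;
  txtIndex++;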
} 
output_TB.Text = outputTxt.ToString(); 

Pre-emptive comment: someone will object to the goto, but it is what's needed here. The alternatives are either much more complex (a regular expression, for example) or fake the goto with another loop and continue or break.

Using a StringReader to split the lines is a much cleaner solution, but it does not handle both \r\n and \n as a new line:

StringReader reader = new StringReader(input_TB.Text); 
StringBuilder outputTxt = new StringBuilder();
string compareTxt = inputCompare_TB.Text;
string line; 
while((line = reader.ReadLine()) != null) { 
  if (line.Contains(compareTxt)) {
    if (outputTxt.Length > 0)
      outputTxt.Append(Environment.NewLine);
    outputTxt.Append(line); 
  }
} 
output_TB.Text = outputTxt.ToString(); 
MiMo
  • I didn't even know you could use a goto statement in c#, don't think I've used one since I was a kid playing around with pascal and basic, interesting. This seems overly complicated though, take a look at my update to my question. – Garrett Fogerlie Apr 30 '12 at 12:17
  • I added a note at the end of my answer - your update is cleaner but does not handle both `\r\n` and `\n` as line endings. If you can do without that, it is fine - I still suggest using a `StringBuilder` to avoid creating a (big?) intermediate list of strings. – MiMo Apr 30 '12 at 12:29
  • Yes, `goto` is possible in C#, and I use it - sparingly - as in this case. – MiMo Apr 30 '12 at 12:32
  • StringReader handles \r, \n or \r\n as a newline character. – Chris Dunaway Apr 30 '12 at 13:52
  • In my update code, does 'while((line = reader.ReadLine()) != null)' create a bunch of strings or does it just keep re-using the same one? – Garrett Fogerlie May 01 '12 at 11:56
  • It creates a new string each time, but the old ones won't be referenced anymore and so the garbage collector will re-use their memory – MiMo May 01 '12 at 13:18
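
The point about line terminators raised in the comments can be checked with a small sketch. The class and method names below (`LineTerminatorDemo`, `SplitWithStringReader`) are made up for illustration and appear nowhere in the thread. The helper runs a string through `StringReader.ReadLine`, which is documented to treat `\n`, `\r`, and `\r\n` each as a line break.

```csharp
using System.Collections.Generic;
using System.IO;

public static class LineTerminatorDemo
{
    // Hypothetical helper: collects every line that StringReader.ReadLine returns.
    public static List<string> SplitWithStringReader(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            // ReadLine returns null once the end of the string is reached.
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

Calling `LineTerminatorDemo.SplitWithStringReader("a\r\nb\nc\rd")` should give the four lines a, b, c and d, with no terminator characters left in any of them.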
0

I guess the only way to do this on large text files is to open the file manually and use a StreamReader. Here is an example of how to do this.
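
A minimal sketch of the StreamReader approach, assuming the log data is available as a file on disk rather than pasted into the textbox. The names `LogFilter`, `FilterFile`, `path` and `compareText` are hypothetical and not taken from the question.

```csharp
using System;
using System.IO;
using System.Text;

public static class LogFilter
{
    // Hypothetical helper: returns the lines of the file at 'path' that contain
    // 'compareText', joined with Environment.NewLine.
    public static string FilterFile(string path, string compareText)
    {
        var output = new StringBuilder();
        using (var reader = new StreamReader(path))
        {
            string line;
            // Only the current line and the matches collected so far are kept.
            while ((line = reader.ReadLine()) != null)
            {
                if (!line.Contains(compareText))
                    continue;
                if (output.Length > 0)
                    output.Append(Environment.NewLine);
                output.Append(line);
            }
        }
        return output.ToString();
    }
}
```

In the form, the result could then be assigned with something like `output_TB.Text = LogFilter.FilterFile(somePath, inputCompare_TB.Text);`, where `somePath` stands for wherever the file path comes from.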

Marek Dzikiewicz
0

You can avoid creating strings for all lines and the array by creating the string for each line one at a time:

var eol = new[] { '\r', '\n' };

var pos = 0;
while (pos < input.Length)
{
    var i = input.IndexOfAny(eol, pos);
    if (i < 0)
    {
        i = input.Length;
    }
    if (i != pos)
    {
        var line = input.Substring(pos, i - pos);

        // process line
    }
    pos = i + 1;
}
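
The loop above could be wrapped in an iterator so that it can be used directly in a foreach. This is only a sketch: the names `LineScanner` and `NonEmptyLines` are invented for illustration, and `input` is assumed to be the textbox text.

```csharp
using System.Collections.Generic;

public static class LineScanner
{
    private static readonly char[] Eol = { '\r', '\n' };

    // Hypothetical wrapper: lazily yields each non-empty line of 'input'.
    public static IEnumerable<string> NonEmptyLines(string input)
    {
        var pos = 0;
        while (pos < input.Length)
        {
            var i = input.IndexOfAny(Eol, pos);
            if (i < 0)
                i = input.Length;
            // i == pos means an empty segment (for example between '\r' and '\n'); skip it.
            if (i != pos)
                yield return input.Substring(pos, i - pos);
            pos = i + 1;
        }
    }
}
```

Because empty segments are always skipped, this behaves like StringSplitOptions.RemoveEmptyEntries; keeping empty lines would need separate handling of the `\r\n` pair.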
dtb
0

On the other hand, this article argues that the real problem is that the Split method is poorly implemented. Read it and draw your own conclusions.

Like Attila said, you have to parse line by line.

Taber