0

I have a bit of code that works great on smaller files, but when the files are bigger the program locks up - or is just so slow it appears to be - I can walk away for 10 minutes and it's still sitting there. How do I improve the efficiency of this code for larger files? Also, something minor - when it's at the last split, the next item has nothing to split and I end up with a duplicate replace. How do I fix this? The efficiency issue is obviously my main problem here.

for (int i = 0; i < divs.Count; i++)
{
    Regex regex = new Regex("</div>");
    string[] hands = regex.Split(divs[i].ToString());

    string output = string.Empty;
    foreach (var item in hands)
    {

        output += item + "</div>";
        string text = File.ReadAllText(strfilename);
        text = text.Replace("style = \"#\" >", textBox1.Text);
        ////style = "#" > 
        richTextBox1.Text = text;
    }


    //supposed to output the array to a message box
    MessageBox.Show(output);
}
Konrad Kokosa
  • 16,563
  • 2
  • 36
  • 58
CandyCane
  • 29
  • 5
  • For me, it is not clear what are you trying to do. How does `text` relate to `output` and `item`? – Konrad Kokosa Dec 14 '13 at 02:29
  • Mandatory link for "how to parse HTML with regex": http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags . Please use HtmlAgilityPack (or any other HTML parsing library) if indeed you are parsing HTML. – Alexei Levenkov Dec 14 '13 at 02:32

4 Answers4

1

It doesn't look like you need a regex, try String.Split

It also looks like you are parsing HTML with RegEx, consider using a HTML parser.

If the files are large avoid ReadAllText as this will load the entire file into memory consider StreamReader - but an HTML parser would be better.

And do you really need to update the richTextBox1.Text property each time around the loop?

You are reading the entire file each time around the loop? Why?

Move everything that doesn't absolutely have to happen inside the loops outside (before or after).

Dean Taylor
  • 40,514
  • 3
  • 31
  • 50
0

The only obvious improvement would be to use String.Split in favor of RegEx. It is sufficient here and performs much better. So the first change I would make would be to change;

  Regex regex = new Regex("</div>");
  string[] hands = regex.Split(divs[i].ToString());

to

  string[] hands = divs[i].Split(new string[] { "</div>" }, StringSplitOptions.None);

As pointed out in the other answer File.ReadAllText has some limitations that the StreamReader approach does not. However, you'll only run into them if your files are extremely large or the system the software is running on is lacking in RAM. In the main code base I currently work on File.ReadAllText and File.ReadAllLines are almost always the method used to read files.

evanmcdonnal
  • 46,131
  • 16
  • 104
  • 115
  • between what everyone said, my code is now a bit more optimized and works on larger files now. thank you. – CandyCane Dec 14 '13 at 02:45
0

See what effect each of these has on performance:

  1. Move the 'File.ReadAllText out of that loop. It gets the same text every time.

  2. Move the 'regex = new Regex' outside of the loop, and use the 'compiled' overload.

  3. use a stringbuilder instead of string concatenation.

  4. use the stopwatch classes to get timing for parts of the code to see where the time is spent.

  5. Watch out for Cthulu.

Leon Bambrick
  • 26,009
  • 9
  • 51
  • 75
  • 1
    thanks, i kept hearing strange noises outside and couldn't figure out what that was. as far as optimization, i will do what you say. – CandyCane Dec 14 '13 at 02:40
0

You're using a for loop in the other and do Read a file in the loop. It's not a good idea for your situation. You can use stack to recognize when is appear close tag "".

Ringo
  • 3,795
  • 3
  • 22
  • 37