I use a RichTextBox for testing regular expressions, with code like this:

// Reset any existing formatting
rtbMain.SelectAll();
rtbMain.SelectionColor = Color.Black;
rtbMain.SelectionBackColor = Color.White;
Regex regex = new Regex(txtRegexPattern.Text, regexOptions);
Match matches = regex.Match(txtTest.Text);
while (matches.Success)
{
    rtbMain.Select(matches.Index, matches.Length);
    rtbMain.SelectionColor = Color.Red;
    rtbMain.SelectionBackColor = Color.Black;
    matches = matches.NextMatch(); // advance to the next match, otherwise the loop never ends
}

But this method becomes too slow as soon as there are more than a thousand or so characters to highlight. I know I could delay processing, so that the user gets a chance to enter the whole regular expression first, but I still think RichTextBox highlighting is too slow.

I've searched Google for different approaches and for ways to speed up the current solution, but had no luck. I noticed that there are a few text editors that support "syntax highlighting" (like ScintillaNET, Avalon, ...), but they use XML as input, so I think using them for my project (generating XML on every KeyUp event) wouldn't be "best practice".

I have found and tested a "Fast Colored TextBox" here: https://github.com/PavelTorgashov/FastColoredTextBox ...but the problem with this one is that it alters pasted content, because it uses its own new-line and tab characters, so I can't use it in a regex tester.

Is there any faster way to highlight all matches, maybe using a different user control?

EDIT:

APPROACH 1: Would generating the underlying RTF document be faster? I tried, but had some problems with special characters, so I couldn't test highlighting of the whole document; it did seem to work quite fast with normal characters in a single line. I paused working on this since I read that constructing RTF can be quite hard, and I don't think I could use any of the existing RTF libraries.

APPROACH 2: I am able to get only the displayed portion of the RichTextBox, so I was thinking of highlighting only that part. I guess this would significantly reduce processing (depending on the RTB size), but I would need to trigger highlighting every time the user scrolls. I'm not sure this would work well and give a decent user experience, so I haven't tried it yet.
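A minimal sketch of Approach 2, assuming the match offsets are computed against the same string the RichTextBox displays. The names `VisibleRangeHighlighter`, `MatchesInRange` and `HighlightVisible` are hypothetical; `GetCharIndexFromPosition` and the `VScroll` event are standard RichTextBox members. The regex still scans the whole text (which is cheap compared to coloring), but only matches that overlap the visible character range get formatted.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Text.RegularExpressions;
using System.Windows.Forms;

public static class VisibleRangeHighlighter
{
    // Returns the non-empty matches that overlap the character range [first, last].
    public static List<Match> MatchesInRange(Regex regex, string text, int first, int last)
    {
        var result = new List<Match>();
        for (Match m = regex.Match(text); m.Success; m = m.NextMatch())
        {
            if (m.Index > last) break;                    // everything after this is below the view
            if (m.Length > 0 && m.Index + m.Length > first)
                result.Add(m);                            // overlaps the visible range
        }
        return result;
    }

    // Colors only the matches that are currently on screen.
    public static void HighlightVisible(RichTextBox rtb, Regex regex)
    {
        // Character indexes at the top-left and bottom-right corners of the client area.
        int first = rtb.GetCharIndexFromPosition(new Point(0, 0));
        int last = rtb.GetCharIndexFromPosition(
            new Point(rtb.ClientSize.Width - 1, rtb.ClientSize.Height - 1));

        // Remember the caret/selection so highlighting does not move it.
        int savedStart = rtb.SelectionStart;
        int savedLength = rtb.SelectionLength;

        foreach (Match m in MatchesInRange(regex, rtb.Text, first, last))
        {
            rtb.Select(m.Index, m.Length);
            rtb.SelectionColor = Color.Red;
            rtb.SelectionBackColor = Color.Black;
        }

        rtb.Select(savedStart, savedLength);
    }
}
```

This could be called from handlers for `rtbMain.VScroll`, `rtbMain.Resize` and the text-changed event. Resetting the previous formatting of the visible range and throttling the scroll events are left out of the sketch.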

Would anyone recommend any of the approaches above or maybe any others?

Tadej
  • Why a while loop? `matches` is not an array, so it should run only once. – NeverHopeless May 30 '13 at 05:19
  • I guess I forgot to mention that I only have a few months of experience in C#, so I'm not sure what the better approach would be. foreach? I've actually also added code which checks the match length, so I only highlight matches with len>0. – Tadej Jun 04 '13 at 00:36
  • Have a look at my updated answer. – NeverHopeless Jun 13 '13 at 17:58
  • Thanks for the update, I only noticed it today. I'm going to test it out right now. However, in the meantime I almost gave up, since I got the feeling that I would need another user control (one that supports faster highlighting and virtualization) instead of RichTextBox, but I couldn't find one that would do the job. – Tadej Jul 03 '13 at 15:20

3 Answers

First:

The RichTextBox has an inherent problem: it is very slow in .NET. I found a way to make it 120 times faster. Maybe you could try it out: C# RichEditBox has extremely slow performance (4 minutes loading) SOLVED

Second:

Building the RTF code from scratch is by far the fastest solution. Have a look at my article on CodeProject. It contains a reusable RTF builder class: http://www.codeproject.com/Articles/23513/SQL-Editor-for-Database-Developers
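As a rough illustration of the idea (this is not the RTF builder class from the linked article), the whole document can be generated as one RTF string and assigned once. The class name `RtfHighlighter`, the font and the color table below are assumptions made for this sketch. Special characters are handled by escaping `\`, `{` and `}` and by writing non-ASCII characters as `\uN?` escapes.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class RtfHighlighter
{
    // Color table: index 1 = black, 2 = red, 3 = white (index 0 is the "auto" color).
    private const string Header =
        @"{\rtf1\ansi\deff0{\fonttbl{\f0 Consolas;}}" +
        @"{\colortbl;\red0\green0\blue0;\red255\green0\blue0;\red255\green255\blue255;}" +
        @"\f0\cf1\highlight3 ";

    // Appends one character, escaped so that RTF treats it as literal text.
    private static void AppendEscaped(StringBuilder sb, char c)
    {
        switch (c)
        {
            case '\\': sb.Append(@"\\"); break;
            case '{': sb.Append(@"\{"); break;
            case '}': sb.Append(@"\}"); break;
            case '\n': sb.Append(@"\par "); break;
            case '\r': break; // "\r\n" is emitted as a single \par
            case '\t': sb.Append(@"\tab "); break;
            default:
                if (c < 0x80)
                    sb.Append(c);
                else // \uN takes a signed 16-bit value, followed by a fallback character
                    sb.Append(@"\u").Append((int)unchecked((short)c)).Append('?');
                break;
        }
    }

    // Builds an RTF document in which every non-empty match is red on black.
    public static string Build(string text, MatchCollection matches)
    {
        var sb = new StringBuilder(text.Length * 2);
        sb.Append(Header);
        int pos = 0;
        foreach (Match m in matches) // matches arrive in order and do not overlap
        {
            if (m.Length == 0) continue;
            for (; pos < m.Index; pos++) AppendEscaped(sb, text[pos]);
            sb.Append(@"\cf2\highlight1 ");   // red text on black background
            for (; pos < m.Index + m.Length; pos++) AppendEscaped(sb, text[pos]);
            sb.Append(@"\cf1\highlight3 ");   // back to black text on white
        }
        for (; pos < text.Length; pos++) AppendEscaped(sb, text[pos]);
        sb.Append('}');
        return sb.ToString();
    }
}
```

Usage would be a single assignment such as `rtbMain.Rtf = RtfHighlighter.Build(text, regex.Matches(text));`, so the control formats the text in one pass instead of once per match.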

Elmue

Please check Expresso at http://www.codeproject.com/Articles/3669/Expresso-A-Tool-for-Building-and-Testing-Regular-E

I have been using this program for editing and evaluating regex for years.

ZZZ
  • Thanks for the link; it could be one of the solutions, to highlight only 1 match at a time (similar to what they are doing), but still, I would prefer to highlight all if it would work faster. – Tadej May 30 '13 at 05:20
I suspect that your while loop is set up incorrectly.

Try something like this (untested, but it should give you an idea of how to troubleshoot the problem):

rtbMain.SelectAll();
rtbMain.SelectionColor = Color.Black;
rtbMain.SelectionBackColor = Color.White;
Regex regex = new Regex(txtRegexPattern.Text, regexOptions);
MatchCollection matches = regex.Matches(txtTest.Text);

if(matches.Count > 0)
{
   foreach(Match m in matches)
   {
      rtbMain.Select(m.Index, m.Length);
      rtbMain.SelectionColor = Color.Red;
      rtbMain.SelectionBackColor = Color.Black;
   }
}
else
{
   Debug.Print("No matches found"); // See "Output" Window
}

EDIT

I experimented with highlighting the RTF text, and the first thing I found is that most of the time is taken by these lines:

  rtbMain.SelectionColor = Color.Red;
  rtbMain.SelectionBackColor = Color.Black;

I tried selecting the text using the SelectionStart and SelectionLength properties instead of .Select(), but observed NO change.
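Since the formatting assignments dominate, one thing that may be worth measuring is whether part of that cost is repainting. The sketch below is untested against the sample above and is only an assumption about where the time goes: it suspends redrawing with the Win32 `WM_SETREDRAW` message while the selections are colored. `BatchedHighlighter` and `HighlightAll` are hypothetical names.

```csharp
using System;
using System.Drawing;
using System.Runtime.InteropServices;
using System.Text.RegularExpressions;
using System.Windows.Forms;

public static class BatchedHighlighter
{
    public const int WM_SETREDRAW = 0x000B; // Win32 message that turns window redrawing on or off

    [DllImport("user32.dll")]
    private static extern IntPtr SendMessage(IntPtr hWnd, int msg, IntPtr wParam, IntPtr lParam);

    // Colors every non-empty match while painting of the control is suspended.
    public static void HighlightAll(RichTextBox rtb, MatchCollection matches)
    {
        int savedStart = rtb.SelectionStart;
        int savedLength = rtb.SelectionLength;

        SendMessage(rtb.Handle, WM_SETREDRAW, IntPtr.Zero, IntPtr.Zero); // stop painting
        try
        {
            foreach (Match m in matches)
            {
                if (m.Length == 0) continue;
                rtb.Select(m.Index, m.Length);
                rtb.SelectionColor = Color.Red;
                rtb.SelectionBackColor = Color.Black;
            }
            rtb.Select(savedStart, savedLength); // restore the caret/selection
        }
        finally
        {
            SendMessage(rtb.Handle, WM_SETREDRAW, new IntPtr(1), IntPtr.Zero); // resume painting
            rtb.Invalidate(); // repaint once with the final formatting
        }
    }
}
```

It would be called as `BatchedHighlighter.HighlightAll(rtbMain, regex.Matches(rtbMain.Text));` on the UI thread. If the per-selection cost is inside the control itself rather than in painting, this will help little.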

Regarding your first point, constructing the equivalent RTF: I tried that too, but it is difficult because there is a lot that needs to be handled. If it can be done, the processing time should be under about 1.5 seconds for more than 31k matches (the result of a basic test on one specific sample).

So I would suggest doing it with threading, splitting the task across two threads.

Here is example source code. (In the worst case I found around 31341 matches, and the process took 4 seconds to highlight them.)

    // needs: using System.Diagnostics; using System.Threading; using System.Text.RegularExpressions;
    // declare these variables as fields of the form
    MatchCollection mcoll;
    Stopwatch s;
    int callbackCount = 0;
    List<Match> m1 = null;
    List<Match> m2 = null;

    private void btnHighlight_Click(object sender, EventArgs e)
    {
        // reset any existing formatting
        rtbMain.SelectAll();
        rtbMain.SelectionBackColor = Color.White;
        rtbMain.SelectionColor = Color.Black;
        rtbMain.DeselectAll();

        s = new Stopwatch();
        s.Start();

        Regex re = new Regex(@"(.)", RegexOptions.Compiled); // Notice COMPILED option
        mcoll = re.Matches(rtbMain.Text);

        // Split the MatchCollection into two List<Match> objects, each about half the size
        m1 = new List<Match>(mcoll.Count / 2);
        m2 = new List<Match>(mcoll.Count / 2);

        for (int k = 0; k < mcoll.Count; k++)
        {
            if (k < mcoll.Count / 2)
                m1.Add(mcoll[k]);
            else
                m2.Add(mcoll[k]);
        }

        Thread backgroundThread1 = new Thread(new ThreadStart(() => {
            match1(null, null);
        }));
        backgroundThread1.Start();

        Thread backgroundThread2 = new Thread(new ThreadStart(() =>
        {
            match2(null, null);
        }));
        backgroundThread2.Start();
    }

    public void match1(object obj, EventArgs e)
    {
        // Marshal to the UI thread once, before the loop; invoking per match
        // would re-run the whole method for every match.
        if (rtbMain.InvokeRequired)
        {
            rtbMain.Invoke(new EventHandler(match1));
            return;
        }

        for (int i = 0; i < m1.Count; i += 1)
        {
            rtbMain.Select(m1[i].Index, m1[i].Length);
            rtbMain.SelectionBackColor = Color.Black;
            rtbMain.SelectionColor = Color.Red;
        }
        stopTimer();
    }

    public void match2(object obj, EventArgs e)
    {
        // Same pattern as match1, for the second half of the matches.
        if (rtbMain.InvokeRequired)
        {
            rtbMain.Invoke(new EventHandler(match2));
            return;
        }

        for (int j = 0; j < m2.Count; j += 1)
        {
            rtbMain.Select(m2[j].Index, m2[j].Length);
            rtbMain.SelectionBackColor = Color.Black;
            rtbMain.SelectionColor = Color.Red;
        }
        stopTimer();
    }

    void stopTimer()
    {
        callbackCount++;

        if (callbackCount == 2) // 2 because I am using two threads.
        {
            s.Stop();
            // Check Output Window
            // TotalSeconds is the full elapsed time; Seconds is only the 0-59 component
            Debug.Print("Evaluated in : " + s.Elapsed.TotalSeconds.ToString());
        }
    }

Since, as you posted, it currently takes around 30 seconds, I hope 4 seconds is bearable. The user can be kept engaged with a loading screen, as other online testers such as Rubular and Derek Slager's .NET regex tester do.

Don't forget to have a look at Why Regex.Compiled preferred.

NeverHopeless
  • This part won't get accepted by VS: "if(matches.Count > 0)". With the FOREACH loop I get an error that it can't operate on variables of type "...RegularExpression.Match". – Tadej Jun 06 '13 at 13:58
  • @Tadej, So do you think that if there is no match, RTB should highlight something ? – NeverHopeless Jun 06 '13 at 14:01
  • No, I don't think so, and I guess you are trying to say that we would go into "while (matches.Success)" even if no matches? I first tried to use "foreach" statement, and since it doesn't work I found a solution with "while" statement. Also, I've added code where I don't do highlighting if text length is < 1. – Tadej Jun 06 '13 at 14:05
  • I am scared...The same code is working for me... On which line this error happens ? Have you tried `Debug.Print(matches.Count.ToString());` just before if condition ? – NeverHopeless Jun 06 '13 at 14:16
  • Hmm.. I got it.. you should carefully look at this line `regex.MATCHES`: **MatchCollection** matches = regex.Matches(txtTest.Text); – NeverHopeless Jun 06 '13 at 15:10
  • variable `matches` is of type `MatchCollection` but i believe you have left it of type `Match`. Also use method `regex.Matches` instead of `regex.Match` – NeverHopeless Jun 06 '13 at 15:13
  • Sorry, I didn't notice that, but you were right... it works now. However, it doesn't seem to be any faster. For example, it takes 30 seconds to highlight each character of the HTML source code of this page. Because of that I think highlighting with .Select is the problem and is causing the long processing times. – Tadej Jun 06 '13 at 15:21
  • RE: EDIT: I also gave up on RTF document generation. I had too many problems, although I think I was headed the right way (I always used the old table header to maintain styling and foreign characters), but I still had some problems with special characters (they were not rendered). I've tried the code that you provided, and I noticed you forgot to define the integers i and j (I guess they should be 0), but even after adding that, I see one problem: highlighting doesn't work on the whole text (see old vs new method here http://screencast.com/t/bFRYf4yHjpb ). Btw, I also use the compiled option for REGEX. – Tadej Jul 03 '13 at 15:24
  • Good to see you again, i have adjusted the conditions in the for loop, check it now. – NeverHopeless Jul 03 '13 at 17:24
  • Thanks for the update, it now works perfectly on short strings. However, I tried to test the REGEX "\w" with the source HTML of this page, and the text never gets highlighted and it looks like the app is frozen. – Tadej Jul 04 '13 at 15:29