1

I've got an app with a textbox in it. The user enters text into this box.

I have the following function triggered in an OnKeyUp event in that textbox

private void bxItemText_KeyUp(object sender, System.Windows.Input.KeyEventArgs e)
{               
    // rules is an array of regex strings
    foreach (string rule in rules)
    {
        Regex regex = new Regex(rule);
        if (regex.Match(text.Trim().ToLower()))
        {
            // matched rule is a variable
            matchedRule = rule;
            break;
        }
    }
}

I've got about 12 strings in rules, although this can vary.

Once the length of the text in my textbox gets bigger than 80 characters performance starts to degrade. typing a letter after 100 characters takes a second to show up.

How do I optimise this? Should I match on every 3rd KeyUp ? Should I abandon KeyUp altogether and just auto match every couple of seconds?

roryok
  • 9,325
  • 17
  • 71
  • 138
  • When do the `rules` change? Can they be merged into a single Regex? If not you can just start a Timer on KeyUp instead, wait 300ms and *then* do your Regex test. That'll prevent the user from doing it too quickly. What happens when a rule is matched? Can this event be deferred? – CodingIntrigue Oct 15 '13 at 11:23
  • 1
    In addition to compiling the expressions (rules) only once it would help if you used a single expression instead of ~12. If you aren't replacing then it's trivial to combine them. Also you can try to eliminate backtracking where possible by using `(?>...)`. – Qtax Oct 15 '13 at 11:26
  • regular expression works on backtracking so bigger your input bigger is latency, you should avoid checking such large strings otherwise it will effect your code to stop responding. – A.T. Oct 15 '13 at 11:27
  • Do you have to use regex? Depending on what sort of match you're looking at, your own simple scanner might be quicker – Rob Oct 15 '13 at 11:34
  • Rob: can you elaborate on that? I'm actually only matching large arrays of keywords at the moment, I just assumed regex was the fastest way to do this. – roryok Oct 15 '13 at 11:37
  • roryok: Each regex is something like `key1|mykey2|word3`? – Rob Oct 15 '13 at 11:38
  • rob: pretty much. Some expressions stipulate that this word must be at the beginning or the end of the line, or must be preceded by a space, but nothing more complex than that – roryok Oct 15 '13 at 11:40
  • RGraham: The rules can't be combined but a timer might not be a bad idea. I'm going to try a combined approach. Compiled regex, removing the need for `Trim().ToLower()` and using a timer – roryok Oct 15 '13 at 11:44

7 Answers7

4

How do I optimise this? Should I match on every 3rd KeyUp ? Should I abandon KeyUp altogether and just auto match every couple of seconds?

I would go with the second option, that is abandon KeyUp and trigger the validation every couple of seconds, or better yet trigger the validation when the TextBox loses focus.

On the other hand, I should suggest to cache the regular expressions beforehand and compile them because it seems like you are using them over and over again, in other words instead of storing the rules as strings in that array, you should store them as compiled regular expression objects when they are added or loaded.

Ibrahim Najjar
  • 19,178
  • 4
  • 69
  • 95
1

Use static method calls instead of create a new object each time, static calls use a caching feature : Optimizing Regular Expression Performance, Part I: Working with the Regex Class and Regex Objects.

That will be a major improvement in performance, then you can provide your regexes (rules) to see if some optimization can be done in the regexes.

Other resources :

polkduran
  • 2,533
  • 24
  • 34
  • At the bottom of the page you link to, it states that best practice is to _Use compiled regular expressions to optimize performance for regular expression patterns that are known in advance and when the frequency of calls to the regular expression’s pattern-matching methods can vary extensively_. Does this not match OP's situation? – Rob Oct 15 '13 at 12:58
  • Are you talking about compiled regexes using the `RegexOptions.Compiled` flag or compiling regexes to another assembly using `Regex.CompileToAssembly` method? The first one can be a better option than the second one, in general, compile regexes to a separated assembly is more suitable for libraries because of the work to maintain them. – polkduran Oct 15 '13 at 13:04
  • I meant use `RegexOptions.Compiled`, both in my comment here, and in my answer above. – Rob Oct 15 '13 at 13:11
  • Ok, I understand, we were not talking about the same thing. Indeed, if he uses the same regex very frequently, it would be ok to use the `RegexOptions.Compiled` option and keep a reference of an instance. The gain between keeping a reference of an instance and a static call is the cache lookup performed by the static call. – polkduran Oct 15 '13 at 13:28
0

Combining strings to one on Regex level will work faster than foreach in code. Combining two Regex to one

Community
  • 1
  • 1
Nikita
  • 422
  • 5
  • 14
  • They can't be combined. I need to know which rule matched. If they're all the same rule I can't tell them apart – roryok Oct 15 '13 at 11:43
0

If you need pattern determination for Each new symbol, and you care about performance, than Final State Machine seems to be the best option... That is much harder way. You should specify for each symbol list of next symbols, that are allowed. And OnKeyUp you just walk on next state, if possible. And you will have the amount of patterns, that input text currently matches. Some useful references, that I could found:

FSM example

Guy explaining how to convert Regex to FSM

Regex - FSM converting discussion

Community
  • 1
  • 1
Nikita
  • 422
  • 5
  • 14
0

You don't need to create a new regex object each time. Also using static call will cache the pattern if used before (since .Net 2). Here is how I would rewrite it

matchedRule = Rules.FirstOrDefault( rule => Regex.IsMatch(text.Trim().ToLower(), rule));

ΩmegaMan
  • 29,542
  • 12
  • 100
  • 122
0

Given that you seem to be matching keywords, can you just perform the match on the current portion of text that's been edited (i.e. in the vicinity of the cursor)? Might be tricky to engineer, especially for operations like paste or undo, but scope for a big performance gain.

Rob
  • 4,327
  • 6
  • 29
  • 55
-1

Pre-compile your regexes (using RegexOptions.Compiled). Also, can you make the Trim and ToLower redundant by expanding your regex? You're running Trim and ToLower once for each rule, which is inefficient even if you can't eliminate them altogether

You can try and make your rules mutually exclusive - this should speed things up. I did a short test: matching against the following

"cat|car|cab|box|balloon|button"

can be sped up by writing it like this

"ca(t|r|b)|b(ox|alloon|utton)"

Rob
  • 4,327
  • 6
  • 29
  • 55
  • In this case precompilation will not be very useful, it can even make it worse because he is using instance methods and creating a new object each time. If you are talking about compiling in design time each rule, it will help but extra work will be needed to choose the right class for each rule at running time. Static calls will be a better option in order to use `caching` : http://blogs.msdn.com/b/bclteam/archive/2010/06/25/optimizing-regular-expression-performance-part-i-working-with-the-regex-class-and-regex-objects.aspx – polkduran Oct 15 '13 at 12:41
  • @polkduran: Huh? OP has a collection of rules somewhere, which are currently represented as strings. If he replaces this with a collection of precompiled regular expressions, he will get a significant performance gain (50% in my admittedly trivial test). What am I missing? – Rob Oct 15 '13 at 12:46
  • I think the extra work of compiling a list of regexes to another assembly at design time, adding the assembly to the application, mapping rules to instance classes at run time and maintaining all this each time a new rule needs to be added will be a lot of work for the gain in performance. Static calls keep a cache of compiled regexes (15 by default, can be modified changing the value of `Regex.CacheSize`), it is true that if the regex is not in the cache it will be compiled at run time, but after the fist call the NFA is already built. – polkduran Oct 15 '13 at 12:56