
I have an app running that looks at items in a queue, applies a category based on certain keywords, and then inserts each item into a database.

I'm using IndexOf to determine if a certain keyword is present.

Is this the ideal approach, or would a regex be faster?

About 10 items per second are being processed.
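The question does not show its code. Below is a minimal sketch of the flow it describes, written in Java as a stand-in for .NET (Java's `String.indexOf` compares characters exactly, like an ordinal `String.IndexOf`). The `KeywordCategorizer` class, the keyword table and the category names are hypothetical, and the queue and the database insert are omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical categorizer: the keyword table and category names are made up for illustration.
final class KeywordCategorizer {
    // LinkedHashMap keeps insertion order, so earlier keywords take priority.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("refund", "Billing");
        RULES.put("invoice", "Billing");
        RULES.put("crash", "Bug");
        RULES.put("password", "Account");
    }

    private KeywordCategorizer() {}

    /** Returns the category of the first keyword found in the text, or "Uncategorized". */
    static String categorize(String text) {
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            // indexOf returns -1 when the keyword does not occur anywhere in the text.
            if (text.indexOf(rule.getKey()) >= 0) {
                return rule.getValue();
            }
        }
        return "Uncategorized";
    }
}
```

For example, `KeywordCategorizer.categorize("I forgot my password and want a refund")` returns `"Billing"`, because `refund` is checked before `password`. Matching is case-sensitive as written.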

Jack Marchetti
  • You should try both approaches and measure which is faster. Also, 10 times per second is nothing; you shouldn't worry about performance here. – ken2k Feb 21 '12 at 15:22
  • Also, we'd need to know more about the relative complexity of the parsing. If you need to call String.IndexOf 10 times to achieve the same effect as the RegEx, the performance ratio will be different than if it is 1 for 1. – Chris Shain Feb 21 '12 at 15:24
  • 10 items per second is nothing? When would you actually start to care about performance then? – Jack Marchetti Feb 21 '12 at 15:32
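To illustrate the comment about N `IndexOf` calls versus one regex, here is a Java sketch (standing in for .NET) of two equivalent "does any keyword occur?" checks. One is a loop with one `indexOf` per keyword. The other is a single precompiled alternation pattern built with `Pattern.quote`. `AnyKeyword` and its method names are hypothetical, and no claim is made about which is faster; that still has to be measured for the real keyword list and text lengths.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Two equivalent "does any keyword occur?" checks; the keyword list is supplied by the caller.
final class AnyKeyword {
    private AnyKeyword() {}

    /** One indexOf call per keyword: up to N scans of the text. */
    static boolean anyByIndexOf(String text, List<String> keywords) {
        for (String k : keywords) {
            if (text.indexOf(k) >= 0) {
                return true;
            }
        }
        return false;
    }

    /** Builds one alternation pattern from a non-empty keyword list, compiled once. */
    static Pattern alternation(List<String> keywords) {
        String body = keywords.stream()
                .map(Pattern::quote) // escape regex metacharacters such as '+' in each keyword
                .collect(Collectors.joining("|"));
        return Pattern.compile(body);
    }

    /** A single regex search that tries every alternative at each position of the text. */
    static boolean anyByRegex(String text, Pattern alternation) {
        return alternation.matcher(text).find();
    }
}
```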

9 Answers


For just finding a keyword, the IndexOf method is faster than using a regular expression. Regular expressions are powerful, but their power lies in flexibility, not raw speed. They don't beat string methods at simple string operations.

Anyway, if the strings are not huge, it shouldn't really matter, since you are not doing it very often.
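A Java sketch (standing in for the .NET calls) of what that flexibility can buy for keyword matching: `indexOf` finds any substring, while a regex can add word boundaries (`\b`) so that a keyword such as `cat` does not fire inside `category`. The keyword and the `SingleKeyword` class are arbitrary examples.

```java
import java.util.regex.Pattern;

// Single keyword: plain substring search vs. a regex that adds word boundaries.
final class SingleKeyword {
    // Compiled once and reused; \b marks a word boundary, which indexOf has no notion of.
    private static final Pattern WHOLE_WORD_CAT = Pattern.compile("\\bcat\\b");

    private SingleKeyword() {}

    /** Substring test: also true for "category" or "concatenate". */
    static boolean containsCat(String text) {
        return text.indexOf("cat") >= 0;
    }

    /** Whole-word test: true only when "cat" stands on its own. */
    static boolean containsWordCat(String text) {
        return WHOLE_WORD_CAT.matcher(text).find();
    }
}
```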

Guffa

http://ayende.com/blog/2930/regex-vs-string-indexof

It seems that, in terms of efficiency, it may depend on the length of the string.

David Welker

The only way to know for sure is to test it. As an educated guess, though, it depends on the number of keywords you are testing, the length of the text, etc., and IndexOf would probably win.

The only way to know for sure is to write a test for your specific scenario.

Peter

I doubt it: IndexOf is a very simple algorithm that just scans through your string and returns the first occurrence it finds.

Regex is a far more complex mechanism that needs to be parsed and checked against the whole string. If your string is very large, you are better off with indexOf.

F.P

It seems to be true that regex is faster on longer strings. My example: the content of a 364 kB file is searched for the string "<product ". After each search, the starting point is moved forward to find the next occurrence, and so on. However, the search string does not occur anywhere in the value.

I used three test commands:

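         ' 1: IndexOf with the default (culture-sensitive) string comparison
         i = value.IndexOf("<" & tag & " ", xstart)

         ' 2: IndexOf with an ordinal (character-by-character) comparison
         i = value.IndexOf("<" & tag & " ", xstart, StringComparison.Ordinal)

         ' 3: Regex.IsMatch on a copy of the remaining text; returns a Boolean, not a position
         i = Regex.IsMatch(value.Substring(xstart), "<" & tag & " ", RegexOptions.Singleline)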

Command one (IndexOf, default comparison) needs about 7500 ms to search the string. Command two (IndexOf with Ordinal) needs about 300 ms! Command three (regex) needs about 650 ms (about 1000 ms with the IgnoreCase option).
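For comparison, here is a Java analogue of that kind of scan (not Herbert's code). Both versions move the start position forward after each match without copying the rest of the string, whereas command three above creates a copy with `Substring(xstart)` on each call. Java's `String.indexOf` always compares characters exactly, so it corresponds to the `StringComparison.Ordinal` variant; there is no culture-sensitive counterpart here. `TagScan` is a hypothetical name, and no timings are claimed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Counts occurrences of a literal such as "<product " by moving the start position forward.
final class TagScan {
    private TagScan() {}

    /** indexOf(str, fromIndex) searches from an offset without copying the string. */
    static int countWithIndexOf(String value, String needle) {
        if (needle.isEmpty()) {
            throw new IllegalArgumentException("needle must not be empty");
        }
        int count = 0;
        int start = 0;
        int i;
        while ((i = value.indexOf(needle, start)) >= 0) {
            count++;
            start = i + needle.length(); // continue after this match
        }
        return count;
    }

    /** Matcher.find(int) resets the matcher and searches from the given index, again without a copy. */
    static int countWithRegex(String value, String needle) {
        if (needle.isEmpty()) {
            throw new IllegalArgumentException("needle must not be empty");
        }
        Matcher m = Pattern.compile(Pattern.quote(needle)).matcher(value);
        int count = 0;
        int start = 0;
        while (m.find(start)) {
            count++;
            start = m.end(); // a non-empty match always moves start forward
        }
        return count;
    }
}
```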

Herbert

First of all, with 10 items per second you probably don't even need to think about performance.

IndexOf is probably faster than regex in most cases, especially if you don't use a precompiled regex.

Its performance might also depend on the chosen string comparison/culture. I expect StringComparison.Ordinal to be fastest.
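The precompiled-regex point can be sketched in Java as an analogue of .NET (the same idea applies to constructing one `Regex` object and reusing it). `Pattern.compile` parses the pattern, so calling it for every item repeats that work, while a `static final Pattern` is parsed once. The `refund|invoice` pattern and the `PrecompiledDemo` class are hypothetical. Java's `String.indexOf` has no culture-aware mode, so the `indexOf` variant corresponds to an ordinal comparison.

```java
import java.util.regex.Pattern;

// The same regex run two ways: recompiled on every call vs. compiled once and reused.
final class PrecompiledDemo {
    // Compiled once when the class is initialized; Pattern instances are safe to share across threads.
    private static final Pattern KEYWORD = Pattern.compile("refund|invoice");

    private PrecompiledDemo() {}

    /** Pattern.compile runs on every call, so the pattern is parsed again each time. */
    static boolean matchesRecompiling(String text) {
        return Pattern.compile("refund|invoice").matcher(text).find();
    }

    /** Reuses the compiled Pattern; only a new Matcher is created per call. */
    static boolean matchesPrecompiled(String text) {
        return KEYWORD.matcher(text).find();
    }

    /** Plain substring checks for comparison: Java's indexOf compares chars exactly. */
    static boolean matchesIndexOf(String text) {
        return text.indexOf("refund") >= 0 || text.indexOf("invoice") >= 0;
    }
}
```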

CodesInChaos

Why not experiment and measure the time elapsed using the System.Diagnostics.Stopwatch class? http://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.aspx

Set up a Stopwatch object before your indexOf operation and then measure elapsed time after it. Then, swap out the indexOf for a regular expression. Finally, report back with your findings so that we can see them too!
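Java has no `Stopwatch` class, but `System.nanoTime` serves the same purpose. Below is a rough harness along the lines suggested above, written in Java as a stand-in for the .NET code. `RoughTimer`, its methods and the warm-up pass are illustrative assumptions, and JIT effects make hand-rolled timings like these noisy. For serious measurements a dedicated tool such as JMH (Java) or BenchmarkDotNet (.NET) is more reliable.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Rough timing harness in the spirit of the Stopwatch suggestion; not a rigorous benchmark.
final class RoughTimer {
    // Publishing the hit count here makes it less likely the JIT treats the loop as dead code.
    private static volatile int sink;

    private RoughTimer() {}

    /** Runs check over every input `rounds` times and returns the elapsed nanoseconds. */
    static long time(Predicate<String> check, String[] inputs, int rounds) {
        int hits = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String s : inputs) {
                if (check.test(s)) {
                    hits++;
                }
            }
        }
        long elapsed = System.nanoTime() - start;
        sink = hits;
        return elapsed;
    }

    /** Times indexOf and a precompiled regex on the same inputs; returns {indexOfNanos, regexNanos}. */
    static long[] compare(String keyword, String[] inputs, int rounds) {
        Pattern p = Pattern.compile(Pattern.quote(keyword));
        Predicate<String> byIndexOf = s -> s.indexOf(keyword) >= 0;
        Predicate<String> byRegex = s -> p.matcher(s).find();
        // Warm-up pass: gives the JIT a chance to compile both paths before measuring.
        time(byIndexOf, inputs, rounds);
        time(byRegex, inputs, rounds);
        return new long[] { time(byIndexOf, inputs, rounds), time(byRegex, inputs, rounds) };
    }
}
```

Calling `RoughTimer.compare("refund", inputs, 100_000)` with a representative sample of real queue items returns the two elapsed times in nanoseconds. The numbers only say something about that particular data.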

Adil B

At least this programmer finds it faster to understand the code that uses IndexOf!

Does saving a little CPU time justify the time it takes the next person to understand the code?

Ian Ringrose
  • A regex that would find the first occurrence of a string to emulate `indexOf` wouldn't put any programmer into serious trouble if he wanted to understand it. – F.P Feb 21 '12 at 15:29
  • @FlorianPeschka, agreed the cost is low, but there is still a cost of looking at the RegEx. – Ian Ringrose Feb 21 '12 at 15:32
  • RegEx.Match is hard to understand? – Jack Marchetti Feb 21 '12 at 15:38
  • If RegEx is hard to understand, well, then developers need to learn a little. It's like a mechanic saying that hex keys are hard to use, so they use something else instead. **Learn the tools of your profession.** There's no excuse for that. – Robert Koritnik Jul 23 '15 at 08:07
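To make the comment about emulating `indexOf` concrete, here is a Java sketch (standing in for `Regex.Match` in .NET) of a regex-based `indexOf`. It quotes the literal, finds the first match, and returns its start position or -1. `RegexIndexOf` is a hypothetical name; how it reads next to a one-line `indexOf` call is left to the reader.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Emulates indexOf with a regex: returns the first match position, or -1 when there is none.
final class RegexIndexOf {
    private RegexIndexOf() {}

    static int indexOf(String text, String literal) {
        // Pattern.quote makes the literal match verbatim, even if it contains '.', '+', '(' etc.
        Matcher m = Pattern.compile(Pattern.quote(literal)).matcher(text);
        return m.find() ? m.start() : -1;
    }
}
```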

You can find information about this very question at this link: http://ayende.com/blog/2930/regex-vs-string-indexof

In summary, it seems to indicate that the larger the search pattern, the better RegEx performs comparatively.

Erik Nordenhök