Performance of regex within jQuery data selector: dependance on certain string length

Question

The setup: I have a div with a bunch of radio buttons, each of which has been associated with a custom attribute and value using $(element).data(attr_name,attr_value);. When an underlying data structure is changed, I iterate over the fields and set the appropriate buttons to checked:true by using the ':data' selector found here: https://stackoverflow.com/a/2895933/1214731

$($('#style-options').find(':radio').filter(':data('+key+'=='+value+')'))
                                    .prop('checked',true).button('refresh');

This works great: it finds the appropriate elements, even with floating-point values.

Performance depends on value:
I noticed that when I clicked on certain buttons, the page took fractionally longer to respond (for most buttons there was no noticeable delay). Digging a little deeper, this seems to be occurring when certain floating point values are being searched for.

Using chrome dev tools, I logged the following:

> key='fill-opacity';  
"fill-opacity"
> value=.2*2;
0.4  
> console.time('find data'); for(var i=0;i<100;++i){$('#style-options').find(':radio').filter(':data('+key+'=='+value+')')} console.timeEnd('find data');
find data: 43.352ms undefined

> value=.2*3;
0.6000000000000001
> console.time('find data'); for(var i=0;i<100;++i){$('#style-options').find(':radio').filter(':data('+key+'=='+value+')')} console.timeEnd('find data');
find data: 10322.866ms undefined

The difference in speed is a factor of >200!

Next, I tried typing the number in manually (e.g. decimal place, six, 14x zeros, one) - same speed. All numbers with the same number of digits were the same speed. I then reduced the number of digits progressively:

# of digits    time (ms)
         16    10300 
         15    5185
         14    2665
         13    1314
         12    673
         11    359
         10    202
          9    116
          8    77
          7    60
          6    50
          5    41
          4    39

I quickly ruled out the equality check between numeric and string - no dependence on string length there.

The regex execution is strongly dependent on string length
In the linked answer above, the regex that parses the data string is this:

var matcher = /\s*(?:((?:(?:\\\.|[^.,])+\.?)+)\s*([!~><=]=|[><])\s*("|')?((?:\\\3|.)*?)\3|(.+?))\s*(?:,|$)/g;

The string passed in is of the form [name operator value]. The length of name doesn't seem to make much difference; the length of value has a big impact on speed however.

Specific questions:
1) Why does the length of name have minimal effect on performance, while the length of value has a large effect?
2) Doubling the execution time with each additional character in name seems excessive - is this just a characteristic of the particular regex the linked solution uses, or is it a more general feature?
3) How can I improve performance without sacrificing a lot of flexibility? I'd like to still be able to pass arguments as a single string to a jQuery selector so type checking up front seems difficult, though I'm open to suggestions.

Basic test code for regex matching speeds:

matcher = /\s*(?:((?:(?:\\\.|[^.,])+\.?)+)\s*([!~><=]=|[><])\s*("|')?((?:\\\3|.)*?)\3|(.+?))\s*(?:,|$)/g;

console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('x=='+.1111111111111)}; console.timeEnd('regex')
regex: 538.018ms

//add an extra digit - doubles duration of test
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('x=='+.11111111111111)}; console.timeEnd('regex')
regex: 1078.742ms

//add a bunch to the length of 'name' - minimal effect
console.time('regex'); for(var i=0;i<1000;++i){matcher.lastIndex=0; matcher.exec('xxxxxxxxxxxxxxxxxxxx=='+.11111111111111)}; console.timeEnd('regex')
regex: 1084.367ms

You have brought a lot of information and yet, i don't know precisly what you are asking for. Do you want an explaination why it is slow? Maybe you want a faster code. Anyway, here's a good lecture once you have a minute : http://swtch.com/~rsc/regexp/regexp1.html. — Karl-André Gagnon, Feb 05 '14 at 00:48
Yes, I agree the regex is slow. What is an example text of what your regex is trying to match. With an example it will be much easier to refine your regex. — Bryan Elliott, Feb 05 '14 at 00:56
Seems like quite a complicated regex if you're just trying to match something like, `xx==.1111111`. What about simply `(\w+)==([\d.]+)`? — Bryan Elliott, Feb 05 '14 at 01:12
@Karl-AndréGagnon Updated with specific questions. I'm taking a look at your link, thanks... — tmpearce, Feb 05 '14 at 01:22
@MElliott I'd like to maintain flexibility, as not all operators will be `==` and not all values will be `numeric`. I'm open to suggestions about more efficient implementations though, and if it is possible to somehow route calls to more specific implementations that'd be cool. — tmpearce, Feb 05 '14 at 01:25
given the RegExp mess needed to support jq's magic, i would bet that using $(document.querySelectorAll('#style-options input[type="radio"][data-fill-opacity="0.6000000000000001"]')) would perform a lot faster than using .find() and several non-standard selectors. — dandavis, Feb 05 '14 at 01:41
@dandavis I'm not having luck with your suggestion, it keeps returning `[]` (even using simple names and values rather than dashes/floats etc.). Do you happen to have a working example? — tmpearce, Feb 05 '14 at 01:50
i was hoping you had a working example. if you make one, i'll mangle together something that works but it's hard for me to debug against html i can't see. — dandavis, Feb 05 '14 at 01:51
@dandavis Here's an example of the selector I'm using now working, and the one you've proposed not working. http://jsfiddle.net/Z6RDs/ — tmpearce, Feb 05 '14 at 02:04
hmm. i seem to recall $.data using the data- attribs, but that seems to not be the case anymore and i can't find documentation for ":data". If you could use .attr("data-test", n) instead of data() mine works great. — dandavis, Feb 05 '14 at 02:38

neves · Answer 1 · 2014-05-21T00:30:35.190

A characteristic of regexp matching is that they are greedy. If you try to match the expression a.*b to the string abcd, it will happen in these steps:

the first "a" will match
the .* will match the second char, then the third, till the end of the string
reaching the end of the string, there is still a "b" to matched, the matching will fail
the regexp processing starts to backtrack
the last char will be "unmatched" and it will try to match "b" to "d". Fails again. More backtracking
tries to match "b" to "c". Fail. Backtrack.
match "b" to "b". Success. Matching ends.

Although you matched just a few chars, you iterated all the string. If you have more than one greedy operator, you can easily get an input string that will match with exponential complexity.

Understanding backtracking will prevent a lot of errors and performance problems. For example, 'a.*b' will match all the string 'abbbbbbb', instead of just the first 'ab'.

The easiest way to prevent these kind of errors in modern regexp engines, is to use the non-greedy version of the operators * and +. They are usually represented by the same operators followed by a question mark: *? and +?.

I confess that I really didn't stop to debug the complicate regexp that you posted, but I believe that the problem is before matching the '=' symbol. The greedy operator is in this subexpression:

(?:\\\.|[^.,])+\.?)+

I'd try to change it to a non-greedy version:

(?:\\\.|[^.,])+\.?)+?

but this is really just a wild guess. I'm using pattern recognition to solve the problem :-) It makes sense because it is backtracking for each character of the "value" till matching the operator. The name is matched linearly.

This regular expression is just too complex for my taste. I love regular expressions, but it looks like this one matches this famous quotation:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Interesting. I ran my speed test code and your change did significantly speed things up. I don't have a test suite in place to make sure that the results are still valid though. For example, if I want to match `1.23`, I don't want it to return a match for `1.2345` - that would be invalid. (By the way, I love your quote at the end. I long-since abandoned this regex approach in order to return to my one original problem :) Since this regex isn't one I wrote I also didn't want to dig too far into it.) — tmpearce, May 20 '14 at 14:08
I believe it will work, since it will always match till ([!~><=]=|[><]). The greedy version always go till the end of the string and backtracks till the operator. Regexps should be enjoyed in moderation. — neves, May 21 '14 at 00:17

Performance of regex within jQuery data selector: dependance on certain string length

1 Answers1

Linked