
I had a simple RegEx pattern in a customer-facing payment form on our website:

<input type="text" pattern="(|\$)[0-9]*(|\.[0-9]{2})"
       title="Please enter a valid number in the amount field" required>

It was added to help quickly notify customers when they fail to enter a valid number, before hitting the server-side validation.

After four customers called in complaining that they were unable to submit the form because their browser continually told them the amount they had entered was incorrect, I did some digging and discovered that IE10+ doesn't like the back of that expression--any amount entered that did not include a decimal point was accepted, anything with a decimal was rejected. The pattern works in my development environment (Chrome 30+) and in Opera 12, but Firefox 27 won't validate it at all.

I read the spec, which just says:

If specified, the attribute's value must match the JavaScript Pattern production. [ECMA262]

And since the only browsers that support pattern are capable of supporting ECMAScript 5, I figured this meant full support for all JavaScript regular expressions.

Where can I learn more about the quirks between pattern support in the different browsers?

Eric L.
  • Because your question is "Where can I learn more about the quirks between pattern support in the different browsers?" I'm voting to close this as off-topic because it's asking to find a tool or reference. If you could update your question to ask about specific quirks instead I think it would be on topic. – zzzzBov Mar 12 '14 at 18:12
  • Javascript has its own regex engine and doesn't use the PCRE library. – Casimir et Hippolyte Mar 12 '14 at 18:28
  • @Casimir: Updated the wording for you. ;) – Eric L. Mar 12 '14 at 18:29
  • @zzzzBov: The reason I reached out here was my complete inability to dig anything up via Google. And, seeing SO's increasing role as a summary/archive of programming information in the modern internet information-sphere, I thought it appropriate to ask. – Eric L. Mar 12 '14 at 18:32
  • @Eirik, while it's appropriate to ask "a specific, answerable question", it's off topic to ask for off-site references, because it encourages link-only answers where it could be difficult to pick one as being "correct", and would be vulnerable to link-rot. – zzzzBov Mar 12 '14 at 18:34
  • http://jsfiddle.net/q2DgK/ oh IE10 actually fails to match decimals. – Fabrício Matté Mar 12 '14 at 18:39
  • `(\.[0-9]{2}|)` and `(\.[0-9]{2})?` works in IE. – Fabrício Matté Mar 12 '14 at 18:40
  • @FabrícioMatté: For this particular problem, I refactored and opened it up to `\D*[0-9]*(\.[0-9]{2})?[^0-9\.]*`, which works fine in IE, but I'd like to learn what's going wrong with that particular expression in IE, and what I can do about situations like that in the future. – Eric L. Mar 12 '14 at 18:44
  • You're validating customer input in the browser for sales. What about people building bots, etc., who would just rewrite your code to their purpose and send unvalidated data to your order processing center? – alexmac Mar 15 '14 at 21:41
  • `It was added to help quickly notify customers` ... `before hitting the server-side validation.` – Eric L. Mar 17 '14 at 11:21

1 Answer


The problem seems to be an IE-only bug. Your link to the spec is pretty dead on; here's the bit IE is missing:

... except that the pattern attribute is matched against the entire value, not just any subset (somewhat as if it implied a ^(?: at the start of the pattern and a )$ at the end)

You can actually fix this bug by doing just that to your own pattern - namely:

^(?:(|\$)[0-9]*(|\.[0-9]{2}))$

This is working for me in IE9 and IE10, as well as Chrome. See updated fiddle
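
To see what the implied anchors do outside of a form, here is a minimal sketch in plain JavaScript. The helper name matchesPattern is made up for illustration; it only mimics the whole-value rule quoted from the spec and is not how any browser actually implements the attribute.

```javascript
// Hypothetical helper (not a browser API): mimics the spec's
// "match the entire value" rule by wrapping the pattern in ^(?: ... )$.
function matchesPattern(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

// The pattern from the question, written as a JavaScript string.
const originalPattern = "(|\\$)[0-9]*(|\\.[0-9]{2})";

console.log(matchesPattern(originalPattern, "12"));     // true
console.log(matchesPattern(originalPattern, "$12.12")); // true
console.log(matchesPattern(originalPattern, "12.1"));   // false: needs two decimals
```

With the anchors in place the engine has to keep backtracking until the whole value is consumed, so the decimal part gets matched.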

The technical reason this happens is a bit more complex:

If you read the ECMAScript 5.1 spec, section 15.10.2.3 talks about how alternations should be evaluated. Basically, each 'part' of the | is evaluated left to right until one is found that matches. That value is assumed unless there is a problem in the 'sequel', in which case the other possibilities in the alternation are evaluated.
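
A small sketch of that rule in plain JavaScript, using toy patterns that are not from the question:

```javascript
// Alternatives are tried left to right; the first one that works is kept.
console.log(/a|ab/.exec("ab")[0]); // "a"  (the engine never tries "ab")
console.log(/ab|a/.exec("ab")[0]); // "ab" (longer alternative listed first)

// If the sequel fails, the engine backtracks into the alternation:
// "a" is tried first, "c" fails against "b", so "ab" is tried next.
console.log(/(?:a|ab)c/.exec("abc")[0]); // "abc"
```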

What it seems IE is doing is matching the beginning of your string using the empty parts of your alternations, and it works: \$[digits][empty] matches the start of $12.12 up to the decimal point. IE's regex engine (correctly) says that this is a match, because a substring matched, and it's not been told to check to the end of the string.

Once the regex engine (without the anchors to force the whole string to match) returns true, that there was a match, some engineer at Microsoft took a shortcut and told the pattern attribute to also check that the matched part equals the whole string, and there's where the failure comes from. The engine only matched part of the string, even though it could have matched more, so the secondary check fails, thinking there is extraneous input at the end.

This case is subtle, so I'm not too surprised it hasn't been caught before. I have created a bug report https://connect.microsoft.com/IE/feedback/details/836117/regex-bug-in-pattern-validator to see if there is a response from Microsoft.

The reason this relates to the ECMA spec is that if the engine had been told to match the whole string, it would have backtracked when it hit the decimal point, tried the second part of the alternation, matched \.[0-9]{2}, and the whole thing would have worked.
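
The suspected behavior can be modeled in plain JavaScript. This is only a sketch of my guess, not IE's actual code; suspectedIeCheck and specCheck are made-up names for illustration.

```javascript
// The pattern from the question, written as a JavaScript string.
const paymentPattern = "(|\\$)[0-9]*(|\\.[0-9]{2})";

// Hypothetical model of the suspected shortcut: run the pattern
// unanchored, then require the matched text to equal the whole value.
function suspectedIeCheck(pattern, value) {
  const m = new RegExp(pattern).exec(value);
  return m !== null && m[0] === value;
}

// What the spec describes: behave as if the pattern were anchored.
function specCheck(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

console.log(suspectedIeCheck(paymentPattern, "12"));    // true
console.log(suspectedIeCheck(paymentPattern, "12.12")); // false: first match is only "12"
console.log(specCheck(paymentPattern, "12.12"));        // true
```

In the unanchored run the empty alternative is satisfied right after the digits, so the engine has no reason to backtrack and try the decimal part.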


Now, for some workarounds:

  • Add the anchors ^(?: and )$ to your patterns

  • Don't use empty alternations. Personally, I like using the optional quantifier ? instead for these cases. Your pattern becomes (\$?)[0-9]*(\.[0-9]{2})? and will work because ? is greedy: the engine consumes the optional part whenever it can, whereas an alternation settles for the first alternative that matches

  • Swap the order of your alternations. If the longer alternative is listed first, it will be tried first and used first. This has come up in other languages; see the question "Why order matters in this RegEx with alternation?"
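
A quick way to compare the three workarounds is to run them through a stand-in for the suspected IE check. This is a sketch under that assumption; firstMatchIsWholeValue is a made-up name, and the swapped pattern is my reading of the third bullet.

```javascript
// Hypothetical stand-in for the suspected IE behavior: unanchored match,
// then compare the matched text with the whole value.
function firstMatchIsWholeValue(pattern, value) {
  const m = new RegExp(pattern).exec(value);
  return m !== null && m[0] === value;
}

const workarounds = {
  anchored: "^(?:(|\\$)[0-9]*(|\\.[0-9]{2}))$", // explicit anchors
  optional: "(\\$?)[0-9]*(\\.[0-9]{2})?",       // ? instead of empty alternatives
  swapped:  "(\\$|)[0-9]*(\\.[0-9]{2}|)",       // longer alternative first
};

for (const [name, pattern] of Object.entries(workarounds)) {
  // Each workaround matches the full value on the first attempt.
  console.log(name, firstMatchIsWholeValue(pattern, "$12.12")); // true
}
```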

PS: Be careful with the * for your digits. Right now, "$" is a valid match because * allows for 0 digits. My recommendation for your full regex would be (\$)?(\d+)(\.\d{2})?
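
Here is a small sketch comparing the two patterns in plain JavaScript, with the anchors written out by hand because a RegExp literal does not add them the way the pattern attribute is supposed to. The variable names are only for illustration.

```javascript
// Original pattern and the recommended one, both explicitly anchored.
const loosePattern = /^(?:(|\$)[0-9]*(|\.[0-9]{2}))$/;
const recommendedPattern = /^(?:(\$)?(\d+)(\.\d{2})?)$/;

console.log(loosePattern.test("$"));           // true: * allows zero digits
console.log(recommendedPattern.test("$"));     // false: \d+ needs at least one digit
console.log(recommendedPattern.test("$12"));   // true
console.log(recommendedPattern.test("12.12")); // true
console.log(recommendedPattern.test(".50"));   // false: no digit before the point
```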

dtyler
  • Wow. 1: I always forget to use character classes when writing JS REs, for some reason. 2: Since no one's responded to this question, and I still can't find any information about similar issues, I agree this must be an IE bug. I'm accepting your response 'cause it seems logical that I'm crazy and all browsers *should* have regular expression engines that work the same. Thanks for knowing how to create a Microsoft bug and for opening an issue with them. 3: Your last pattern is beautiful. Good catch on the `*` leading to funny matches. – Eric L. Mar 18 '14 at 19:51