
I'm refactoring a rather large RegExp into a function that returns a RegExp. As a backward-compatibility test, I compared the .source of the returned RegExp with the .source of the old RegExp:

getRegExp(/* in the case requiring backward compatibility there are no arguments */)
    .source == oldRegExp.source

However, I've noticed that the old RegExp contains various redundant backslashes, like [\.\w] instead of [.\w]. I'd like to clean those up, but there are quite a few of them, and it would be nice to have a similar check that backward compatibility is not broken. The problem is that /[\.\w]/.source != /[.\w]/.source. And automatically identifying which backslashes can be removed is not trivial: \. and . are not the same outside [...], and they may differ in some other cases as well.
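
A small demonstration of both points. The two class-based regexes are the ones from the question; the anchored /^\.$/ and /^.$/ pair is added here only for illustration:

```javascript
// Inside a character class, "\." and "." both mean a literal dot,
// so these two regexes accept exactly the same strings...
const escapedInClass = /[\.\w]/;
const plainInClass = /[.\w]/;

// ...yet their .source strings differ, so a plain .source comparison fails.
const sourcesEqual = escapedInClass.source === plainInClass.source;
console.log(sourcesEqual); // false

// Outside a class the backslash matters: /\./ matches only a literal dot,
// while /./ matches any single character except a line terminator.
const literalDot = /^\.$/;
const anyChar = /^.$/;

const samples = ["a", ".", "_", "-"];
for (const s of samples) {
  console.log(
    JSON.stringify(s),
    "in class:", escapedInClass.test(s), plainInClass.test(s),
    "outside:", literalDot.test(s), anyChar.test(s)
  );
}
```

The "in class" pair agrees on every sample, while the "outside" pair disagrees on "a", "_" and "-".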

Is there a reasonably simple way to do this? It seems it can only be done by actually parsing the .source (compare the example above with /\[\.\w]/ and /\[.\w]/, where the backslash before the dot does matter), but maybe I'm missing some trick using the browser's built-in properties or methods. The point is that '\"' == '"' is true, so strings defined with these different syntaxes are stored as the same "normalized" value ("). I wonder whether such a "normalized" pattern is available for a RegExp.
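
If parsing the .source turns out to be the only option, this particular cleanup may not need a full regex parser. Below is a minimal sketch with hypothetical helpers normalizeSource and sameRegExp (not built-ins): it walks the source once, treats each backslash plus the following character as one token, tracks whether it is inside an unescaped [...], and drops the backslash only before characters that are always literal inside a class. Assumptions: the regexes don't use the `v` flag (in `v` mode several of these characters must stay escaped, so the sketch falls back to a raw comparison there), and legacy Annex B corner cases are not specially handled.

```javascript
// Characters whose backslash escape is redundant inside a character class.
// Assumption: no `v` flag; "-", "^", "]" and "\" are deliberately excluded
// because their escapes can change meaning inside a class.
const REDUNDANT_IN_CLASS = new Set([".", "*", "+", "?", "$", "(", ")", "{", "}", "|"]);

// Hypothetical helper: drop redundant escapes inside [...],
// e.g. the source "[\.\w]" becomes "[.\w]".
function normalizeSource(source) {
  let out = "";
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\" && i + 1 < source.length) {
      const next = source[i + 1];
      // An escape pair never opens or closes a class; inside a class,
      // drop the backslash if the escaped character is literal anyway.
      out += inClass && REDUNDANT_IN_CLASS.has(next) ? next : ch + next;
      i++; // skip the escaped character
      continue;
    }
    // Without the `v` flag, classes don't nest: "[" inside a class is
    // literal and the first unescaped "]" closes it.
    if (!inClass && ch === "[") inClass = true;
    else if (inClass && ch === "]") inClass = false;
    out += ch;
  }
  return out;
}

// Hypothetical equivalence check: same flags and same normalized source.
function sameRegExp(a, b) {
  if (a.flags !== b.flags) return false;
  // In `v` mode some of the escapes above are mandatory, so fall back to
  // a plain source comparison there.
  if (a.flags.includes("v")) return a.source === b.source;
  return normalizeSource(a.source) === normalizeSource(b.source);
}
```

With this, sameRegExp(/[\.\w]/, /[.\w]/) is true while sameRegExp(/\[\.\w]/, /\[.\w]/) stays false, because there the "\[" is an escape and no class is opened. Note that a false result only means this normalization cannot show the two regexes are equal, not that they actually behave differently.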

YakovL
  • @ggorlen could you clarify how `[.\w]` is different from `[\.\w]`? See https://stackoverflow.com/a/19976308/3995261 (https://www.regular-expressions.info/refcharclass.html). Yes, the backslash is redundant. – YakovL Sep 14 '19 at 09:08
  • Oh thanks, I didn't realize that. Here's [another thread](https://stackoverflow.com/questions/489095/how-to-determine-if-a-regex-is-orthogonal-to-another-regex) that has useful resources and is probably a better dupe than the above link. As Chris mentions below, I think proving two regexes are equivalent probably reduces to the [halting problem](https://en.wikipedia.org/wiki/Halting_problem). – ggorlen Sep 14 '19 at 15:41
  • @ggorlen well, not exactly. Consider that `'\"' == '"'` is true: these are the same string value (`"`) defined with different syntax. I wonder whether RegExps have a similar "normalized" representation that is available natively and could be compared (pity that `.source` is not one). – YakovL Sep 14 '19 at 17:06

1 Answer


Sadly, comparing two regular expressions to see if they're the same is the same problem as comparing any other two pieces of code, i.e., hard.

The only real way I know of to do this is to create a suite of tests, each one targeting a specific aspect of the regular expression and verifying that it works properly. This is not an easy process: regular expressions are subtle and complex, with a lot of potential for unintended side effects. I recently had to fix some defects in a regex-based address parser, and it took about a thousand unit tests before I was satisfied with my coverage... but then, as soon as I started to change the regex, MY TESTS CAUGHT STUFF CONSTANTLY!!
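
For this specific refactor, one way to put that into practice is a differential check: run the old and the new regex over the same inputs and flag every input where the results disagree. A minimal sketch follows; behavesTheSame is a hypothetical helper, and the two sample regexes are stand-ins for the real oldRegExp and getRegExp() from the question. The check is only as trustworthy as the hand-picked input list.

```javascript
// Hypothetical stand-ins for the question's oldRegExp and getRegExp();
// replace them with the real ones.
const oldRegExp = /[\.\w]+@[\.\w]+/;
function getRegExp() {
  return /[.\w]+@[.\w]+/;
}

// Compare what two regexes actually match on each input: match position,
// matched text and every capture group must agree. Returns the inputs
// on which they differ.
function behavesTheSame(a, b, inputs) {
  const differing = [];
  for (const input of inputs) {
    // exec() is stateful for /g and /y regexes, so reset lastIndex first.
    a.lastIndex = 0;
    b.lastIndex = 0;
    const ra = a.exec(input);
    const rb = b.exec(input);
    const same =
      (ra === null && rb === null) ||
      (ra !== null && rb !== null &&
        ra.index === rb.index &&
        ra.length === rb.length &&
        ra.every((group, i) => group === rb[i]));
    if (!same) differing.push(input);
  }
  return differing;
}

// A hand-picked corpus aimed at the parts that changed (dots, word
// characters, edge cases); extend it for the real regex.
const corpus = ["john.doe@example.com", "a@b", "no-at-sign", "x.y@", "@.", "", "a-b@c"];
const mismatches = behavesTheSame(oldRegExp, getRegExp(), corpus);
console.log(mismatches.length === 0 ? "no differences found" : mismatches);
```

An empty result does not prove equivalence; it only shows that no difference was found on the inputs you thought of.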

Unit testing sucks: it's tiring and not fun. But for almost any piece of logic it has real value, and when using powerful tools like regex, I would say it's absolutely crucial.

Chris Belyeu
  • Yeah, I'm thinking about tests too (actually, I already have a number of them); it's just that, as you mentioned, I lack confidence because the coverage is "unknown". – YakovL Sep 14 '19 at 09:11