
I'm working on a regex that matches a JavaScript regex literal.

For now I have this regex working:

/\/(.*)?\/([i|g|m]+)?/

For example:

'/^foo/'.match(/\/(.*)?\/([i|g|m]+)?/) => ["/^foo/", "^foo", undefined]
'/^foo/i'.match(/\/(.*)?\/([i|g|m]+)?/) => ["/^foo/i", "^foo", "i"]

Now I also need it to work on input without the slashes, like this:

'^foo'.match(/\/(.*)?\/([i|g|m]+)?/) => ["^foo", "^foo", undefined]

Unfortunately my current regex doesn't work for that one: it returns null, because the input has no slashes.

Can someone help me find a regex that handles this example (and the others too):

'^foo'.match([a regex]) => ["^foo", "^foo", undefined]
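For reference, the current behaviour can be reproduced with a small Node.js check of the examples above. It only exercises the question's own regex; `questionRe` is a name chosen for this sketch.

```javascript
// The regex from the question: anything between two slashes, then flags.
// Both groups are optional, so a group that did not match is undefined.
const questionRe = /\/(.*)?\/([i|g|m]+)?/;

for (const input of ["/^foo/", "/^foo/i", "^foo"]) {
  const m = input.match(questionRe);
  // Array.from keeps only the indexed entries of the match array
  // (dropping its index, input and groups properties).
  console.log(JSON.stringify(input), "=>", m === null ? null : Array.from(m));
}
// Resulting values:
// "/^foo/"  -> ["/^foo/", "^foo", undefined]
// "/^foo/i" -> ["/^foo/i", "^foo", "i"]
// "^foo"    -> null, because the pattern needs two "/" characters
```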
Mike Samuel
julesbou
    Your first regex also matches `/foo/|||`, FYI. – Sean Jul 24 '13 at 20:18
  • I literally just asked the question with a new twist. Take a look at [slak's comment](http://stackoverflow.com/questions/37535865/regex-describing-a-regex-pattern#comment62561839_37535865)... –  May 31 '16 at 02:28
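Sean's point is easy to check directly. In this Node.js sketch (the variable names are made up), listing only the flag letters in the character class stops `|` from being accepted as a flag:

```javascript
// Inside a character class "|" is an ordinary character, so [i|g|m] accepts it.
const original = /\/(.*)?\/([i|g|m]+)?/;
// Listing only the flag letters removes "|" from the accepted set.
const lettersOnly = /\/(.*)?\/([gim]+)?/;

const input = "/foo/|||";
console.log(Array.from(input.match(original)));    // -> ["/foo/|||", "foo", "|||"]
console.log(Array.from(input.match(lettersOnly))); // -> ["/foo/", "foo", undefined]
```

`[gim]+` still accepts repeated letters such as `gg`; the answer below only allows each flag once.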

1 Answer


A regular expression that matches a JavaScript regular expression literal is:

/\/((?![*+?])(?:[^\r\n\[/\\]|\\.|\[(?:[^\r\n\]\\]|\\.)*\])+)\/((?:g(?:im?|mi?)?|i(?:gm?|mg?)?|m(?:gi?|ig?)?)?)/
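A quick way to see what the two groups capture is to run the regex, copied unchanged, on a few inputs. This is a Node.js sketch; `groups` is a helper invented for the demo.

```javascript
// The regex above as written: group 1 is the body, group 2 the flags.
const regexLiteral = /\/((?![*+?])(?:[^\r\n\[/\\]|\\.|\[(?:[^\r\n\]\\]|\\.)*\])+)\/((?:g(?:im?|mi?)?|i(?:gm?|mg?)?|m(?:gi?|ig?)?)?)/;

// Demo helper: the match as a plain array, or null when nothing matches.
function groups(text) {
  const m = text.match(regexLiteral);
  return m === null ? null : Array.from(m);
}

console.log(groups("/^foo/i"));       // -> ["/^foo/i", "^foo", "i"]
console.log(groups("/a[/]b/g"));      // -> ["/a[/]b/g", "a[/]b", "g"]  ("/" inside [...])
console.log(groups("/a\\/b/"));       // -> ["/a\\/b/", "a\\/b", ""]   (escaped "/")
console.log(groups("/* comment */")); // -> null, "/*" starts a comment
console.log(groups("//"));            // -> null, "//" starts a line comment
```

When there are no flags, group 2 is the empty string rather than `undefined` as in the question: that group can match an empty string, so it always participates in the match.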

To break it down:

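  1. \/ matches a literal /.
  2. (?![*+?]) rejects a body that starts with a quantifier. This is necessary because /* starts a comment, not a regular expression (and /+ or /? would be a syntax error, since there is nothing to repeat).
  3. [^\r\n\[/\\] matches any character other than a newline (\r or \n), a /, the \ that starts an escape sequence, or the [ that starts a character group.
  4. \[...\] matches a character group, which can contain an unescaped /.
  5. \\. matches a backslash and the character after it, i.e. the start of an escape sequence; the rest of a longer escape such as \u0041 is consumed by item 3.
  6. + (one or more) is necessary because // is a line comment, not a regular expression, so the body must not be empty.
  7. (?:g...)? matches any combination of the flags g, i and m in any order, with no flag repeated. So ugly.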
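To see how the items in the list fit together, the same pattern can be assembled from named strings with `new RegExp`. The constant names are hypothetical and this is only a readability sketch; the loop checks that it behaves like the literal above on a few samples.

```javascript
// Each constant is one item of the breakdown (names made up for this sketch).
const SLASH = String.raw`\/`;                            // 1. literal "/"
const NO_LEADING_QUANTIFIER = String.raw`(?![*+?])`;     // 2. not "/*", "/+", "/?"
const PLAIN_CHAR = String.raw`[^\r\n\[/\\]`;             // 3. ordinary character
const CHAR_CLASS = String.raw`\[(?:[^\r\n\]\\]|\\.)*\]`; // 4. [...], may hold "/"
const ESCAPE = String.raw`\\.`;                          // 5. "\" plus next char
// 6. "+" requires at least one body element, so "//" is rejected.
const BODY = `(${NO_LEADING_QUANTIFIER}(?:${PLAIN_CHAR}|${ESCAPE}|${CHAR_CLASS})+)`;
// 7. Every ordering of every non-empty subset of g, i, m (or no flags).
const FLAGS = String.raw`((?:g(?:im?|mi?)?|i(?:gm?|mg?)?|m(?:gi?|ig?)?)?)`;

const assembled = new RegExp(SLASH + BODY + SLASH + FLAGS);

// The literal from the answer, for comparison.
const literal = /\/((?![*+?])(?:[^\r\n\[/\\]|\\.|\[(?:[^\r\n\]\\]|\\.)*\])+)\/((?:g(?:im?|mi?)?|i(?:gm?|mg?)?|m(?:gi?|ig?)?)?)/;

for (const s of ["/^foo/i", "/a[/]b/g", "/a\\/b/mg", "//", "/*x*/"]) {
  // JSON.stringify serializes only the indexed groups (or "null").
  const same = JSON.stringify(s.match(assembled)) === JSON.stringify(s.match(literal));
  console.log(s, same); // prints true when both give the same groups
}
```

Building it this way also makes it easier to swap a single piece, such as the flags, without re-reading the whole expression.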

This doesn't attempt to pair parentheses, or to check that a repetition modifier is not applied to another repetition modifier (as in a**), but it filters out most of the other ways that a regular expression can fail the syntax check.
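Since the pattern lets some invalid regexes through, one option (not part of this answer, only a sketch) is to use it to pull out the body and flags and then let the JavaScript engine compile them: `new RegExp` throws a `SyntaxError` for an invalid pattern. `compilesAsRegex` is a hypothetical helper name.

```javascript
// The regex from the answer: group 1 is the body, group 2 the flags.
const regexLiteral = /\/((?![*+?])(?:[^\r\n\[/\\]|\\.|\[(?:[^\r\n\]\\]|\\.)*\])+)\/((?:g(?:im?|mi?)?|i(?:gm?|mg?)?|m(?:gi?|ig?)?)?)/;

// Hypothetical helper: split out body and flags with the pattern above,
// then let the engine perform the real syntax check.
function compilesAsRegex(text) {
  const m = text.match(regexLiteral);
  if (m === null) return false;
  try {
    new RegExp(m[1], m[2]); // throws SyntaxError for an invalid pattern
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected, so re-throw it
  }
}

console.log(compilesAsRegex("/^foo/i")); // true
console.log(compilesAsRegex("/(/"));     // false: unbalanced parenthesis
console.log(compilesAsRegex("/a**/"));   // false: quantifier on a quantifier
```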

If you only need to match the body (without the surrounding slashes and the flags), strip off everything else:

/(?![*+?])(?:[^\r\n\[/\\]|\\.|\[(?:[^\r\n\]\\]|\\.)*\])+/

or, alternatively, add "/" to the beginning and end of your input and use the full regex.
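Applied to the input from the question, the two suggestions look like this (Node.js sketch). `parseRegexOrBody` is a hypothetical helper, not something from the answer: it returns the `[match, body, flags]` shape the question asks for by trying the full pattern first and falling back to the body-only one.

```javascript
// Both patterns from the answer.
const regexLiteral = /\/((?![*+?])(?:[^\r\n\[/\\]|\\.|\[(?:[^\r\n\]\\]|\\.)*\])+)\/((?:g(?:im?|mi?)?|i(?:gm?|mg?)?|m(?:gi?|ig?)?)?)/;
const regexBody = /(?![*+?])(?:[^\r\n\[/\\]|\\.|\[(?:[^\r\n\]\\]|\\.)*\])+/;

// The body-only pattern has no capture groups: the whole match is the body.
console.log(Array.from("^foo".match(regexBody))); // -> ["^foo"]

// Alternative: add "/" around the input, then use the full pattern.
console.log(Array.from(("/" + "^foo" + "/").match(regexLiteral))); // -> ["/^foo/", "^foo", ""]

// Hypothetical helper: try a full literal first, else treat the input as a bare body.
function parseRegexOrBody(input) {
  const literal = input.match(regexLiteral);
  if (literal !== null) {
    // Map the empty flags group to undefined, as in the question's examples.
    return [literal[0], literal[1], literal[2] === "" ? undefined : literal[2]];
  }
  const body = input.match(regexBody);
  return body === null ? null : [body[0], body[0], undefined];
}

console.log(parseRegexOrBody("^foo"));    // -> ["^foo", "^foo", undefined]
console.log(parseRegexOrBody("/^foo/i")); // -> ["/^foo/i", "^foo", "i"]
```

Both patterns are unanchored, so they report the first match anywhere in the input; add `^` at the start and `$` at the end of a pattern if the whole string has to be a regex.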

Niklas
Mike Samuel
    I don't know by looking whether this works, but I do know that it frightens me. – Mark Amery Jul 24 '13 at 20:02
    `'/foo/i'.match(/\/(?![*+?])(?:[^\r\n\[/\\]|\\.|\[(?:[^\r\n\]\\]|\\.)*\])+\/(?:g(?:im?|m)?|i(?:gm?|m)?|m(?:gi?|i)?)?/)` result is `['/foo/i']`. That's not the expected result. – julesbou Jul 24 '13 at 20:22
  • @jules, I added parentheses in the appropriate places so that the match has the bits you expect. – Mike Samuel Jul 24 '13 at 21:53
  • @MikeSamuel The whole point of my question was to match `'^foo'.match([a regex]) => ["^foo", "^foo", undefined]`. I know this is not a regex (because there's no slash), but I want to be smart and match this one too. Do you have an idea? I appreciate your help. – julesbou Jul 25 '13 at 08:36
  • @jules, The second regex should match just the pattern portion without the surrounding `/`s or the flags. – Mike Samuel Jul 25 '13 at 13:01
    "x = 2; y = x / 3; z = x/4". In this case your regex would not work – Gor Jun 28 '17 at 11:53
  • @Gor, this pattern is not meant to lex JS. It would also spuriously match in `"/./"`. JS doesn't have a regular lexical grammar, so this is not a problem with this regex. No regex will correctly split an arbitrary JS program into tokens. – Mike Samuel Jun 28 '17 at 13:14
  • You could change the flag matching to something like this: `(?:i(?!.*i)|g(?!.*g)|m(?!.*m))*` or, with capturing groups, `((?:([igm])(?!.*\3))*)`, where group 3 would be the last flag. You may then have to append `(?=\s|$)` as well, though. The nice part is that it's then really simple to add flags, e.g. `(?:i(?!.*i)|g(?!.*g)|m(?!.*m)|u(?!.*u)|y(?!.*y))*` – scagood Mar 23 '18 at 17:38
    Oh, on second thought, if you want a regex that still matches when a string (or something else) follows it, this may be more appropriate: `(?:i(?!\w*i)|g(?!\w*g)|m(?!\w*m))*(?!\w)` (or `((?:([igm])(?!\w*\3))*)(?!\w)`). – scagood Mar 23 '18 at 17:56
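scagood's lookahead flag group can be dropped into the answer's regex in place of the long alternation. This Node.js sketch does that for i, g and m only; `lookaheadFlags` and `groups` are names made up for the example.

```javascript
// The answer's slashes and body, with the flag group replaced by the
// lookahead version from the comment. Each of i, g, m is accepted only if the
// same letter does not occur again in the word characters that follow it,
// and (?!\w) stops the match from ending right before another word character.
const lookaheadFlags = /\/((?![*+?])(?:[^\r\n\[/\\]|\\.|\[(?:[^\r\n\]\\]|\\.)*\])+)\/((?:i(?!\w*i)|g(?!\w*g)|m(?!\w*m))*)(?!\w)/;

// Demo helper: the match as a plain array, or null.
function groups(text) {
  const m = text.match(lookaheadFlags);
  return m === null ? null : Array.from(m);
}

console.log(groups("/foo/gi"));   // -> ["/foo/gi", "foo", "gi"]
console.log(groups("/foo/"));     // -> ["/foo/", "foo", ""]
console.log(groups("/foo/gg"));   // -> null, "g" is repeated
console.log(groups("/foo/ig x")); // -> ["/foo/ig", "foo", "ig"], other text may follow
```

As the comment notes, supporting another flag is just one more alternative, e.g. `|u(?!\w*u)`.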