do we ever use regex to find regex expressions?

Question

let's say i have a very long string. the string has regular expressions at random locations. can i use regex to find the regex's?

What's the difference between a non-regex portion of the string and a regex portion? — jball, Nov 10 '10 at 23:33
Sure, but if you're going to encompass the whole regex syntax, you just built yourself a pre-parser. I'd suggest providing more of a spec to question. — Jason McCreary, Nov 10 '10 at 23:34
@i am a girl - so if you can't describe the difference, how will a DFA know what to accept or reject? — jball, Nov 10 '10 at 23:35
@i am a girl. What's the regular expression in the following string: `/[a-z]+\/[0-9]+/`? Is it the whole string? Or is it `/[0-9]+/`? You need to define what you mean by a regular expression. If you limit it to *only* strings that are surrounded by `/`, it is easier. — Vivin Paliath, Nov 10 '10 at 23:45
@i am girl. The problem is, the whole long string is itself a valid regular expression. The general solution is therefore this: `function find_regex (str) {return str}`. That's because all strings are valid regular expressions. — slebetman, Nov 10 '10 at 23:53
i understand what you guys are saying. the question is really bad. please delete it — Alex Gordon, Nov 11 '10 at 15:59
@slebetman: It's not true that all possible strings are valid regular expressions. The most simple example of a string that isn't would be '('; the opening bracket has a special meaning and must be closed. — Dirk Vollmar, Nov 11 '10 at 20:37

Brian McCutchon · Accepted Answer · 2014-09-27T01:51:53.883

(Assuming that you are looking for a JavaScript regexp literal, delimited by /.)

It would be simple enough to just look for everything in between /, but that might not always be a regexp. For example, such a search would return /2 + 3/ of the string var myNumber = 1/2 + 3/4. This means that you will have to know what occurs before the regular expression. The regexp should be preceded by something other than a variable or number. These are the cases that I can think of:

/regex/;
var myVar = /regex/;
myFunction(/regex/,/regex/);
return /regex/;
typeof /regex/;
case /regex/:
throw /regex/;
void /regex/;
"global" in /regex/;

In some languages you can use lookbehind, which might look like this (untested!):

(?<=^|\n|[^\s\w\/]|\breturn|\btypeof|\bcase|\bthrow|\bvoid|\bin)\s*\/(?:\\\/|[^\/\*\n])(?:\\\/|[^\/\n])*\/

However, JavaScript did not support lookbehind when this was written (it was added in ES2018, so older engines still lack it). I would recommend imitating lookbehind by putting the portion of the regexp designed to match the literal itself in a capturing group and accessing that. All cases of which I am aware can be matched by this regexp:

(?:^|\n|[^\s\w\/]|\breturn|\btypeof|\bcase|\bthrow|\bvoid|\bin)\s*(\/(?:\\\/|[^\/\*\n])(?:\\\/|[^\/\n])*\/)

NOTE: This regex sometimes results in false positives in comments.

If you want to also grab modifiers (e.g. /regex/gim), use

(?:^|\n|[^\s\w\/]|\breturn|\btypeof|\bcase|\bthrow|\bvoid|\bin)\s*(\/(?:\\\/|[^\/\*\n])(?:\\\/|[^\/\n])*\/\w*)

If there are any reserved words I am missing that may be followed by a regexp literal, simply add this to the end of the first group: |\bkeyword

All that remains then is to access the capturing group, using code similar to the following:

var codeString = "function(){typeof /regex/;}";
var searchValue = /(?:^|\n|[^\s\w\/]|\breturn|\btypeof|\bcase|\bthrow|\bvoid|\bin)\s*(\/(?:\\\/|[^\/\*\n])(?:\\\/|[^\/\n])*\/)/g;
    // the global modifier is needed to find further literals with repeated exec() calls
var match = searchValue.exec(codeString); // ['typeof /regex/', '/regex/']
var literal = match[1]; // "/regex/"
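
To pull out every literal rather than just the first, the exec() call can be repeated until it returns null. The following is only a sketch of that loop, not part of the original answer: the helper name findRegexLiterals is made up for illustration, and it uses the modifier-capturing variant of the pattern shown earlier.

```javascript
// Hypothetical helper (name is an assumption): collect every candidate
// regexp literal, including trailing modifiers such as "gi".
function findRegexLiterals(code) {
  // Modifier-capturing variant of the pattern; the g flag lets exec()
  // resume from the end of the previous match via lastIndex.
  const pattern = /(?:^|\n|[^\s\w\/]|\breturn|\btypeof|\bcase|\bthrow|\bvoid|\bin)\s*(\/(?:\\\/|[^\/\*\n])(?:\\\/|[^\/\n])*\/\w*)/g;
  const found = [];
  let m;
  while ((m = pattern.exec(code)) !== null) {
    found.push(m[1]); // group 1 is the literal without the preceding context
  }
  return found;
}

// The divisions in "1/2 + 3/4" are skipped because a digit precedes each slash.
console.log(findRegexLiterals("var a = /ab+c/gi; var n = 1/2 + 3/4; f(/x/, /y\\/z/);"));
// → [ '/ab+c/gi', '/x/', '/y\\/z/' ]
```

The same caveat as above applies: literals inside comments or strings can still show up as false positives.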

UPDATE
I just fixed an error with the regexp concerning escaped slashes that would have caused it to get only /\/ of a regexp like /\/hello/

UPDATE 4/6
Added support for void and in. You can't blame me too much for not including this at first, as even Stack Overflow doesn't, if you look at the syntax coloring in the first code block.

score 3 · Answer 2 · edited May 23 '17 at 12:11

What do you mean by "regular expression"? aaaa is a valid regular expression. This is also a regular expression. If you mean a regular expression literal you might need something like this: /\/(?:[^\\\/]|\\.)*\// (adapted from here).
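
As a rough illustration of how that pattern behaves, here is a small JavaScript sketch. The names literalPattern and findSlashDelimited are made up for this example, and the g flag is added so that all matches are returned. It handles escaped slashes, but it knows nothing about context, so plain division is matched as well.

```javascript
// Pattern from the answer, with the g flag added: a slash, then any run of
// characters that are neither backslash nor slash, or backslash-escaped
// characters, then a closing slash.
const literalPattern = /\/(?:[^\\\/]|\\.)*\//g;

// Hypothetical helper: return every slash-delimited run in the text.
function findSlashDelimited(text) {
  return text.match(literalPattern) || [];
}

console.log(findSlashDelimited("x = /a\\/b/"));              // → [ '/a\\/b/' ]
console.log(findSlashDelimited("var myNumber = 1/2 + 3/4")); // → [ '/2 + 3/' ]
```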

UPDATE

slebetman makes a good point; regular-expression literals don't need to start with /. In Perl or sed, they can start with whatever you want. Essentially, what you're trying to do is risky and probably won't work for all cases.

answered Nov 10 '10 at 23:40 · Vivin Paliath

Regex literal depends on the programming language though. In Tcl a regex literal is delimited by {}. In C it's "". And in Perl it can be delimited by anything you choose. – slebetman Nov 10 '10 at 23:51
@slebetman Good point. I thought about that but forgot to mention it. Will update. – Vivin Paliath Nov 10 '10 at 23:54

score 1 · Answer 3 · answered Nov 10 '10 at 23:40

It's not the best way to go about this.

You can attempt it with some degree of confidence: split the string into substrings at each EOL and pick out the ones that look like regular expressions, perhaps those delimited by quotation marks. However, don't forget that a very long string CAN itself be a regex, so you will never have complete confidence with this approach.

score 1 · Answer 4 · answered Nov 11 '10 at 00:00

Yes, if you know whether (and how!) your regex is delimited. Say, for example, that your string is something like

aaaaa...aaa/b/aaaaa

where 'b' is the 'regular expression' delimited by the character / (this is a near-basic scenario). What you have to do is scan the string for the expected delimiter and extract whatever is between the delimiters (paying attention to escape chars), and you should be set.

This works if your delimiter is a known character and if you are sure that it appears an even number of times, or if you are willing to discard the rest. (For example, which set of delimiters are you considering in the following string: aaa/b/aaa/c/aaa/d?)

If this is the case then you need to follow the same reasoning you'd use to find any substring in a given string. Once you've found the first regexp, keep parsing until you hit the end of the string or you find another regexp, and so on.
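
The scan described above could look something like the following JavaScript sketch. The function name extractDelimited and its default delimiter and escape characters are assumptions made for illustration. It pairs delimiters left to right and drops an unpaired trailing one, which is one possible answer to the aaa/b/aaa/c/aaa/d question.

```javascript
// Hypothetical helper: return the substrings found between successive pairs
// of an unescaped delimiter character.
function extractDelimited(text, delimiter = "/", escape = "\\") {
  const results = [];
  let start = -1; // index just after the opening delimiter, or -1 when outside
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === escape) {
      i++; // skip the escaped character, whatever it is
    } else if (ch === delimiter) {
      if (start === -1) {
        start = i + 1; // opening delimiter
      } else {
        results.push(text.slice(start, i)); // closing delimiter
        start = -1;
      }
    }
  }
  return results; // an unpaired trailing delimiter is discarded
}

console.log(extractDelimited("aaa/b/aaa/c/aaa/d")); // → [ 'b', 'c' ]
```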

I suspect, however, that you are looking for a 'general rule' to find any string that, once parsed, would result in a valid regular expression (say we're talking about POSIX regexp; try man re_format if you're under *BSD). If that is the case you could try every possible substring of every length of the given string and feed it to a regexp parser to check its syntax. Even then, you have only shown that the substrings are syntactically valid; you have proven nothing about what they actually match.
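
A brute-force version of that idea might look like the sketch below. It is an illustration only: it uses JavaScript's own RegExp constructor as the syntax checker rather than a POSIX parser, and the names isValidRegex and validRegexSubstrings are made up. The number of substrings grows quadratically with the length of the input, so this is only practical for short strings.

```javascript
// Returns true if the JavaScript engine accepts the source as a pattern.
// Assumption: JavaScript regexp syntax stands in for POSIX syntax here.
function isValidRegex(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    return false; // the constructor throws a SyntaxError on invalid patterns
  }
}

// Hypothetical helper: test every non-empty substring for syntactic validity.
function validRegexSubstrings(text) {
  const valid = [];
  for (let start = 0; start < text.length; start++) {
    for (let end = start + 1; end <= text.length; end++) {
      const candidate = text.slice(start, end);
      if (isValidRegex(candidate)) valid.push(candidate);
    }
  }
  return valid;
}

console.log(isValidRegex("("));          // → false
console.log(validRegexSubstrings("a(")); // → [ 'a' ]
```

Nearly every substring of ordinary text passes this check, which is exactly why syntactic validity alone says so little.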

If that is what you're trying to do I strongly recommend finding another way or explaining better what you are trying to accomplish here.
