I am pretty new to Regex and what I want to do is the following.

Say I have a keyword array:

// the kwd is changed dynamically
var kwd = ["key1", "key2", "key3", "key4"];

I need to build a Regex to test whether a string contains any of the keywords in that array. How can I generate that Regex dynamically?

Probably a function:

function RegexBuilder(kwd){
    // I know I can use brute force to search one by one,
    // but I just need to know how to generate that regex.
    return regex_expression;
}
demongolem
Kuan
  • What language is this, JavaScript? Please tag your question accordingly – Bergi Dec 09 '15 at 18:36
  • If this is javascript, you should just use a `for` loop and iterate over every keyword and check `if(someString.contains(keywords[i]))`. If you *want* a regex, use the or operator `|` and concatenate every string in the `kwd[]` with a `|` in between them, and you have your regex expression. (e.g. `keyword1|cool|otherword` would be the resulting regex from `kwd[] = {"keyword1", "cool", "otherword"}`. – Maximilian Gerhardt Dec 09 '15 at 18:36
  • Have a look at the [`RegExp` constructor](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp) – Bergi Dec 09 '15 at 18:37
  • @MaximilianGerhardt Thanks, I am just not sure how to turn a concatenated string into a Regex expression – Kuan Dec 09 '15 at 18:42
  • regex is the wrong tool for the job. Just use a loop and `indexOf`. – zzzzBov Dec 09 '15 at 18:43
  • @Kuan: An [edit] would've been enough :-) – Bergi Dec 09 '15 at 18:43
  • @Bergi Sorry, I did not quite catch the point, could you show an example of building a regex? Let us say the keywords are now ["Hello there", "Hola mucho", "(Bonjour / Mecier)"] – Kuan Dec 09 '15 at 18:54
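
The loop-and-`indexOf` approach suggested in the comments above can be sketched as follows. This is only a minimal illustration: the helper name `containsAnyKeyword` and the sample sentences are assumptions made for the example, not part of the original question.

```javascript
// Hypothetical helper: true if `text` contains at least one of the keywords.
function containsAnyKeyword(text, keywords) {
    return keywords.some(function (keyword) {
        // indexOf returns -1 when the keyword does not occur in text
        return text.indexOf(keyword) !== -1;
    });
}

var kwd = ["Hello there", "Hola mucho", "(Bonjour / Mecier)"];
console.log(containsAnyKeyword("She said (Bonjour / Mecier) twice", kwd)); // true
console.log(containsAnyKeyword("Nothing to see here", kwd));               // false
```

Because no regex is involved, keywords containing parentheses, slashes, or other special characters need no escaping here.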

2 Answers

1

Special characters inside a keyword make this more complex, because they must be escaped before they reach the regex.

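function RegexBuilder(kwd){
    // Escape each keyword before joining, so a literal "|" inside a keyword is escaped too
    var escaped = kwd.map(function(k){ return k.replace(/[^\w\s]/g, '\\$&'); });
    // Join the escaped keywords into one alternation group
    return '(' + escaped.join('|') + ')';
}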

You can use it this way:

var regex = new RegExp(RegexBuilder(kwd));
if (regex.test(mytext)) {
    // do something
}
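
As a self-contained illustration of the same idea, here is a rough sketch that escapes each keyword and joins the results with `|`, applied to the keyword list the asker mentions in the comments. The helper names `escapeRegExp` and `buildKeywordRegex`, the handling of an empty list, and the test sentences are assumptions made for this example.

```javascript
// Hypothetical helper: escape every character that is special in a regex pattern.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build one RegExp that matches any of the keywords.
function buildKeywordRegex(keywords) {
    if (keywords.length === 0) {
        // An empty pattern would match every string, so return a regex that never matches.
        return /(?!)/;
    }
    return new RegExp(keywords.map(escapeRegExp).join("|"));
}

var kwd = ["Hello there", "Hola mucho", "(Bonjour / Mecier)"];
var regex = buildKeywordRegex(kwd);
console.log(regex.test("Well, (Bonjour / Mecier) to you")); // true
console.log(regex.test("Bonjour / Mecier"));                // false, the parentheses are part of the keyword
```

Escaping each keyword separately, before joining, keeps a literal `|` inside a keyword from being read as alternation.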
  • thanks, but how can I use it as Regex object? Also, there are "/" and space in my keywords – Kuan Dec 09 '15 at 18:49
  • See my edited answer. Space and / should work this way. Those are not special chars in regex. – Tᴀʀᴇǫ Mᴀʜᴍᴏᴏᴅ Dec 09 '15 at 18:55
  • You need to escape the `/` *and* `\ ` characters before they enter the regex expression then. Space should be no problem. Just do something like `kwd.replace("\\", "\\\\").replace("/", "\\/").join('|')`. If you want a regex object, pass it into the constructor of a `new RegExp(regex_expression)`. The brackets are bad in the regex too, because they open a new capture group. So in fact, you should escape every special character which is used in regex. Or, just use `indexOf()` and don't use any regexes at all. – Maximilian Gerhardt Dec 09 '15 at 18:55
  • @MaximilianGerhardt Thanks, do I need to replace "(" and ")" with "\(" and "\)"? – Kuan Dec 09 '15 at 19:00
  • @MaximilianGerhardt: there is zero reason to escape `/` characters. – Bergi Dec 09 '15 at 19:00
  • Is that so? https://i.imgur.com/NEh1mQK.png (@Bergi). I'm always using regexr.com, but maybe this is also true for javascript regexes. – Maximilian Gerhardt Dec 09 '15 at 19:01
  • @Kuan: Just use an [established escaping function](http://stackoverflow.com/q/3561493/1048572) instead of rolling your own – Bergi Dec 09 '15 at 19:01
  • @MaximilianGerhardt: That's a regex literal, where it needs to be escaped as it is the literal delimiter. We were talking about regular expression strings that go into the `RegExp` constructor. – Bergi Dec 09 '15 at 19:02
  • @Bergi Thanks, could you briefly explain which characters this line replaces (one by one would be appreciated)? `s.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');` Are there 17 characters in total? – Kuan Dec 09 '15 at 19:04
  • @Kuan: exactly those that are matched by that regex. Basically every [special character](http://www.regular-expressions.info/characters.html). – Bergi Dec 09 '15 at 19:07
  • @Bergi Sorry, I copied that from the post you referred to, but I do not quite understand what that line means. I wonder why some of those characters do not need escaping; I thought each character should be preceded by a "\", basically like s.replace(/[\-\/\\\^\$\\*\+\?\.\\(\\)\|\\[\\]\{\}]/g, '\\$&'); – Kuan Dec 09 '15 at 19:14
  • @Kuan: Because they're in a [character class](http://www.regular-expressions.info/charclass.html) where different characters are considered special (escaping them would do no harm but be unnecessarily verbose). – Bergi Dec 09 '15 at 19:17
  • @Bergi Thanks, this may take a little time for me to fully understand, but one simple example: if I use [A-Za-z], it will think I want to match from a to z OR A to Z, but what if what I want to match is "A", "Z", "-", "a", "z"? – Kuan Dec 09 '15 at 19:21
  • @Kuan: `RegExp.escape("[A-Za-z]") === "\\[A\\-Za\\-z\\]"` – Bergi Dec 09 '15 at 19:23
  • @Kuan I have updated the answer to escape any character other than letters, numbers and spaces. And no need to worry about unnecessary escapes, they never do any harm. – Tᴀʀᴇǫ Mᴀʜᴍᴏᴏᴅ Dec 09 '15 at 19:27
  • @Bergi Thanks. As you said, "Because they're in a character class where different characters are considered special", but why do "/" and "\" still need to be escaped? – Kuan Dec 09 '15 at 19:34
  • @Kuan: I said *different*, not *none*. `-`, `]` and `\` still need to be escaped in character classes, `/` still needs to be escaped in regex literals. – Bergi Dec 09 '15 at 20:45
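
The two points from this comment thread (escaping `/` only matters in regex literals, and `-` must be escaped or repositioned inside a character class) can be checked with a short sketch. The variable names and test strings below are made up for illustration.

```javascript
// In a regex literal, "/" ends the literal, so it must be written as "\/".
var fromLiteral = /Bonjour \/ Mecier/;
// In a pattern string for the RegExp constructor, "/" is an ordinary character.
var fromString = new RegExp("Bonjour / Mecier");
console.log(fromLiteral.test("Bonjour / Mecier")); // true
console.log(fromString.test("Bonjour / Mecier"));  // true

// Inside a character class, an escaped "-" is a literal hyphen, not a range.
var literalHyphen = /^[A\-Za\-z]$/;
console.log(literalHyphen.test("-")); // true
console.log(literalHyphen.test("B")); // false, no A-Z range was created
console.log(/^[A-Za-z]$/.test("B"));  // true, the unescaped form is a range
```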
1

This should do it. The constructor for the RegExp object can be used to build a RegExp from a string instead of a literal.

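function RegexBuilder(kwd){
    // Escape regex metacharacters in each keyword without mutating the caller's array
    var escaped = kwd.map(function(e){
        return e.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    });
    // Build the RegExp from the joined string instead of a literal
    return new RegExp(escaped.join('|'));
}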

https://regex101.com/r/vE0cI0/1

miken32
  • Thanks, so there is no need to be aware of the special character set? Just combining them will be fine? – Kuan Dec 09 '15 at 18:55
  • You would need to do some filtering for special regex characters, and there are a lot of them. I've edited my answer to include a very quick and dirty (and incomplete) fix but this is a bad idea with anything other than `[a-z0-9]`. – miken32 Dec 09 '15 at 19:06
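
To see why that filtering matters, the following sketch shows what can go wrong when keywords reach the `RegExp` constructor unescaped. The keyword `c++` and the test strings are assumptions chosen for the example; `(Bonjour / Mecier)` is the keyword from the question's comments.

```javascript
// Unescaped: the parentheses become a capture group, so they are not required in the text.
var unescaped = new RegExp("(Bonjour / Mecier)");
console.log(unescaped.test("Bonjour / Mecier")); // true, although the keyword has parentheses

// Unescaped: "c++" is not even a valid pattern.
var threw = false;
try {
    new RegExp("c++");
} catch (e) {
    threw = e instanceof SyntaxError;
}
console.log(threw); // true

// Escaped by hand (backslashes are doubled because this is a string literal).
var escaped = new RegExp("\\(Bonjour / Mecier\\)|c\\+\\+");
console.log(escaped.test("Bonjour / Mecier")); // false
console.log(escaped.test("I write c++"));      // true
```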