My input string is arbitrary text containing keywords that I want to match, provided that:
- each keyword may be preceded and/or followed by a whitespace or non-word character, or by nothing: (|[\s\W])
- exactly one whitespace/non-word character must separate adjacent keywords, unless the keyword is at the beginning or end of the line
- a keyword that merely occurs as a substring does not count, e.g. bar does not match in foobarbaz
E.g.:
input: "#foo barbazboo tree car"
keywords: {"foo", "bar", "baz", "boo", "tree", "car"}
I am generating the regex dynamically in C# from an enumerable of keywords, using a StringBuilder:
StringBuilder sb = new();
foreach (var kwd in keywords)
{
    sb.Append($"((|[\\s\\W]){kwd}([\\s\\W]|))|");
}
sb.Remove(sb.Length - 1, 1); // last '|'
_regex = new Regex(sb.ToString(), RegexOptions.Compiled | RegexOptions.IgnoreCase);
Testing this pattern on regexr.com, the given input matches all keywords. However, I do not want {bar, baz, boo} included, since there is no whitespace between those keywords.
Ideally, I'd want my regex to match only {foo, tree, car}.
Modifying the pattern to (( |[\s\W])kwd([\s\W]| )) causes {bar, baz, boo} not to be included, but produces wrong results for {tree, car}: each match consumes the separator on both of its sides, so in that case there would have to be at least two spaces between adjacent keywords.
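To make the two behaviours concrete, here is a minimal repro sketch of both variants. The helper names BuildPattern and MatchValues are made up for this illustration and are not part of my real code. Keywords are passed through Regex.Escape as a precaution, which makes no difference for these plain words. It is written with top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: wraps every keyword in the given prefix/suffix groups
// and joins the resulting alternatives with '|'.
static Regex BuildPattern(IEnumerable<string> keywords, string before, string after)
{
    var alternatives = keywords.Select(k => $"({before}{Regex.Escape(k)}{after})");
    return new Regex(string.Join("|", alternatives), RegexOptions.IgnoreCase);
}

// Hypothetical helper: returns the raw text of every match,
// including any separator characters the match consumed.
static List<string> MatchValues(Regex regex, string input) =>
    regex.Matches(input).Cast<Match>().Select(m => m.Value).ToList();

var keywords = new[] { "foo", "bar", "baz", "boo", "tree", "car" };
var input = "#foo barbazboo tree car";

// Variant 1: the separator is optional on both sides.
var optional = BuildPattern(keywords, @"(|[\s\W])", @"([\s\W]|)");
// Variant 2: the separator is required on both sides.
var required = BuildPattern(keywords, @"( |[\s\W])", @"([\s\W]| )");

var optionalMatches = MatchValues(optional, input);
var requiredMatches = MatchValues(required, input);

// Brackets make the consumed separators visible.
Console.WriteLine(string.Join(", ", optionalMatches.Select(v => $"[{v}]")));
// prints: [#foo ], [bar], [baz], [boo ], [tree ], [car]
Console.WriteLine(string.Join(", ", requiredMatches.Select(v => $"[{v}]")));
// prints: [#foo ], [ tree ]
```

In the second variant the match for tree has already taken the space in front of car, so car can never find its own leading separator.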
How do I specify "there may be only one whitespace character separating two keywords", or, to put it differently, "half a whitespace is OK", while preserving the ability to create the regex dynamically?