
I'd like my regex to match "foo" and "bar", but not when "foo" or "bar" is preceded by "a ", "an ", or "the ".

foo and bar are not guaranteed to be at the start or end of a string.

Example matches:

"end of Foo." [1 Match: Foo]
"end of bar." [1 Match: bar]
"The foo and bar" [1 Match: bar]
"foo bar" [2 Matches: foo, bar]

Example no matches:

"foobar"
"foofoo"
"the foo"
"a bar"
"andbar"
"the foo goes to a bar."

I guess I may have to do a negative lookbehind? If so, could this be converted into a negative lookahead for portability with JS?

I've tried

/(\bfoo\b|\bbar\b)(?!the|a(n)?)/igm

but this doesn't work.

Many thanks.

AeroX
Slate

2 Answers


You may find it easier to write the regex to match the words spelt backwards, and to reverse the character order of each string before matching. This lets you simulate a negative lookbehind using a negative lookahead.

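So the regex with reversed words would be as follows (note the space between the reversed word and the reversed article, and the trailing \b so that only a whole article blocks the match):

/\b(?:oof|rab)\b(?! (?:eht|n?a)\b)/ig

The JavaScript is then:

function ReverseString(str) {
    return str.split("").reverse().join("");
}

// Reversed words, not followed by a space and a reversed article ("eht", "a", "na").
var myRegex = /\b(?:oof|rab)\b(?! (?:eht|n?a)\b)/ig;
var myString = "This is foo possibly";

// The matches are still spelt backwards at this point (e.g. "oof").
alert( ReverseString(myString).match(myRegex) );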
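
As a quick check, the reversed-string approach can be run against the examples from the question. This is only a minimal sketch: the helper names `reverseString` and `findUnprefixed` are hypothetical, and the lookahead used here includes the space plus a trailing `\b`, so that only a whole reversed article (" eht", " a", " na") blocks a match. The matches are also reversed back into their original spelling and order.

```javascript
// Reverse by UTF-16 code unit; adequate for the ASCII examples used here.
function reverseString(str) {
    return str.split("").reverse().join("");
}

// Hypothetical helper: returns the matched words in original order and spelling.
function findUnprefixed(str) {
    // Reversed words, not followed by a space plus a reversed article.
    var re = /\b(?:oof|rab)\b(?! (?:eht|n?a)\b)/gi;
    var hits = reverseString(str).match(re) || [];
    // Hits come out right-to-left and spelt backwards, so undo both.
    return hits.map(reverseString).reverse();
}

console.log(findUnprefixed("The foo and bar"));        // ["bar"]
console.log(findUnprefixed("foo bar"));                // ["foo", "bar"]
console.log(findUnprefixed("the foo goes to a bar.")); // []
```

Splitting on `""` works per UTF-16 code unit, so input containing surrogate pairs or combining marks would need a more careful reverse function.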
AeroX
  • Good idea, don't forget there is a space between `the` and `foo`. – Casimir et Hippolyte Apr 04 '14 at 15:30
  • Haha, good ol JS and nice idea. I'm actually writing for C# at the moment - JS is for future portability's sake where I may have to translate it, hence why I left the language out of the tags. I'm sure I can find a reverse function with Unicode support for C# though [ http://stackoverflow.com/questions/228038/best-way-to-reverse-a-string ] ! – Slate Apr 04 '14 at 15:45
  • You should have a more complete Regex implementation available to you in C# which supports `Negative Look Behinds` so I'd just write your regexes normally in C# and then look into using the reverse method when it comes to porting it to JavaScript. – AeroX Apr 04 '14 at 16:08
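
For comparison, the direct lookbehind form mentioned in the comment above might look like the sketch below. It is written in JavaScript only to stay consistent with the other snippets, and it assumes an engine with ES2018 lookbehind support, which JavaScript did not have when this thread was written. The function name `findWithLookbehind` is hypothetical.

```javascript
// Hypothetical helper; assumes an engine with ES2018 lookbehind support.
function findWithLookbehind(str) {
    // Match foo/bar as whole words unless directly preceded by "the ", "a " or "an ".
    var re = /(?<!\b(?:the|an?) )\b(?:foo|bar)\b/gi;
    return str.match(re) || [];
}

console.log(findWithLookbehind("The foo and bar")); // ["bar"]
console.log(findWithLookbehind("foo bar"));         // ["foo", "bar"]
console.log(findWithLookbehind("a bar"));           // []
```

The `\b` inside the lookbehind keeps words that merely end in "a", "an" or "the" from suppressing a match.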

One approach is to test, for each match, whether a capturing group was set. The pattern also matches the cases you want to avoid (so that part of the string is consumed and skipped), while the capturing group holds only the allowed cases. The pattern:

/\b(?:(?:the |an? )(?:foo|bar)|(foo|bar))\b/gi

Example:

var re = /\b(?:(?:the |an? )(?:foo|bar)|(foo|bar))\b/gi;
var str = "the foo a bar FoO BAr";
var myArray;
var result = [];
while ((myArray = re.exec(str)) !== null) {
    if (myArray[1]) { // <- test if the capture group is defined
        result.push(myArray[1]);
    }
}

console.log(result);
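
To make the mechanism more visible, the sketch below lists every overall match next to what group 1 captured, using the same pattern and test string as above. The variable names `rows` and `m` are illustrative only. It shows that the article-prefixed cases are still matched by the pattern as a whole, but leave the capture group undefined.

```javascript
var re = /\b(?:(?:the |an? )(?:foo|bar)|(foo|bar))\b/gi;
var str = "the foo a bar FoO BAr";

// Collect every overall match alongside what group 1 captured (if anything).
var rows = [];
var m;
while ((m = re.exec(str)) !== null) {
    rows.push({ match: m[0], group1: m[1] });
}

rows.forEach(function (row) {
    console.log(JSON.stringify(row.match), "->", row.group1);
});
// "the foo" -> undefined
// "a bar" -> undefined
// "FoO" -> FoO
// "BAr" -> BAr
```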
Casimir et Hippolyte
  • So does this match with articles, but has an empty capture? I'm a little confused by how this is matching in an online parser. http://regexr.com/38lkg :) – Slate Apr 04 '14 at 15:40
  • @kjhf: An online regex tester is not the right tool to test this, since the pattern will match every string that contains foo or bar. It is better to run the code example. – Casimir et Hippolyte Apr 04 '14 at 15:44
  • @kjhf: note that you can do the job with a single regex, but you need a more advanced regex engine like PCRE or Perl. – Casimir et Hippolyte Apr 04 '14 at 16:07
  • I marked this as best answer for readability, though the reverse Regex is a decent solution. – Slate Apr 07 '14 at 08:36