
Using PHP, I would like to match part of a string only when the text immediately after the matched part is not a certain sequence of characters. I think I need a negative lookahead assertion, but it is not working as I expect. For example:

$string = 'test somerandomURL><b>apple</b> peach orange';

I want an expression that will detect everything up to the first > so long as <b>apple</b> does not immediately follow the first >

I tried

if (preg_match("/test(.*)>(?!<b>apple<\/b>)(.*)/i", $string)) {
    echo 'true since <b>apple</b> does not follow > in string';
} else {
    echo 'false since <b>apple</b> follows > in string';
}

When the string contains <b>apple</b> after the >, it returns false, which is what I expect and need. But when I change the string to have <b>peach</b> after the > instead of <b>apple</b>, it still returns false, and I need it to return true. Without the bold tags it seems to work as I would expect. Any ideas what I am doing wrong?

MorningSleeper
  • The first `.*` already matches most of the input string (until the closing `</b>`'s `>` probably), making your assertion somewhat superfluous. Try `[^>]*` instead. --(And before someone posts the redundant-and-only-kept-for-historical-reasons joke link, yes, there are [simpler APIs](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) for processing text with interspersed html.) – mario Sep 15 '13 at 01:52
  • Try this: `(?=test.*?>(?!apple)[^<]+)[^>]+` – Enissay Sep 15 '13 at 01:58
  • Why wouldn't the first `.*` stop at the first `>`, and how do I make it do so? – MorningSleeper Sep 15 '13 at 02:00
  • @Enissay, if you think you have a solution, post it as an answer, not as a comment. But don't post that one; `(?=test.*?>` is never going to succeed because it requires the next four characters to be both `test` and ``. – Alan Moore Sep 15 '13 at 05:55

1 Answer

You're doing a few things wrong, including not turning on error reporting because you have mismatched ()'s and an / that needs escaping.

The first problem is that .* is greedy. In your string, test(.*)> will therefore match all the way to the last >: test somerandomURL><b>apple</b>

Usually the solution to that is to make it lazy with .*? but the negative lookahead defeats it here: when <b>apple</b> follows the first >, the lookahead fails there, and the lazy quantifier simply extends the match to a later > where the lookahead passes.
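To make the difference concrete, here is a small sketch that runs the greedy and the lazy variant against the string from the question and prints what each one actually matched. The variable names $greedy and $lazy are only for illustration, and the trailing (.*) from the question is left out so the matched prefix is easier to see.

```php
<?php
$string = 'test somerandomURL><b>apple</b> peach orange';

// Greedy: (.*) consumes the rest of the string, then backtracks
// only as far as the LAST '>', where the lookahead happens to pass.
preg_match('/test(.*)>(?!<b>apple<\/b>)/i', $string, $greedy);

// Lazy: (.*?) tries the first '>' first. The lookahead fails there,
// so the engine extends the match to the next '>' (the end of <b>).
preg_match('/test(.*?)>(?!<b>apple<\/b>)/i', $string, $lazy);

echo $greedy[0], "\n"; // test somerandomURL><b>apple</b>
echo $lazy[0], "\n";   // test somerandomURL><b>
```

In both cases preg_match still finds a match, because nothing forces the > in the pattern to be the first > in the string.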

The solution is a negated character class. [^>]* matches any run of characters except >, so it cannot move past the first > and the lookahead is always tested right after it.

/test([^>]*)>(?!<b>apple<\/b>)(.*)/i
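As a quick check, the pattern above can be tried against both versions of the string from the question. This is only a sketch: the helper name firstTagIsNotApple and the $peach test string are made up for illustration.

```php
<?php
// Hypothetical helper: true when the text right after the first '>'
// (following "test") is NOT <b>apple</b>.
function firstTagIsNotApple(string $s): bool
{
    // [^>]* cannot cross a '>', so the lookahead is only ever
    // evaluated immediately after the first '>'.
    return preg_match('/test([^>]*)>(?!<b>apple<\/b>)(.*)/i', $s) === 1;
}

$apple = 'test somerandomURL><b>apple</b> peach orange';
$peach = 'test somerandomURL><b>peach</b> apple orange';

var_dump(firstTagIsNotApple($apple)); // bool(false)
var_dump(firstTagIsNotApple($peach)); // bool(true)
```

Comparing the result with === 1 also keeps a pattern error (where preg_match returns false) from being mistaken for "no match".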
pguardiario