0

Can this be done using only one regular expression?

Edit: Please don't complain about me parsing HTML :) The same situation can be reproduced with plain text:

Example source string:

Lorem 1 ipsum. Lorem 2 ipsum TOKEN 
foo. Lorem 3 ipsum

Example source string, HTML version:

<div id="entry-1">Lorem ipsum</div>
<div id="entry-2">Lorem ipsum TOKEN</div>
<div id="entry-3">Lorem ipsum</div>

What I want to get:

2, because that is the entry whose text contains the token.

I'm trying /([0-9]+).*TOKEN/sm, but I get 1, because the regex starts looking for TOKEN right after the first number it finds, which is 1.

With two separate regexes (two preg_match calls) it's easy, but I wonder whether it can be done in a single one.
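A minimal PHP reproduction of the behaviour described above, run against the HTML sample from this question (the variable names are only illustrative):

```php
<?php
// Sample HTML from the question.
$subject = <<<HTML
<div id="entry-1">Lorem ipsum</div>
<div id="entry-2">Lorem ipsum TOKEN</div>
<div id="entry-3">Lorem ipsum</div>
HTML;

// With the s modifier "." also matches newlines, so the match can begin
// at the first digit in the string and ".*" can still reach TOKEN.
preg_match('/([0-9]+).*TOKEN/sm', $subject, $matches);

echo $matches[1], "\n"; // captured number: 1, not the wanted 2
```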

Thanks in advance for your help :)

3 Answers

2

Try the non-greedy quantifier *?:

/entry-([0-9]+).*?TOKEN/sm

Lazy quantifiers aren't supported by every regex flavor, but they might work in yours (is that JavaScript?)

Patrick
  • As far as I understand and could test, the non-greedy behaviour only avoids matching a longer string that contains two tokens: it stops after finding the first one. It's PHP, btw. Thanks – John Smith Dec 13 '11 at 16:06
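The point made in this comment can be checked with a small sketch. It assumes a hypothetical variant of the question's sample in which entries 2 and 3 both contain TOKEN; the variable names are illustrative. The greedy and lazy versions differ only in where the match ends, not in where it begins, so both still capture 1:

```php
<?php
// Hypothetical variant of the question's sample: entries 2 and 3 both contain TOKEN.
$subject = <<<HTML
<div id="entry-1">Lorem ipsum</div>
<div id="entry-2">Lorem ipsum TOKEN</div>
<div id="entry-3">Lorem ipsum TOKEN</div>
HTML;

preg_match('/entry-([0-9]+).*TOKEN/s', $subject, $greedy);
preg_match('/entry-([0-9]+).*?TOKEN/s', $subject, $lazy);

// Both matches begin at entry-1, so both capture "1".
echo $greedy[1], ' ', $lazy[1], "\n";

// Only the end of the match differs.
var_dump(strpos($greedy[0], 'entry-3') !== false); // greedy runs on to the last TOKEN
var_dump(strpos($lazy[0], 'entry-3') !== false);   // lazy stops at the first TOKEN
```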
0

I'd use a positive lookbehind to make sure that you match TOKEN, like so:

<div id="entry-([0-9]+)">.*(?<=TOKEN)</div>

You can use it like this:

$result = preg_match('%<div id="entry-([0-9]+)">.*(?<=TOKEN)</div>%i', $subject, $matches);

This will match the second entry, but not the first or third.
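A short sketch of how this pattern behaves, using the sample HTML from the question plus a hypothetical two-line variant added here for illustration. Note that the lookbehind requires TOKEN to sit directly before the closing tag, and that without the s modifier the dot cannot cross a line break:

```php
<?php
$pattern = '%<div id="entry-([0-9]+)">.*(?<=TOKEN)</div>%i';

// Sample HTML from the question: every entry sits on one line.
$singleLine = <<<HTML
<div id="entry-1">Lorem ipsum</div>
<div id="entry-2">Lorem ipsum TOKEN</div>
<div id="entry-3">Lorem ipsum</div>
HTML;

// Hypothetical variant: the entry containing TOKEN spans two lines.
$multiLine = <<<HTML
<div id="entry-1">Lorem ipsum</div>
<div id="entry-2">Lorem
ipsum TOKEN</div>
HTML;

$found = preg_match($pattern, $singleLine, $matches);
echo $found, ' ', $matches[1], "\n"; // 1 2

// "." stops at the line break inside entry-2, so nothing matches here.
$foundMulti = preg_match($pattern, $multiLine, $multiMatches);
echo $foundMulti, "\n"; // 0
```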

nickb
  • It only works if the text containing the token is on a single line, even if I add the m and s modifiers :( See it here: http://www.ideone.com/VyO6n – John Smith Dec 13 '11 at 16:35
0

Your regex is correct; the problem is the s modifier, which makes . match newlines too, so the match starts at the 1. Drop the s.

Also, you don't need the m modifier, as you are not using anchors in your regex.

See it

This answer assumes that the entry-[0-9] and the TOKEN are on the same line in the input.
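A minimal sketch of that assumption in practice, using the two samples from the question (variable names are illustrative). Without s the dot stays on one line, which gives 2 for the HTML sample, where each entry has its own line, but still 1 for the plain-text sample, where the numbers 1 and 2 share a line:

```php
<?php
// HTML sample from the question: one entry per line.
$html = <<<HTML
<div id="entry-1">Lorem ipsum</div>
<div id="entry-2">Lorem ipsum TOKEN</div>
<div id="entry-3">Lorem ipsum</div>
HTML;

// Plain-text sample from the question: 1 and 2 are on the same line as TOKEN.
$text = "Lorem 1 ipsum. Lorem 2 ipsum TOKEN \nfoo. Lorem 3 ipsum";

// No s modifier: "." does not match a newline.
preg_match('/([0-9]+).*TOKEN/', $html, $htmlMatches);
preg_match('/([0-9]+).*TOKEN/', $text, $textMatches);

echo $htmlMatches[1], "\n"; // 2
echo $textMatches[1], "\n"; // 1
```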

codaddict
  • I need the s and m modifiers because the text containing the TOKEN could span several lines :( Like here: http://www.ideone.com/KryNE Thanks for that link, very useful. – John Smith Dec 13 '11 at 16:13
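For the multi-line case raised in this comment, one possible single-regex approach, which nobody in this thread proposed, is to keep the s modifier but forbid the dot from stepping over the start of another entry (sometimes called a tempered dot). This is only a sketch: it assumes every entry begins with the literal text entry- followed by a number, and the multi-line sample below is hypothetical, not the one behind the ideone link.

```php
<?php
// Hypothetical multi-line sample: the entry containing TOKEN spans two lines.
$subject = <<<HTML
<div id="entry-1">Lorem ipsum</div>
<div id="entry-2">Lorem
ipsum TOKEN</div>
<div id="entry-3">Lorem ipsum</div>
HTML;

// (?:(?!entry-[0-9]).)*? consumes one character at a time, newlines included,
// but refuses to step onto the start of the next "entry-N" marker. A match
// that begins at entry-1 therefore fails before it can reach TOKEN.
$pattern = '/entry-([0-9]+)(?:(?!entry-[0-9]).)*?TOKEN/s';

preg_match($pattern, $subject, $matches);

echo $matches[1], "\n"; // 2
```

If the entries are delimited differently, the marker inside the negative lookahead would need to change accordingly.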