
This is the sort of HTML string I will be performing matches on:

<span class="q1">+12 Spell Power and +10 Hit Rating</span>

I want to get +12 Spell Power and +10 Hit Rating out of the above HTML. This is the code I wrote:

preg_match('/<span class="q1">(.*)<\/span>/', $gem, $match);

But I think that, because of the <\/span>, the backslash is escaping the / in </span>, so the match doesn't stop there and I get a lot more data than I want.

How can I escape the / in </span> while still having it part of the pattern?

Thanks.

VIVA LA NWO

3 Answers


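The escaping is fine: \/ matches a literal /. I think your regex is capturing more than you want because * is greedy and matches as much as possible, so (.*) runs on to the last </span> in the string. Use *? instead, which matches as little as possible and stops at the first </span>: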

preg_match('/<span class="q1">(.*?)<\/span>/', $gem, $match);
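
To make the difference concrete, here is a minimal sketch comparing the two quantifiers. The input string is a hypothetical example (the second span and its text are invented for illustration), and the `~` delimiter in the third pattern is just one way to avoid escaping the / at all:

```php
<?php
// Hypothetical input: the span from the question followed by a second,
// made-up span, so that more than one </span> appears in the string.
$gem = '<span class="q1">+12 Spell Power and +10 Hit Rating</span>'
     . '<span class="q2">Socket Bonus</span>';

// Greedy: (.*) runs to the LAST </span> in the string.
preg_match('/<span class="q1">(.*)<\/span>/', $gem, $greedy);

// Lazy: (.*?) stops at the FIRST </span>.
preg_match('/<span class="q1">(.*?)<\/span>/', $gem, $lazy);

// Alternative: with ~ as the delimiter, / needs no escaping, and [^<]*
// cannot run past the next tag (assumes the span holds plain text only).
preg_match('~<span class="q1">([^<]*)</span>~', $gem, $negated);

echo $greedy[1], "\n";
echo $lazy[1], "\n";
echo $negated[1], "\n";
```

Note that . does not match newlines by default, so if the span's content can span several lines you would also need the s modifier.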
davidscolgan
  • That works, thanks. The reason I don't want to use the DOMDocument class is that it's a very small piece of HTML and this code will only be run once; I'm collecting data to be put into a database. No need to complicate things. :) – VIVA LA NWO Jun 20 '10 at 00:52
  1. Don't use regex to parse HTML.
  2. Use DOM, particularly the DOMDocument loadHTML method and getElementsByTagName('span').


    $doc = new DOMDocument();
    $doc->loadHTML($htmlString);
    $spans = $doc->getElementsByTagName('span');
    foreach ($spans as $span) {
        // textContent is the text between <span> and </span>
        echo $span->textContent, "\n";
    }
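
If only the q1 spans are wanted, one possible approach is to filter with DOMXPath. This is a rough sketch, not the only way to do it: the input string and the $texts variable are invented for illustration, and the XPath test matches only when the class attribute is exactly "q1".

```php
<?php
// Hypothetical input: the span from the question plus a made-up sibling.
$gem = '<div><span class="q1">+12 Spell Power and +10 Hit Rating</span>'
     . '<span class="q2">other</span></div>';

// Collect libxml warnings about imperfect markup instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($gem);
libxml_clear_errors();

// Select every <span> whose class attribute is exactly "q1".
$xpath = new DOMXPath($doc);
$texts = [];
foreach ($xpath->query('//span[@class="q1"]') as $span) {
    $texts[] = $span->textContent;
}

print_r($texts);
```

For an element carrying several classes, the exact-match test above would miss it; a contains() test on @class is the usual workaround.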
meder omuraliev

Don't use regex to parse HTML. Use an HTML parser. See "Robust, Mature HTML Parser for PHP".

Jason