I've been trying to use preg_split, but it's not working very well. This is how I'm using it:

$html_str = '<span class="spanClass" rel="rel span">Text Span</span>';

$arrTemp = preg_split('/\<span class=\"spanClass\" rel=\"(.+?)\"\>(.+?)\<\/span\>/', $html_str);

The idea is to get the two '(.+?)' captures into an array (the span's rel value and "Text Span").

I'm probably not approaching the problem in the best way, but the fact is that my string will contain more than one <span> mixed in with junk HTML, and I need to extract only the <span> contents into an array. Any better ideas?

Gottlieb Notschnabel
Diogo Garcia
  • @Diogo: why not using all those wonderful features like DomDocument or simplexml, would be simpler no? – RageZ Oct 25 '11 at 12:01
  • @RageZ Simplexml would only work for valid XHTML. Only DOM can parse broken HTML. – Gordon Oct 25 '11 at 12:03
  • @Gordon: good remark but nothing that tidy wouldn't help ;-) – RageZ Oct 25 '11 at 12:06
  • @Phil the linked answer does not explain why you do not want to use Regex for parsing HTML at all. It's just a rant that got too many upvotes due to funny characters. Worst of all, it's wrong: PHP's Regex engine uses PCRE and that can parse HTML. I'm all for using a DOM parser, but in all fairness, whether to use Regex or a Parser depends on the complexity of the markup you want to parse. – Gordon Oct 25 '11 at 12:31
  • possible duplicate of [extract tag attribute from xml](http://stackoverflow.com/questions/7886267/extract-tag-attribute-content-from-xml) and [grabbing th href attribute of an a element](http://stackoverflow.com/questions/3820666/grabbing-the-href-attribute-of-an-a-element/3820783#3820783) – Gordon Oct 25 '11 at 12:35
  • @Gordon That answer was good enough to inspire an [entire article](http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html) from Jeff. Perhaps that is a better link. Still not sure why my comment was flagged – Phil Oct 25 '11 at 18:54
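
The DOM approach suggested in the comments above could look roughly like the following. This is only a minimal sketch: the sample markup and the variable names ($html, $result) are made up for illustration, and it assumes the wanted spans can be identified by class="spanClass" as in the question.

```php
<?php
// Hypothetical input: two wanted spans mixed with unrelated and unclosed markup.
$html = '<div><p>trash'
      . '<span class="spanClass" rel="rel one">First</span>'
      . '<span class="other" rel="x">Skip</span>'
      . '<span class="spanClass" rel="rel two">Second</span></div>';

// Collect libxml parse warnings internally instead of printing them,
// since the markup is not well-formed.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Select only the spans whose class attribute is exactly "spanClass".
$xpath = new DOMXPath($dom);
$result = array();
foreach ($xpath->query('//span[@class="spanClass"]') as $span) {
    $result[] = array(
        'rel'  => $span->getAttribute('rel'),
        'text' => $span->textContent,
    );
}
print_r($result);
```

The XPath test here is an exact match on the class attribute, so a span carrying several classes would need a different expression.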

1 Answer

First of all, preg_split is the wrong function: it splits the string around whatever the pattern matches, and by default it does not return the captured groups. Judging by the regex you are using, you really meant preg_match.

Correct use would be:

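$html = '<span class="spanClass" rel="foo bar">Text Span</span>';
// [^>]* keeps the match inside one opening tag; $A is filled by reference
// (do not write &$A at the call site, that is a fatal error since PHP 5.4).
preg_match("/<span[^>]*rel=[\"']([^\"']+)[\"'][^>]*>([^<]+)<\/span>/", $html, $A);
print_r($A);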

This outputs:

Array
(
    [0] => <span class="spanClass" rel="foo bar">Text Span</span>
    [1] => foo bar
    [2] => Text Span
)

So the above uses preg_match: $A[0] contains the entire match, $A[1] the value of the rel attribute, and $A[2] the text inside the span.
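
Since the question mentions several spans mixed with other HTML, the same idea can be extended with preg_match_all, which collects every match rather than only the first. The following is a minimal sketch under assumptions: the sample input and the names $pattern, $matches and $spans are illustrative, and the pattern assumes the span text contains no nested tags.

```php
<?php
// Hypothetical input: two target spans mixed with unrelated markup.
$html = '<div>junk</div>'
      . '<span class="spanClass" rel="rel one">First</span>'
      . '<p>more junk</p>'
      . '<span class="spanClass" rel="rel two">Second</span>';

// [^>]* keeps each match inside a single opening tag, so one match
// cannot run from the first span into the second.
$pattern = '/<span[^>]*\brel=["\']([^"\']+)["\'][^>]*>([^<]+)<\/span>/';

// PREG_SET_ORDER groups the captures per match:
// $m[1] is the rel value, $m[2] is the span text.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$spans = array();
foreach ($matches as $m) {
    $spans[] = array('rel' => $m[1], 'text' => $m[2]);
}
print_r($spans);
```

With PREG_SET_ORDER each element of $matches has the same layout as $A above, so the indexing stays the same.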

Ahmed Masud