I'm trying to learn how to get data out of a page with PHP. I can see how to get everything between a pair of tags, but is there a way to get the content of tags nested within other tags?

In the HTML below, how would I get the content of one of the bold spans, the second one for example?

<html>
<div class="padding10">
<span class="bold"></span>
<span class="bold"></span>
<span class="bold"></span>
<span class="bold"></span>
</div>
</html>

I tried the following, which gets me the content of the padding10 div, but I don't know how to go any further and get the bold spans. Nothing I've tried works.

//gets all
$file_string = file_get_contents('http://www.test.com/index.html');

//gets all in padding10 div
preg_match('/<div class="padding10">(.*)<\/div>/si', $file_string, $padding_10);

//gets all bold spans on padding10 div??
preg_match_all('/<span class="bold">(.*)<\/span>/i', $padding_10[1], $spans_10);

From what I'm reading, I'm starting to realise that this is probably the wrong or an inefficient way to go about it, but any help would be great. Thanks.

Problematic
mao
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Ignacio Vazquez-Abrams Feb 23 '12 at 03:19
  • This should get you started: http://stackoverflow.com/questions/1898905/recursive-regular-expression-to-process-nested-strings-enclosed-by-and – yoda Feb 23 '12 at 03:19
  • [Have you tried an HTML parser?](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – deceze Feb 23 '12 at 03:19

2 Answers

Have you tried this?

dee
  • This is a lot simpler. It's working for me; here's the code in case anyone finds it useful: foreach ($html->find('div[class=padding10]') as $element) foreach ($element->find('span[class=bold]') as $e) echo $e->innertext . '<br>'; – mao Feb 23 '12 at 03:57

Maybe phpQuery could help?

"a server-side, chainable, CSS3 selector driven Document Object Model (DOM) API based on jQuery JavaScript Library." It lets you select elements from a parsed HTML document, which is probably better suited to parsing and traversing HTML than writing regexes "by hand".

http://code.google.com/p/phpquery/
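
If you'd rather not install anything, the same "parse it, then select" idea can be sketched with PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of phpQuery. This is only a minimal sketch, not phpQuery's API: the span text ("first", "second", ...) is made up, because the spans in the question are empty, and the variable names are just illustrative.

```php
<?php
// Sample markup modelled on the question. The span text is an assumption
// added for illustration; the original spans are empty.
$html = <<<HTML
<html>
<div class="padding10">
<span class="bold">first</span>
<span class="bold">second</span>
<span class="bold">third</span>
<span class="bold">fourth</span>
</div>
</html>
HTML;

// Collect libxml's parse warnings internally instead of printing them;
// real-world HTML is often not well-formed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Every span with class "bold" inside the div with class "padding10".
// Note: @class="..." is an exact match on the whole attribute value.
$spans = $xpath->query('//div[@class="padding10"]//span[@class="bold"]');

$texts = [];
foreach ($spans as $span) {
    $texts[] = trim($span->textContent);
}

// Arrays are zero-indexed, so the second span is at index 1.
$second = $texts[1] ?? null;
echo $second, PHP_EOL;
```

For a live page you could fill $html from file_get_contents(), as in the question. Because the exact match on @class fails for an element carrying several classes (class="bold big", say), a real page may need a looser XPath condition.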

Roadmaster
  • Sorry, I've never really used PHP before. Is it as simple as downloading it and adding require_once('phpQuery-onefile.php'); ? – mao Feb 23 '12 at 03:34