
I have the following string and need to extract the text inside the divs (EDITOR'S PREFACE, MORE CONTENT, etc.) and put it into an array with PHP. How could I do this?

Thanks in advance.

<div class='classit'><a href='site.php?site=1&filename=aname4'>EDITOR'S PREFACE</a></div> 
<div class='classit'><a href='site.php?site=4&filename=aname3'>MORE CONTENT</a></div> 
<div class='classit'><a href='site.php?site=3&filename=aname4'>LAST LINE</a></div> 
SilentGhost
usertest
  • Not with regex - http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Pete Jun 23 '10 at 18:44



Use the Simple HTML DOM library:

include 'simple_html_dom.php'; // the library's source file; adjust the path

$html = <<<HTML
<div class='classit'><a href='site.php?site=1&filename=aname4'>EDITOR'S PREFACE</a></div>
<div class='classit'><a href='site.php?site=4&filename=aname3'>MORE CONTENT</a></div>
<div class='classit'><a href='site.php?site=3&filename=aname4'>LAST LINE</a></div>
HTML;

$src = str_get_html($html);
$elem = $src->find("div.classit a");

$links = array();
foreach ($elem as $link) {
    $links[] = $link->plaintext;
}

print_r($links);
racerror

You could use PHP's own DOM extension:

$string = '<div><a>Elem 1</a></div><div><a>Elem 2</a></div>...etc';

$dom = new DOMDocument();
$dom->loadHTML($string);

$elements = $dom->getElementsByTagName('a');

$textElements = array();
foreach($elements as $node) {
    $textElements[] = $node->nodeValue;
}

If you are loading a larger chunk of HTML, you can use DOMXPath to query the DOMDocument for just the elements you want:

$xPathObj = new DOMXPath($dom);
$elements = $xPathObj->query("//div[@class='classit']/a");
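For reference, here is a minimal, runnable sketch that puts the DOMDocument and DOMXPath pieces of this answer together against the HTML from the question. The libxml error handling and the trim() call are additions, not part of the original answer: libxml may warn about the bare & in the hrefs, and trim() only guards against stray whitespace.

```php
<?php
// The three divs from the question.
$html = <<<HTML
<div class='classit'><a href='site.php?site=1&filename=aname4'>EDITOR'S PREFACE</a></div>
<div class='classit'><a href='site.php?site=4&filename=aname3'>MORE CONTENT</a></div>
<div class='classit'><a href='site.php?site=3&filename=aname4'>LAST LINE</a></div>
HTML;

$dom = new DOMDocument();
// Addition: collect libxml parser warnings (e.g. about the unescaped "&")
// instead of printing them, then discard them after parsing.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xPathObj = new DOMXPath($dom);
// Double-quoted PHP string so the single quotes inside the XPath survive.
$elements = $xPathObj->query("//div[@class='classit']/a");

$texts = array();
foreach ($elements as $node) {
    // Addition: trim() removes any surrounding whitespace from the link text.
    $texts[] = trim($node->nodeValue);
}

print_r($texts);
```

Because the query names the class explicitly, links elsewhere in a larger page are ignored.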

Edit

DOMNodeList supports foreach, so I've changed for($i = 0; $i < $elements->length; $i++) {$elements->item($i)->nodeValue;} to foreach($elements as $node) {$node->nodeValue}

Ivar Bonsaksen

You could do it using strip_tags:

$s = "<div class='classit'><a href='site.php?site=1&fn=aname4'>EDITOR'S PREFACE</a></div> 
<div class='classit'><a href='site.php?site=4&filename=aname3'>MORE CONTENT</a></div> 
<div class='classit'><a href='site.php?site=3&filename=aname4'>LAST LINE</a></div> ";

$new = array();
foreach (explode("\n", $s) as $val) {
    // trim() drops the trailing spaces left after each closing </div>
    $new[] = trim(strip_tags($val));
}
var_dump($new);
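The loop above depends on each div sitting on its own line. A variant of the same strip_tags idea, sketched below, strips the whole string first, then splits on any line-break style and drops blank lines. It assumes the link text itself never spans a line break.

```php
<?php
$s = "<div class='classit'><a href='site.php?site=1&filename=aname4'>EDITOR'S PREFACE</a></div>
<div class='classit'><a href='site.php?site=4&filename=aname3'>MORE CONTENT</a></div>
<div class='classit'><a href='site.php?site=3&filename=aname4'>LAST LINE</a></div>
";

// Remove every tag in one pass, then split what is left on any newline (\R).
$lines = preg_split('/\R/', strip_tags($s));

// Trim each piece, drop empty strings, and reindex the array from 0.
$new = array_values(array_filter(array_map('trim', $lines), 'strlen'));

var_dump($new);
```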
SilentGhost

You could use preg_match_all:

<?php
$html = <<<HTML
<div class='classit'><a href='site.php?site=1&filename=aname4'>EDITOR'S PREFACE</a></div>
<div class='classit'><a href='site.php?site=4&filename=aname3'>MORE CONTENT</a></div>
<div class='classit'><a href='site.php?site=3&filename=aname4'>LAST LINE</a></div>
HTML;

$result = array();

if (preg_match_all('/>([^><]+)(?=<\/a>)/', $html, $matches))
{
    $result = $matches[1];
}

print_r($result);
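The pattern above captures any text that directly precedes a </a>, so it would also pick up links outside the classit divs if the string contained any. A sketch that ties the match to the exact markup in the question is below; note that it breaks as soon as the attribute quoting or spacing changes, which is the fragility the comment on the question warns about.

```php
<?php
$html = <<<HTML
<div class='classit'><a href='site.php?site=1&filename=aname4'>EDITOR'S PREFACE</a></div>
<div class='classit'><a href='site.php?site=4&filename=aname3'>MORE CONTENT</a></div>
<div class='classit'><a href='site.php?site=3&filename=aname4'>LAST LINE</a></div>
HTML;

// Only match anchors that sit directly inside <div class='classit'>.
// "~" is the delimiter so "/" needs no escaping; [^>]* skips the anchor's
// attributes and (.*?) lazily captures the link text.
$pattern = "~<div class='classit'><a [^>]*>(.*?)</a></div>~";

$result = array();
if (preg_match_all($pattern, $html, $matches)) {
    // $matches[1] holds the first capture group of every match.
    $result = $matches[1];
}

print_r($result);
```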