
I'm using PHP Simple HTML DOM Parser and I'd like to implement a foreach loop with two conditions. I'm getting the headlines that I want, but I also want to get the href that belongs to each headline. If I write a nested foreach loop for the href alone, it loops far too many times and outputs many duplicates. Here's my code:

include_once ('simple_html_dom.php');
$html = file_get_html('http://somehtml.com');

  foreach ($html->find('ul[class=headlines] li') as $return){
    //if I put another foreach here, too many duplicates
    echo $return;
  }

The other foreach loop looks like this:

foreach ($html->find('ul[class=headlines] li a') as $href){
  $link = $href->href;
  echo $link;
}

How can I put these two conditions into one foreach loop, so that each link corresponds to the correct article and I can pass it along to another PHP file to do something with it? Thanks in advance.

user2025469
  • Just a suggestion. Use [cURL](http://php.net/manual/en/book.curl.php) – SilentAssassin Feb 28 '13 at 12:40
  • Can you be more specific? How would I do that? – user2025469 Feb 28 '13 at 12:43
  • Search on Google and here. There are a lot of examples. I used it to extract anchor links from a page. You can check [this](http://stackoverflow.com/questions/3062324/what-is-curl-in-php) for more info on cURL. I am not giving a solution, it's just a suggestion, as I said earlier. – SilentAssassin Feb 28 '13 at 12:50

1 Answer

Suppose you have the following HTML structure:

<ul class="headlines">
    <li><a href="http://google.com">Google</a></li> 
    <li><a href="http://yahoo.com">Yahoo</a></li>   
    <li><a href="http://bing.com">Bing</a></li>
</ul>

Then you iterate over the li items and, for each one, fetch the child that corresponds to the a tag (in this case it is the first child, index 0), like this:

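foreach ($html->find('ul[class=headlines] li') as $return){
    // children(0) is the first child element of the li, here the a tag
    $a = $return->children(0);
    if (!$a) continue; // skip list items that have no child element
    echo 'Link: ' . $a->href . '<br />';
    echo 'Headline: ' . $a->plaintext . '<br />';
}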

Note that you could also simply echo $a here, which prints the whole a element, instead of fetching the link and the headline separately.

I would suggest using a native extension based on libxml, such as DOM, for better performance. You can also combine it with XPath to make things simpler.
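
As a rough sketch of the DOM and XPath approach, assuming the same HTML structure as above: the function name extractHeadlines and the array keys 'link' and 'headline' are made up for illustration, and the XPath expression only matches a ul whose class attribute is exactly "headlines". It uses only PHP's built-in DOMDocument and DOMXPath classes, so adapt it to your real markup as needed.

```php
<?php
// Hypothetical helper: returns one ['link' => ..., 'headline' => ...] pair per list item.
function extractHeadlines(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $items = [];

    // Select every a element that sits directly inside an li of ul class="headlines".
    foreach ($xpath->query('//ul[@class="headlines"]/li/a') as $a) {
        $items[] = [
            'link'     => $a->getAttribute('href'),
            'headline' => trim($a->textContent),
        ];
    }

    return $items;
}

// Usage with the sample markup from this answer.
$sample = '<ul class="headlines">'
        . '<li><a href="http://google.com">Google</a></li>'
        . '<li><a href="http://yahoo.com">Yahoo</a></li>'
        . '<li><a href="http://bing.com">Bing</a></li>'
        . '</ul>';

foreach (extractHeadlines($sample) as $item) {
    echo 'Link: ' . $item['link'] . "\n";
    echo 'Headline: ' . $item['headline'] . "\n";
}
```

Because each link and headline come from the same a node, they always stay paired, and the resulting array can be handed to whatever code needs it next.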

cth