I am learning to grab data with curl. This is my code.

function readHTML($url) {
    $data = curl_init();
    curl_setopt($data, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($data, CURLOPT_URL, $url);
    $result = curl_exec($data);
    curl_close($data);
    return $result;
}

$codeHTML = readHTML('http://website.com/');
$ex1 = explode('ol class=tabcont>', $codeHTML);
$ex2 = explode('/ol>', $ex1[1]);
echo $ex2[0];

I have a problem with the HTML that this code outputs.

<ul>
<li>content</li>
<li>content</li>
<li>content</li>
<li>content</li>
<li>content</li>
<li>content</li>
<li>content</li>
<li>content</li>
</ul>

I want to cut down the <li></li> elements with PHP so that the output looks like this:

<ul>
<li>content</li>
<li>content</li>
<li>content</li>
<li>content</li>
<li>content</li>
</ul>

How can I do it? Sorry, my English is bad. :) Thanks.

  • Are you looking for pagination? Or do you just want to minimize the number of rows displayed on a single page? – Daryl Gill Apr 09 '14 at 00:18
  • If you want to parse *existing* HTML, there are some options too: http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php – Jorg Apr 09 '14 at 00:19
  • No, the HTML is the output of the grab and it looks like that; I want to reduce it to five li tags. I want to use the explode function, but every li tag has the same code. Thanks for editing :) – user3513136 Apr 09 '14 at 00:27
  • If you're grabbing html, please consider parsing it with the [DomDocument](http://www.php.net/manual/en/class.domdocument.php) class. This will make your life easier, as you can easily delete nodes and just have 5 (or however many) `li` tags within `ul`. – Dave Chen Apr 09 '14 at 00:35
  • @DaveChen `DomDocument` doesn't work with HTML. –  Apr 09 '14 at 00:41
  • @user3513136 I wouldn't recommend using explode for scraping HTML. There are great extensions made just for this type of work. – Dave Chen Apr 09 '14 at 01:12
  • Hehe, I am just learning PHP. I must study harder :) – user3513136 Apr 09 '14 at 01:22

1 Answer

Since you are grabbing this HTML rather than hard-coding it, I feel using DomDocument is appropriate.

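<?php

$html = '<ul>
<li>content</li>
<li>content</li>
<li>content</li>
<li>content</li>
<li>content</li>
<li>content</li>
<li>content</li>
<li>content</li>
</ul>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$ul    = $dom->getElementsByTagName('ul')->item(0);
$count = 0;

// Collect the nodes first and remove them afterwards, because removing
// from a live node list while iterating over it skips elements.
$toRemove = array();

foreach ($ul->childNodes as $node) {
    // Text nodes have no tagName, so check the node type before reading it
    if ($node instanceof DOMElement && $node->tagName === 'li') {
        if ($count++ >= 5) {
            $toRemove[] = $node;
        }
    }
}
foreach ($toRemove as $node) {
    $ul->removeChild($node);
}

// loadHTML() adds a doctype and <html><body> wrappers; drop the doctype,
// then replace <html> with the <ul> so only the list is printed.
$dom->removeChild($dom->firstChild);
$dom->replaceChild($dom->firstChild->firstChild->firstChild, $dom->firstChild);
echo $dom->saveHTML();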

Output:

<ul><li>content</li>
<li>content</li>
<li>content</li>
<li>content</li>
<li>content</li>



</ul>

The empty lines are the whitespace text nodes left over from the newlines around the removed <li> tags. You can remove them too by also checking for #text nodes.
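A minimal sketch of that extra check, assuming the same kind of markup as above. The helper name `keepFirstItems`, its `$limit` parameter and the sample input are made up for illustration. It treats a text node as removable only when it contains nothing but whitespace, and it prints the list with `saveHTML($ul)` instead of unwrapping the document by hand.

```php
<?php

// Hypothetical helper (not part of the answer above): keep the first $limit
// <li> children of the first <ul> and drop whitespace-only text nodes.
function keepFirstItems(string $html, int $limit): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $ul = $dom->getElementsByTagName('ul')->item(0);
    if ($ul === null) {
        return ''; // no <ul> found in the input
    }

    $count    = 0;
    $toRemove = [];
    foreach ($ul->childNodes as $node) {
        if ($node instanceof DOMElement && $node->tagName === 'li') {
            // Surplus list items beyond the limit
            if ($count++ >= $limit) {
                $toRemove[] = $node;
            }
        } elseif ($node instanceof DOMText && trim($node->nodeValue) === '') {
            // The "#text" nodes that produced the empty lines
            $toRemove[] = $node;
        }
    }
    foreach ($toRemove as $node) {
        $ul->removeChild($node);
    }

    // Passing a node serializes only that node, without doctype or wrappers
    return $dom->saveHTML($ul);
}

$html = "<ul>\n<li>a</li>\n<li>b</li>\n<li>c</li>\n</ul>";
echo keepFirstItems($html, 2);
```

With the sample input this should print the list with only its first two items; the exact line breaks in the result depend on how libxml formats the node.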

Dave Chen
  • @user3513136 If you are satisfied with this answer please accept it with a checkmark. I would ask again that you reconsider using explode to scrape html data. If anything changes, nothing will work. – Dave Chen Apr 09 '14 at 01:28
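To illustrate the point about explode, here is a rough sketch of pulling the same block out with DOMXPath instead. It assumes the page contains an `<ol>` whose class is `tabcont`, as the explode call in the question suggests; the helper name `extractTabcont` and the sample page are hypothetical.

```php
<?php

// Hypothetical helper: return the markup of the first <ol class="tabcont">,
// or null when the page has no such element.
function extractTabcont(string $html): ?string
{
    $dom = new DOMDocument();

    // Scraped pages are rarely valid HTML; keep parser warnings internal
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match "tabcont" as a whole class token, even if other classes are present
    $nodes = $xpath->query(
        '//ol[contains(concat(" ", normalize-space(@class), " "), " tabcont ")]'
    );
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return $dom->saveHTML($nodes->item(0));
}

// Made-up sample page; in the question this would be the result of readHTML()
$page = '<div id="main"><ol class=tabcont><li>one</li><li>two</li></ol></div>';
echo extractTabcont($page);
```

Unlike the explode version, this keeps working if the attribute gets quoted, gains a second class, or the surrounding markup changes, and it returns null rather than raising an undefined-offset notice when the element is missing.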