
Possible Duplicate:
Grabbing the href attribute of an A element

I'm trying to extract folder names from an HTML page (a directory listing). The HTML source looks like this:

```html
<li><a href="/"> Parent Directory</a></li>
<li><a href=".ftpquota"> .ftpquota</a></li>
<li><a href="Folder%201/"> Folder 1/</a></li>
<li><a href="Folder%202/"> Folder 2/</a></li>
<li><a href="Folder%20N/"> Folder N/</a></li>
```

With what I have so far, I can extract some of the folder names, but not correctly.

Here is what I've done:

```php
<?php

    $url = "URL";
    $page_data = file_get_contents($url);
    $search_pattern = "<li><a href=";
    $position = 0;

    while($position = strpos($page_data,$search_pattern, $position+strlen($search_pattern)))
    {
        //$pos2 = strpos($page_data, "\"> ", $position);
        //echo $position . " - " . $pos2 . " = " . ($pos2-$position) . "<br />";
        $str = substr($page_data,$position+strlen($search_pattern)+1, $pos2-$position);
        echo "<pre>" . $position . " || " . $str . "\n</pre>";
    }

?>
```
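A likely cause of the mangled output: the line that sets `$pos2` is commented out, so `$pos2-$position` uses an undefined variable (PHP treats it as `0` and emits a warning). The length passed to `substr()` is therefore negative, and a negative length tells `substr()` to drop that many characters from the end of the string, so each iteration prints nearly the whole rest of the page. Below is a minimal sketch of the same `strpos()`/`substr()` scan with the closing quote actually located. It assumes every entry has the exact `<li><a href="...">` shape shown above; `extractFolders` is a hypothetical helper name, and it uses `rawurldecode()` because `urldecode()` would also turn a literal `+` in a folder name into a space.

```php
<?php
// Sketch only: assumes each entry is written exactly as <li><a href="...">.
function extractFolders(string $page_data): array
{
    $search_pattern = '<li><a href="';
    $folders = [];
    $offset = 0;

    // strpos() returns false when no match is left; compare strictly so offset 0 is not "false"
    while (($start = strpos($page_data, $search_pattern, $offset)) !== false) {
        $start += strlen($search_pattern);        // first character of the href value
        $end = strpos($page_data, '"', $start);   // closing quote of the href value
        if ($end === false) {
            break;
        }
        $href = substr($page_data, $start, $end - $start);
        $offset = $end;                           // continue scanning after this href

        // Keep folder links such as "Folder%201/", skipping "/" (parent) and plain files
        if ($href !== '/' && substr($href, -1) === '/') {
            $folders[] = rtrim(rawurldecode($href), '/');
        }
    }
    return $folders;
}

$page_data = '<li><a href="/"> Parent Directory</a></li>
<li><a href=".ftpquota"> .ftpquota</a></li>
<li><a href="Folder%201/"> Folder 1/</a></li>
<li><a href="Folder%202/"> Folder 2/</a></li>
<li><a href="Folder%20N/"> Folder N/</a></li>';

foreach (extractFolders($page_data) as $folder) {
    echo $folder, "\n";
}
```

String scanning like this breaks as soon as the server changes attribute order or quoting, which is why an HTML parser is usually the more robust choice.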

Each folder contains some files that I will copy using `copy()`, since I'm on Windows and don't have wget.

What am I doing wrong here?

This is my output:

```
156 || /"> Parent Directory
.ftpquota
Folder 1/
Folder 2/
Folder N/
```

But what I really need is:

```
Folder 1
Folder 2
Folder N
```

Because later on, I'll loop through the folders and copy the files.
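For that later copy step, here is a minimal sketch, assuming each folder URL returns the same kind of listing page, that the listing is one level deep, and that `allow_url_fopen` is enabled so `file_get_contents()` and `copy()` accept `http://` URLs. The helper names `listingLinks`, `splitLinks` and `copyFolders`, and the `$baseUrl`/`$targetDir` parameters, are hypothetical and not part of the question.

```php
<?php
// Hypothetical helpers; $baseUrl and $targetDir are placeholders.

// Return the raw (still URL-encoded) href of every <a> on a listing page.
function listingLinks(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // listings are often not valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ((new DOMXPath($doc))->query('//a/@href') as $attr) {
        $links[] = $attr->nodeValue;
    }
    return $links;
}

// Split relative links into sub-folders (ending in "/") and files.
function splitLinks(array $links): array
{
    $folders = [];
    $files = [];
    foreach ($links as $href) {
        // Skip absolute links ("/" is the parent), sort links ("?..."), "../" and full URLs
        if ($href === '' || $href[0] === '/' || $href[0] === '?'
            || $href === '../' || strpos($href, '://') !== false) {
            continue;
        }
        if (substr($href, -1) === '/') {
            $folders[] = $href;
        } else {
            $files[] = $href;
        }
    }
    return [$folders, $files];
}

// Copy the files of every first-level folder under $baseUrl into $targetDir\<folder name>.
function copyFolders(string $baseUrl, string $targetDir): void
{
    [$folders] = splitLinks(listingLinks(file_get_contents($baseUrl)));
    foreach ($folders as $folderHref) {
        $folderUrl = rtrim($baseUrl, '/') . '/' . $folderHref;   // keep %20 in the URL
        $localDir = $targetDir . DIRECTORY_SEPARATOR . rtrim(rawurldecode($folderHref), '/');
        if (!is_dir($localDir)) {
            mkdir($localDir, 0777, true);
        }
        [, $files] = splitLinks(listingLinks(file_get_contents($folderUrl)));
        foreach ($files as $fileHref) {
            copy($folderUrl . $fileHref, $localDir . DIRECTORY_SEPARATOR . rawurldecode($fileHref));
        }
    }
}

// Usage with placeholder values:
// copyFolders('http://example.com/listing/', 'C:\\downloads');
```

The URL keeps the encoded `%20` form while the local path uses the decoded name; recursion into deeper sub-folders is deliberately left out.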

Edited by Community; asked by sikas
Comments:

- [simplehtmldom.sourceforge.net](http://simplehtmldom.sourceforge.net/) – Vinay Sep 06 '12 at 08:57
- `print_r (explode(' ', strip_tags($string)))` will give you the array of all data between tags... where `$string` equals the input html. – Vishal Sep 06 '12 at 09:01

1 Answer


Use `DOMDocument` and `DOMXPath` to parse HTML instead of searching it with string functions:

```php
<?php
$string = '<li><a href="/"> Parent Directory</a></li>
<li><a href=".ftpquota"> .ftpquota</a></li>
<li><a href="Folder%201/"> Folder 1/</a></li>
<li><a href="Folder%202/"> Folder 2/</a></li>
<li><a href="Folder%20N/"> Folder N/</a></li>
<li><a href="file.bin"> file.bin</a></li>';

$html = new DOMDocument();
$html->loadHTML($string);
$xpath = new DOMXPath($html);
// Select the href attribute of every <a> element
$filtered = $xpath->query("//a/@href");

foreach ($filtered as $one) {
    // Skip the one-character "/" parent-directory link
    if (strlen($one->nodeValue) > 1) {
        // Decode escapes such as %20: "Folder%201/" -> "Folder 1/"
        echo urldecode($one->nodeValue) . "\n";
    }
}
```
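The loop above still prints `.ftpquota`, `file.bin`, and each folder name with its trailing slash, while the question asks for bare names like `Folder 1`. A sketch of one way to narrow it down, assuming folders are exactly the links whose href ends in `/` (other than `/` itself): XPath 1.0 has no `ends-with()`, so the predicate compares the last character via `substring()` and `string-length()`. The `$folders` array and the use of `rawurldecode()` are additions for illustration.

```php
<?php
$string = '<li><a href="/"> Parent Directory</a></li>
<li><a href=".ftpquota"> .ftpquota</a></li>
<li><a href="Folder%201/"> Folder 1/</a></li>
<li><a href="Folder%202/"> Folder 2/</a></li>
<li><a href="Folder%20N/"> Folder N/</a></li>
<li><a href="file.bin"> file.bin</a></li>';

$html = new DOMDocument();
libxml_use_internal_errors(true);   // silence warnings about the HTML fragment
$html->loadHTML($string);
libxml_clear_errors();
$xpath = new DOMXPath($html);

// Keep hrefs whose last character is "/" but which are not the bare "/" parent link
$filtered = $xpath->query("//a/@href[substring(., string-length(.)) = '/' and . != '/']");

$folders = [];
foreach ($filtered as $one) {
    // "Folder%201/" -> "Folder 1"
    $folders[] = rtrim(rawurldecode($one->nodeValue), '/');
}
echo implode("\n", $folders), "\n";
```

With the sample input this prints `Folder 1`, `Folder 2` and `Folder N`, one per line, which is the list the question needs before looping over the folders.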


Answered by Mihai Iorga