
Possible Duplicate:
Grabbing the href attribute of an A element

I would like to make a PHP script that extracts all the href links from a webpage (mine), but only links with "/view/" in their URL.

http://www.example.com/roger/that => not extracted

http://www.example.com/roger/view/that => extracted

If possible, all the links should be collected in an array.

So basically, the script would live in my admin section: I would run it to collect all the links containing the string '/view/' into an array, to use later in another script.

I've done my research and found this script, but I can't modify it to only include the specific links (those with "/view/").

I know you guys are not my slaves, so even just tips for modifying the existing script would make me happy!

My script: http://pastebin.com/gYf9DZ8i

Thanks!

francoboy7
If you already managed to extract a list of *all* links, then just filter those: `$view_links = preg_grep('#/view/#', $matches[1]);` – mario Oct 27 '12 at 16:24
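As a runnable illustration of that comment, here is a minimal sketch that filters a hypothetical `$all_links` array (standing in for `$matches[1]` from the existing script) with `preg_grep`. The sample URLs are made up for the demo.

```php
<?php
// Hypothetical list of already-extracted hrefs, standing in for $matches[1].
$all_links = [
    'http://www.example.com/roger/that',
    'http://www.example.com/roger/view/that',
    'http://www.example.com/view/other',
];

// preg_grep() keeps only the elements that match the pattern. The '#'
// delimiters avoid having to escape the slashes in '/view/'.
// preg_grep() preserves the original keys, so array_values() reindexes
// the result from 0.
$view_links = array_values(preg_grep('#/view/#', $all_links));

print_r($view_links);
```

Without `array_values()`, the two surviving links would keep their original keys 1 and 2.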

3 Answers


Fetch the page contents using file_get_contents.

$input = file_get_contents("http://www.yourpage.php");

Then use preg_match_all to extract the set of links you want.

Regex: /<a\s[^>]*href\s*=\s*["']([^"']*\/view\/[^"']*)["']/i

$pattern = '/<a\s[^>]*href\s*=\s*["\']([^"\']*\/view\/[^"\']*)["\']/i';
preg_match_all($pattern, $input, $matches);
print_r($matches[1]); // $matches[1] holds only the captured URLs
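Regular expressions over HTML break easily (attribute order, quoting style, line breaks inside tags). As an alternative that this answer does not use, here is a minimal sketch with PHP's built-in DOM extension (`DOMDocument::loadHTML()` and `getElementsByTagName()`). It assumes the dom extension is enabled. The function name `extract_links_containing` and the inline sample markup are hypothetical.

```php
<?php
/**
 * Hypothetical helper: returns every <a href> value that contains $needle.
 */
function extract_links_containing(string $html, string $needle): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href'); // '' when the attribute is missing
        if (strpos($href, $needle) !== false) {
            $links[] = $href;
        }
    }
    return $links;
}

// Inline sample so the sketch runs without network access. Note the
// single-quoted attribute and the extra class attribute before href.
$html = '<p><a href="http://www.example.com/roger/that">no</a>'
      . '<a class="x" href=\'http://www.example.com/roger/view/that\'>yes</a></p>';

$view_links = extract_links_containing($html, '/view/');
print_r($view_links);
```

On a live page you would pass the result of `file_get_contents()` instead of `$html`. Relative hrefs such as `/view/42` come back unchanged, so resolve them against the site's base URL if the later script needs absolute links.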


Anirudh Ramanathan

You just need to change this:

preg_match_all ("/a[\s]+[^>]*?href[\s]?=[\s\"\']+".
                "(.*?)[\"\']+.*?>"."([^<]+|.*?)?<\/a>/",
                $var, &$matches);

into this:

preg_match_all('/<a\s[^>]*href="([^"]*\/view\/[^"]*)"/', $var, $matches);

The matching links end up in $matches[1].
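To check a pattern along these lines without fetching a page, here is a sketch that runs it against an inline sample string. The markup is made up for the demo. The sketch uses a single-quoted PHP string so the double quotes inside the regex need no escaping. It also omits the `&` before `$matches`, because call-time pass-by-reference is a fatal error from PHP 5.4 on.

```php
<?php
// Made-up markup standing in for the fetched page.
$var = '<a href="http://www.example.com/roger/that">a</a> '
     . '<a class="nav" href="http://www.example.com/roger/view/that">b</a>';

// [^>]* keeps the match inside a single <a ...> tag, and [^"]* stops at the
// closing quote of the href value. Only double-quoted hrefs are matched.
preg_match_all('/<a\s[^>]*href="([^"]*\/view\/[^"]*)"/', $var, $matches);

// $matches[0] holds the full matches; $matches[1] holds the captured URLs.
$view_links = $matches[1];
print_r($view_links);
```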
Almir Sarajčić
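$var = file_get_contents("http://www.entendu.info");

// No & before $matches: call-time pass-by-reference is a fatal error since PHP 5.4
preg_match_all("/<a\s+[^>]*?\bhref\s*=\s*([\'\"])(?=[^\'\"]*\/view\/)(.*?)[\'\"]/",
  $var, $matches);

// Group 2 holds the URL itself (group 1 is just the opening quote)
$matches = $matches[2];

foreach ($matches as $link)
{
  print($link . "<br>\n");
}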
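Here is a sketch of the same lookahead pattern run against inline sample markup instead of fetching http://www.entendu.info. The markup is hypothetical and mixes single- and double-quoted attributes to show that both are handled.

```php
<?php
// Hypothetical sample markup, so the demo runs offline.
$html = "<a href='http://www.example.com/roger/that'>a</a>\n"
      . '<a title="x" href="http://www.example.com/roger/view/that">b</a>' . "\n"
      . "<a href='/view/42'>c</a>";

// Group 1 captures the opening quote; the lookahead requires '/view/' to
// appear before the next quote; group 2 captures the URL itself.
$pattern = '/<a\s+[^>]*?\bhref\s*=\s*([\'"])(?=[^\'"]*\/view\/)(.*?)[\'"]/';
preg_match_all($pattern, $html, $matches);

$view_links = $matches[2];
print_r($view_links);
```

Because `(.*?)['"]` stops at the first quote of either kind, an apostrophe inside a double-quoted href would cut that URL short.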
Ωmega