PHP script to extract specific href links

Question

Possible Duplicate:
Grabbing the href attribute of an A element

I would like to write a PHP script that extracts the href links from one of my own web pages, but only the links whose URL contains "/view/".

http://www.example.com/roger/that => not extracted

http://www.example.com/roger/view/that => extracted

If possible, all the matching links should end up in an array.

So basically the script would be in my admin section and I would run it to get all the links containing the specific string '/view/' in an array to use later in another script.

I've done my research and found this script but can't modify it to only include the specific links (with "/view/")

I know you guys are not my slaves so even if you have any tips for modifying the existing script I would be happy !

My script http://pastebin.com/gYf9DZ8i

Thanks !

If you already managed to extract a list of *all* links, then just filter those. `$view_links = preg_grep('#/view/#', $matches[1]);` — mario, Oct 27 '12 at 16:24
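To illustrate the comment above, here is a minimal sketch of the two-step approach: collect every href first, then filter with preg_grep. The sample markup and the simplified pattern (double-quoted href values only) are assumptions for the example, not taken from the asker's script.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<a href="http://www.example.com/roger/that">no</a>'
      . '<a href="http://www.example.com/roger/view/that">yes</a>';

// Step 1: collect every double-quoted href value (simplified pattern).
preg_match_all('/<a\s[^>]*href="([^"]*)"/i', $html, $matches);

// Step 2: keep only the entries containing "/view/".
// preg_grep preserves the original keys, so reindex with array_values.
$view_links = array_values(preg_grep('#/view/#', $matches[1]));

print_r($view_links);
```

Using `#` as the delimiter in the preg_grep pattern avoids having to escape the slashes in "/view/".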

Anirudh Ramanathan · Answer 1 · 2012-10-27T16:33:37.383

1

Fetch the page contents using file_get_contents.

$input = file_get_contents("http://www.yourpage.php");
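Note that file_get_contents returns false when the fetch fails, so it may be worth checking the result before running a regex on it. A minimal sketch, assuming a wrapper named fetch_page (a hypothetical helper, not part of the original answer):

```php
<?php
// fetch_page() is a hypothetical helper name used for illustration.
function fetch_page(string $url): string
{
    // file_get_contents returns false on failure; @ suppresses the warning
    // so the failure can be reported as an exception instead.
    $html = @file_get_contents($url);
    if ($html === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $html;
}
```

It would be called as `$input = fetch_page($url);`. Fetching an http:// URL this way requires allow_url_fopen to be enabled in php.ini.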

Then use preg_match_all to extract the set of links you want.

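Regex: /<a\s[^>]*href="([^"]*\/view\/[^"]*)"/i

// The capture group is limited to the quoted href value, so a match cannot
// run across several <a> tags and only the URL itself is captured.
$pattern = '/<a\s[^>]*href="([^"]*\/view\/[^"]*)"/i';
preg_match_all($pattern, $input, $matches);
// $matches[1] holds the captured URLs.
print_r($matches[1]);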


edited Oct 27 '12 at 16:33

answered Oct 27 '12 at 16:20

Anirudh Ramanathan


Almir Sarajčić · Answer 2 · 2012-10-27T16:39:33.937

0

You just need to change this:

preg_match_all ("/a[\s]+[^>]*?href[\s]?=[\s\"\']+".
                "(.*?)[\"\']+.*?>"."([^<]+|.*?)?<\/a>/",
                $var, &$matches);

into this

preg_match_all("/<a\s[^>]*href=\"([^\"]*\/view\/[^\"]*)\"/", $var, $matches);

edited Oct 27 '12 at 16:39

answered Oct 27 '12 at 16:26

Almir Sarajčić



The quotes inside the second regex need to be escaped; otherwise it triggers a syntax error. – janenz00 Oct 27 '12 at 16:37

score 0 · Answer 3 · answered Oct 27 '12 at 16:38

0

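$var = file_get_contents("http://www.entendu.info");

// Group 1 is the opening quote; the lookahead requires "/view/" before the
// closing quote; group 2 is the href value itself.
// $matches is passed without "&": call-time pass-by-reference is a fatal
// error since PHP 5.4, and preg_match_all already takes it by reference.
preg_match_all("/<a\s+[^>]*?\bhref\s*=\s*([\'\"])(?=[^\'\"]*\/view\/)(.*?)[\'\"]/",
  $var, $matches);

$links = $matches[2];

foreach ($links as $link)
{
  print($link . "<br>\n");
}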
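The regex-based approaches in this thread depend on how the href attribute happens to be written (quote style, attribute order). As an alternative, here is a rough sketch using PHP's built-in DOMDocument, which is the technique suggested in the linked duplicate question rather than anything from the answers here. The function name extract_view_links, its $needle parameter, and the sample markup are assumptions for illustration, and the DOM extension must be available.

```php
<?php
// extract_view_links() is a hypothetical helper name used for illustration.
function extract_view_links(string $html, string $needle = '/view/'): array
{
    $links = [];
    if ($html === '') {
        return $links; // loadHTML rejects an empty string
    }

    // Collect libxml warnings internally instead of printing them;
    // real-world markup is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href'); // '' when the attribute is missing
        if (strpos($href, $needle) !== false) {
            $links[] = $href;
        }
    }
    return $links;
}

// Sample markup (assumed): one plain link, one single-quoted "/view/" link.
$html = '<a href="http://www.example.com/roger/that">no</a>'
      . "<a class='x' href='http://www.example.com/roger/view/that'>yes</a>";
print_r(extract_view_links($html));
```

The returned array can then be stored or passed to another script, which is what the question asks for.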


Ωmega

