I am using PHP's file_get_contents to read some websites. Each of these sites has only one Facebook link. Once I have the whole page, I would like to find the complete Facebook URL.

Somewhere in the page there will be:

<a href="http://facebook.com/username" >

I want to get http://facebook.com/username, that is, everything between the opening quote and the closing quote. The username varies (it could be username.somethingelse), and there may be other attributes before or after href.

In case that is not clear:

<a href="http://facebook.com/username" >  //I want http://facebook.com/username
<a href="http://www.facebook.com/username" >  //I want http://www.facebook.com/username
<a class="value" href="http://facebook.com/username. some" attr="value" >  //I want http://facebook.com/username. some

Any of the examples above could also use single quotes:

<a href='http://facebook.com/username' > //I want http://facebook.com/username

Thanks to all

Richard Pérez

2 Answers

Don't use regex on HTML. It's a shotgun that'll blow off your leg at some point. Use DOM instead:

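$dom = new DOMDocument;
$dom->loadHTML(...); // pass in the HTML string you got from file_get_contents
$xp = new DOMXPath($dom);

// Select every <a> element in the document
$a_tags = $xp->query("//a");
foreach ($a_tags as $a) {
    // Print one href per line
    echo $a->getAttribute('href'), PHP_EOL;
}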
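
The loop above prints every link on the page. To narrow it down to the Facebook link, the XPath query itself can filter on the href attribute. What follows is only a minimal sketch: the sample `$html` string is made up for illustration, and it assumes that matching on the substring `facebook.com` is good enough for your pages.

```php
<?php
// Hypothetical sample page; in practice this would come from file_get_contents().
$html = '<html><body>'
      . '<a href="http://example.com/about">About</a>'
      . "<a class='value' href='http://www.facebook.com/username.some' attr='value'>FB</a>"
      . '</body></html>';

// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);

// Only <a> elements whose href contains "facebook.com"
$nodes = $xp->query('//a[contains(@href, "facebook.com")]');

// The question says there is a single Facebook link, so take the first match.
$facebookUrl = null;
if ($nodes->length > 0) {
    $facebookUrl = $nodes->item(0)->getAttribute('href');
}

echo $facebookUrl, PHP_EOL;
```

Because the parser handles the attribute syntax, single quotes, double quotes, and extra attributes around href all work the same way. Note that XPath `contains()` is case-sensitive and would also match a URL that merely mentions `facebook.com` somewhere in its path.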
Marc B

I would suggest using DOMDocument for this very purpose rather than using regex. Here is a quick code sample for your case:

$dom = new DOMDocument();
$dom->loadHTML($content);

// To hold all your links...
$links = array();

$hrefTags = $dom->getElementsByTagName("a");
foreach ($hrefTags as $hrefTag) {
    $links[] = $hrefTag->getAttribute("href");
}

print_r($links); // dump all links
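
The sample above collects all links. To keep only the Facebook ones, each href can be checked by its host name. This is a rough sketch rather than a definitive solution: `findFacebookLinks` is a hypothetical helper name, the sample `$content` is invented, and it assumes the links are absolute URLs on facebook.com or one of its subdomains (such as www.facebook.com).

```php
<?php
// Hypothetical helper: returns every href whose host is facebook.com
// or a subdomain of it (for example www.facebook.com).
function findFacebookLinks(string $content): array
{
    // Collect parser warnings about sloppy markup instead of printing them.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($content);
    libxml_clear_errors();

    $links = array();
    foreach ($dom->getElementsByTagName('a') as $hrefTag) {
        $href = $hrefTag->getAttribute('href');
        $host = parse_url($href, PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // relative link, or a URL that could not be parsed
        }
        $host = strtolower($host);
        // Compare the host exactly, or require a ".facebook.com" suffix (13 characters).
        if ($host === 'facebook.com' || substr($host, -13) === '.facebook.com') {
            $links[] = $href;
        }
    }
    return $links;
}

// Invented sample input, for illustration only.
$content = '<p><a href="/home">Home</a> <a href="http://facebook.com/username">FB</a>'
         . ' <a href="http://notfacebook.com/x">Other</a></p>';

print_r(findFacebookLinks($content));
```

Checking the host instead of searching the whole string avoids false positives such as notfacebook.com, or a Facebook URL that only appears inside another site's query string.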
anubhava