-1

I'm trying to create a social bookmarking site using PHP and MySQL.

When I save a website's URL, I want to store the site's title, favicon and description in a table in my database, then display them on my page using Ajax.

How can I extract those elements from a website?

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>

<body>
<?php
$myServer = "localhost";
$myUser = "root";
$myPass = "'100pushups'";
$myDB = "social_bookmarking";

//connection to the database
$connect = mysqli_connect($myServer,$myUser, $myPass)
or die("Couldn't connect to SQLServer on $myServer");

//select a database to work with
$selected = mysqli_select_db($connect, $myDB)
or die("Couldn't open database $myDB");

var_dump($_POST);
//declare the SQL statement that will query the database
$url = "INSERT INTO url (url ) VALUES ('$_POST[url]')";
if (isset($_POST['value']))    
    {    
         // Instructions if $_POST['value'] exist
         echo 'Your url is ' .$url; 
            }
$data = get_meta_tags($url);
print_r($data);
if (!mysqli_query($connect, $url)) {
    die('Error: ' . mysql_error());
}
else
{
    echo "Your information was added to the database";  
}

mysqli_close($connect);
?>
</body>
</html>

I know I'm doing something wrong with my url there, but I don't know how to use a variable as an argument in get_meta_tags, since the function only accepts filenames or strings.

Matt

3 Answers

1

You can get the title with the following (courtesy of https://stackoverflow.com/users/54680/jonathan-sampson):

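<?php
    if ( empty( $_POST["url"] ) ) {
        $output = "URL not provided";
    } elseif ( !( $html = @file_get_contents( $_POST["url"] ) ) ) {
        // file_get_contents() returns false when the fetch fails
        $output = "Could not fetch the page";
    } else {
        $doc = new DOMDocument();
        // @ suppresses the warnings loadHTML() emits for malformed real-world HTML
        @$doc->loadHTML( $html );
        $xpt = new DOMXPath( $doc );
        // item(0) is null when the page has no <title> element
        $node = $xpt->query("//title")->item(0);
        $output = $node ? trim( $node->nodeValue ) : "Title not found";
    }
    echo htmlspecialchars( $output );
?>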

You can get the favicon using:

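<?php
    $url = $_POST['url'];
    $doc = new DOMDocument();
    // @ suppresses the warnings loadHTML() emits for malformed real-world HTML
    @$doc->loadHTML(file_get_contents($url));
    $xml = simplexml_import_dom($doc);
    // Match both rel="shortcut icon" and rel="icon"
    $arr = $xml->xpath('//link[@rel="shortcut icon" or @rel="icon"]');
    // The href may be relative to the page URL
    echo $arr ? htmlspecialchars((string) $arr[0]['href']) : 'Favicon not found';
?>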

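Finally, for the description you can use:

<?php
    // get_meta_tags() lowercases the tag names; the description may be absent
    $tags = get_meta_tags($_POST['url']);
    $description = isset($tags['description']) ? $tags['description'] : '';
    echo htmlspecialchars($description);
?>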
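
The three snippets above each download the page separately. As a rough sketch (not part of the original answer), the same lookups can be run against one HTML string that has already been fetched. The function name extractPageInfo and the returned array keys are made up for illustration, and the favicon href is returned exactly as it appears in the page, so it may still be relative.

```php
<?php
// Hypothetical helper: pulls title, favicon href and description out of an
// HTML string. Each value is null when the page does not provide it.
function extractPageInfo(string $html): array
{
    $info = ['title' => null, 'favicon' => null, 'description' => null];
    if (trim($html) === '') {
        return $info; // nothing to parse
    }

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Title: first <title> element, if any.
    $titleNode = $xpath->query('//title')->item(0);
    if ($titleNode !== null) {
        $info['title'] = trim($titleNode->textContent);
    }

    // Favicon: first <link> whose rel list contains "icon"
    // (covers rel="icon" and rel="shortcut icon", case-insensitively).
    foreach ($xpath->query('//link[@rel and @href]') as $link) {
        $rels = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        if (in_array('icon', $rels, true)) {
            $info['favicon'] = $link->getAttribute('href');
            break;
        }
    }

    // Description: first <meta name="description">, name compared case-insensitively.
    foreach ($xpath->query('//meta[@name and @content]') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'description') {
            $info['description'] = trim($meta->getAttribute('content'));
            break;
        }
    }

    return $info;
}
```

You would call it as extractPageInfo(file_get_contents($_POST['url'])) after checking that the fetch succeeded, then store the three values alongside the URL.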
SufferMyJoy
0

There are capable scripts and classes that help you extract content from the DOM, for instance by using CSS-style selectors. I recommend using one of those.

This is a nice example: http://simplehtmldom.sourceforge.net/

To get the content of the page, use file_get_contents or an equivalent function.

Björn3
-1

You can use the file_get_contents() function to fetch a site's favicon (this can fail for https URLs if PHP's openssl extension is not enabled). Example:

$icon = file_get_contents("http://stackoverflow.com/favicon.ico");
// now save it
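
The /favicon.ico path in the example above is only a convention, so it makes sense as a fallback when the page's HTML declares no icon. A small sketch of building that fallback URL from any page URL follows; the function name defaultFaviconUrl is hypothetical.

```php
<?php
// Hypothetical helper: builds the conventional /favicon.ico URL for the
// site that hosts $pageUrl, or returns null if the URL has no scheme/host.
function defaultFaviconUrl(string $pageUrl): ?string
{
    $parts = parse_url($pageUrl);
    if ($parts === false || empty($parts['scheme']) || empty($parts['host'])) {
        return null;
    }
    // Keep a non-default port if the URL specifies one.
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    return $parts['scheme'] . '://' . $parts['host'] . $port . '/favicon.ico';
}
```

The result can be passed straight to file_get_contents() as shown above.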

Another option is cURL. It's a powerful PHP extension once you know how to use it.
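
For reference, a minimal cURL fetch might look like the sketch below. It assumes the curl extension is installed; the function name fetchWithCurl, the timeout values and the redirect limit are arbitrary choices for illustration, not recommendations from this answer.

```php
<?php
// Hypothetical helper: fetches a URL with cURL and returns the response
// body, or null when the request fails or the status is not 2xx.
function fetchWithCurl(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($body === false || $status < 200 || $status >= 300) {
        return null;
    }
    return $body;
}
```

Unlike a bare file_get_contents() call, this gives you explicit control over timeouts and redirects, which matters when users can submit arbitrary URLs.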

With either method you can also fetch a site's HTML content and then parse it with any PHP HTML parser library. You could use regular expressions instead, although experts generally advise against that for HTML.

Sabuj Hassan