1

I'm developing a social media platform on a LAMP stack. So far, my users can upload pictures and videos, and comment and vote on them. I want users to be able to post a URL to an article and have the title, image, and description appear automatically, as they do on Facebook. I'm guessing that most web pages containing articles include some sort of metadata that would let a developer like me systematically access the title, description, and so on. If that is the case, how specifically do I access this metadata? Otherwise, how does Facebook do it?

Thanks,

cmd
Theramax
  • Facebook most likely uses an HTML parser to read the page the way a browser would. With that, they can extract the data from the page and format it however they choose. – Axel Oct 07 '13 at 17:19
  • Not an answer, but Facebook reads the page and checks whether a thumbnail link exists: `code``code` – NVRM Oct 07 '13 at 17:22

2 Answers

1

You can use a PHP HTML parsing library that takes a URL and lets you pull out whichever meta information you need.

This answer on Stack Overflow has an excellent list of available HTML parsing options for PHP: https://stackoverflow.com/a/3577662/1332068
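As a rough illustration of what "breaking out meta information" can look like with PHP's built-in DOM extension: many article pages publish Open Graph tags (`og:title`, `og:description`, `og:image`) in `<meta property="...">` elements, and link previews can be built from those. The sketch below assumes the page HTML has already been fetched into a string. `extractPreview()` is a hypothetical helper name, and the sample markup is made up for the example. It falls back to `<title>` and `<meta name="description">` when the Open Graph tags are missing.

```php
<?php
// Hypothetical helper (not part of any library): pull preview fields out of an HTML string.
function extractPreview(string $html): array
{
    $empty = ['title' => null, 'description' => null, 'image' => null];
    if (trim($html) === '') {
        return $empty; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Index <meta> tags by property (Open Graph) or name (classic meta), first occurrence wins.
    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $key = strtolower($meta->getAttribute('property') ?: $meta->getAttribute('name'));
        $content = trim($meta->getAttribute('content'));
        if ($key !== '' && $content !== '' && !isset($tags[$key])) {
            $tags[$key] = $content;
        }
    }

    // item(0) returns null when the page has no <title> element.
    $titleNode = $doc->getElementsByTagName('title')->item(0);

    return [
        'title'       => $tags['og:title'] ?? ($titleNode ? trim($titleNode->textContent) : null),
        'description' => $tags['og:description'] ?? $tags['description'] ?? null,
        'image'       => $tags['og:image'] ?? null,
    ];
}

// Usage with made-up sample markup:
$sample = '<html><head><title>Example article</title>'
        . '<meta property="og:title" content="Example article (OG)">'
        . '<meta property="og:image" content="https://example.com/thumb.png">'
        . '<meta name="description" content="A short summary.">'
        . '</head><body></body></html>';
print_r(extractPreview($sample));
```

In a real application the HTML would come from an HTTP request (for example via cURL or `file_get_contents`), and every extracted value should be passed through `htmlspecialchars()` before it is echoed into your own page, since it originates from a third-party site.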

Community
Axel
  • Thanks for the guidance! I used the native PHP DOM library like so: `<?php $article = new DOMDocument; $article->loadHTMLFile("http://news.sciencemag.org/plants-animals/2013/10/scienceshot-bird-flies-10000-kilometers-without-stopping?rss=1"); $titles = $article->getElementsByTagName("title"); foreach($titles as $title){ echo $title->nodeValue, PHP_EOL; } ?>` – Theramax Oct 08 '13 at 20:30
0

This prints the page title and scrapes every image from whatever valid URL you submit:

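<?php
// Initialize up front so the template below also works on a plain GET request.
$scrapings = "";
if (isset($_POST['link'])) {
    $link = trim($_POST['link']);
    // Only fetch well-formed http(s) URLs; loadHTMLFile() would otherwise also accept local file paths.
    if (filter_var($link, FILTER_VALIDATE_URL) && preg_match('#^https?://#i', $link)) {
        // Real-world markup is rarely valid; keep libxml warnings out of the page.
        libxml_use_internal_errors(true);
        $article = new DOMDocument;
        if ($article->loadHTMLFile($link)) {
            $titles = $article->getElementsByTagName("title");
            foreach ($titles as $title) {
                // Collect the title for the page body instead of echoing it before the doctype.
                $scrapings .= '<p>' . htmlspecialchars($title->nodeValue) . '</p>' . PHP_EOL;
            }
            $images = $article->getElementsByTagName("img");
            foreach ($images as $image) {
                $source = $image->getAttribute("src");
                if ($source === "") {
                    continue; // skip <img> tags without a src
                }
                // Escape the scraped value: it comes from a third-party page.
                $scrapings .= '<img src="' . htmlspecialchars($source) . '" alt="default">';
            }
        }
        libxml_clear_errors();
    }
}
?>
<!DOCTYPE html>
<html>
    <head></head>
    <body>
        <form method="POST" action="article_system.php">
            <input type="text" name="link">
            <input type="submit" value="submit">
        </form>
        <?php echo $scrapings; ?>
    </body>
</html>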
Theramax