
What I would like to do: get the headline text of the top post on http://reddit.com/r/worldnews and output it to a webpage of mine that contains nothing but that text.

In the end, I would like to fetch the text from that webpage with cURL from AppleScript and output it.

I am making a script that tells me the top post when I click a button.

Edit: If you can think of a way, I would also like to do the same thing for Facebook notifications.

Edit: I have PHP fetching the site and outputting it here: http://colejohnsoncreative.com/personal/ai/worldnews.php This is the code that I am using:

    <?php
    // Get a file into an array.  In this example we'll go through HTTP to get
    // the HTML source of a URL.
    $lines = file('http://www.reddit.com/r/worldnews');

    // Loop through our array, show HTML source as HTML source; and line numbers too.
    foreach ($lines as $line_num => $line) {
        echo "Line #<b>{$line_num}</b> : " . htmlspecialchars($line) . "<br />\n";
    }

    // Another example, let's get a web page into a string.  See also file_get_contents().
    $html = implode('', file('http://www.example.com/'));

    // Using the optional flags parameter since PHP 5
    $trimmed = file('somefile.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    ?>

This outputs the page's entire HTML source, but all I need for the project is this one element:

    <a class="title " href="http://www.dailymail.co.uk/news/article-2219477/Cannabis-factory-couple-gave-400-000-drug-dealing-fortune-poor-Kenyans-jailed-years.html" >British couple who spent most of the money they made from canabis growing on paying for life changing operations and schooling for people in a poor Kenyan village gets sent to prison for 3 years.</a>

I need to throw away everything else. How can I do that?

Cole
    Take a look at scraping methods: http://stackoverflow.com/questions/26947/how-to-implement-a-web-scraper-in-php – Steven Oct 19 '12 at 01:59

2 Answers


If you're in a shell, you can fetch the page with wget.

From PHP, you can fetch it with file_get_contents.

From Java, you can fetch it with URLConnection.

Once you have the page, use whatever language you like to search its text for the part you want, and do whatever you like with it.
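
As a rough illustration of the "look through the text" step in PHP, here is a minimal sketch that uses the built-in DOM extension to pull out the first link whose class list contains `title`, which is the markup quoted in the question. The function name `first_title_from_html` and the sample HTML are made up for the example, and it assumes reddit still marks headlines with that class.

```php
<?php
// Sketch: return the text of the first <a> whose class list contains "title".
// The function name and the sample markup below are illustrative assumptions.
function first_title_from_html(string $html): ?string
{
    if ($html === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match "title" as a whole class token, so class="title " also matches.
    $query = '//a[contains(concat(" ", normalize-space(@class), " "), " title ")]';
    $nodes = $xpath->query($query);

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Hypothetical sample standing in for the fetched page.
$sample = '<html><body><p class="rank">1</p>'
        . '<a class="title " href="http://example.com/story">Example headline</a>'
        . '<a class="title " href="http://example.com/other">Second headline</a>'
        . '</body></html>';

echo first_title_from_html($sample), "\n"; // prints "Example headline"
```

To use it on the live page, you would pass in the string returned by `file_get_contents('http://www.reddit.com/r/worldnews')` and echo only the result, so the page contains nothing but the headline. Whether reddit serves that request without extra headers is something to check yourself.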

case1352

You're going to have to do some parsing, so match the pattern you want. The simplest way is to use something like strpos to find the positions of the elements around the text you want, or to use a regex. Do they have an RSS feed? If so, you should use that instead.
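
Both ideas could look something like the sketch below. The regex version assumes the `class` attribute comes first and starts with `title`, as in the markup quoted in the question, so it breaks as soon as the page layout changes. The feed version assumes an RSS 2.0 document (`channel` > `item` > `title`); I have not checked what reddit's feed actually looks like. The function names and sample strings are made up for the example.

```php
<?php
// Sketch: grab the first title link with a regular expression.
// Fragile by design: assumes class="title..." is the first attribute.
function first_title_by_regex(string $html): ?string
{
    $pattern = '/<a\s+class="title(?:\s[^"]*)?"[^>]*>(.*?)<\/a>/is';
    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }
    // Strip any nested tags, then decode entities such as &amp;.
    return trim(html_entity_decode(strip_tags($m[1]), ENT_QUOTES, 'UTF-8'));
}

// Sketch: read the first item title from an RSS 2.0 document.
function first_title_from_rss(string $xml): ?string
{
    // Keep malformed XML from emitting warnings into the page.
    libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    libxml_clear_errors();

    if ($feed === false || !isset($feed->channel->item[0]->title)) {
        return null;
    }
    return trim((string) $feed->channel->item[0]->title);
}

// Hypothetical samples standing in for the fetched page and feed.
$page = '<a class="title " href="http://example.com/a" >First &amp; foremost</a>'
      . '<a class="title " href="http://example.com/b" >Second</a>';
$rss  = '<rss version="2.0"><channel><title>feed</title>'
      . '<item><title>Top story</title></item>'
      . '<item><title>Next story</title></item></channel></rss>';

echo first_title_by_regex($page), "\n"; // prints "First & foremost"
echo first_title_from_rss($rss), "\n";  // prints "Top story"
```

The feed route is usually the sturdier of the two, because it does not depend on the page's HTML staying the same.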

xelber