Get contents of article given URL

Question

Given a page's contents (its HTML), how could I get the contents of the article?

For example, this website returns the contents of articles given a URL:

http://embed.ly/docs/explore/extract?url=http%3A%2F%2Fwww.foxnews.com%2Fsports%2F2016%2F08%2F14%2Fryan-lochte-3-other-u-s-swimmers-robbed-in-brazil.html

However, I don't want to use their API. I've used file_get_contents($url), but I have no idea how I would go about getting the contents of just the article.

Any ideas?

You're going to have to parse the output of `file_get_contents($url)` and keep the part you are interested in. — Vicente Olivert Riera, Aug 14 '16 at 19:32
What about regex, or the `substr`, `strstr`, `strpos`, ... functions? — Orry, Aug 14 '16 at 19:32
Possible duplicate of [How do you parse and process HTML/XML in PHP?](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) — chris85, Aug 14 '16 at 19:34
^ even finding a possible dupe is already doing "their" homework/work. — Funk Forty Niner, Aug 14 '16 at 19:34
@VicenteOlivertRiera There's no way embed.ly does this for every site. No matter what URL you enter, it will always return the correct content, even if it's some no-name blog or some incredibly tiny news station that no one's heard of. — user6715530, Aug 14 '16 at 19:47

owais · Answer 1 · 2016-08-14T19:48:37.870


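// Fetch the page; file_get_contents() returns false on failure.
$url = 'http://www.foxnews.com/sports/2016/08/14/ryan-lochte-3-other-u-s-swimmers-robbed-in-brazil.html';
$content = file_get_contents($url);
if ($content === false) {
    exit("Could not fetch $url");
}
// Split on the article container's opening tag; index 1 is whatever follows it.
$first_step = explode('<div class="article-text">', $content);
if (count($first_step) < 2) {
    exit('Article container not found');
}
// Split the remainder on <p> and print each piece (inner markup is kept as-is).
$paras = explode('<p>', $first_step[1]);
foreach ($paras as $para) {
    echo $para;
}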

If you also want the images, split on the `article` tag instead, since that is the element their DOM structure wraps the whole article in.
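As a rough sketch of the `article`-tag idea, the same extraction can be done with PHP's bundled DOM extension instead of string splitting. The helper name `extractArticle` and the shape of the returned array are made up for illustration, and the sketch assumes the page wraps its article in an `<article>` element, which will not hold for every site.

```php
<?php
// Hypothetical helper: collects paragraph text and image URLs
// found inside any <article> element of the given HTML.
function extractArticle(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = ['paragraphs' => [], 'images' => []];

    // Every <p> nested at any depth inside <article>.
    foreach ($xpath->query('//article//p') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            $result['paragraphs'][] = $text;
        }
    }
    // The src attribute of every <img> inside <article>.
    foreach ($xpath->query('//article//img/@src') as $src) {
        $result['images'][] = $src->nodeValue;
    }
    return $result;
}
```

It would be called as `extractArticle(file_get_contents($url))`, after checking that the fetch did not return `false`.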

edited Aug 14 '16 at 19:48

answered Aug 14 '16 at 19:39



Hope `article-text` never has a `div` inside it. – chris85 Aug 14 '16 at 19:42
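One way to sidestep the nested-`div` concern is to let a parser find the container rather than splitting on its opening tag. The following is a minimal sketch using `DOMDocument` and `DOMXPath`. The function name `paragraphsInClass` is hypothetical, and `$class` is assumed to be a plain class name with no quotes or spaces.

```php
<?php
// Hypothetical helper: returns the text of every <p> inside a <div>
// carrying the given class, however deeply other <div>s are nested in it.
function paragraphsInClass(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Suppress warnings from imperfect markup while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word, so "article-text main" also matches.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]//p',
        $class
    );

    $paragraphs = [];
    foreach ($xpath->query($query) as $p) {
        $paragraphs[] = trim($p->textContent);
    }
    return $paragraphs;
}
```

Because the parser tracks where the container actually closes, paragraphs after the `article-text` block (footers, related links) are not swept in, which the `explode` approach cannot guarantee.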
