Getting data from an external webpage

Question

What's the best way to get content from an external website via PHP?

Using PHP, how do I fetch a web page (e.g. http://store.domain.com/1/) and scan its HTML for the data found between the span tags below (here, the letters C and E)? Which PHP function should I use?

<span id="ctl00_ContentPlaceHolder1_phstats1_pname">C</span>
<span id="ctl00_ContentPlaceHolder1_phstats2_pname">E</span>

Then save "C" (the found string) to $pname1 and "E" to $pname2, and store both in the session:

$_SESSION['pname1'] = $pname1;
$_SESSION['pname2'] = $pname2;

It's called screen scraping, and has been asked/answered many times on this site before: http://stackoverflow.com/questions/519920/screen-scraping-technique-using-php — Marc B, Nov 04 '11 at 04:32

score 4 · Answer 1 · edited May 23 '17 at 12:13

You need to use a web scraping technique. It can be done simply with an HTML DOM library, or with technologies like Node.js and jQuery. You can find some useful tutorials regarding this here and here.

You may also see this thread regarding implementing scraping using PHP.
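As a rough sketch of the DOM-library route, PHP's bundled DOM extension (DOMDocument and DOMXPath) can look the spans up by id. The helper name span_text_by_id is made up for this illustration, and the markup is the sample from the question rather than a live page; it also assumes the dom extension is enabled.

```php
<?php
// Hypothetical helper (not from the thread): returns the text of the element
// with the given id, or null when no such element exists.
// Assumes $id contains no double quotes, since it is pasted into the XPath.
function span_text_by_id(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Sample markup copied from the question; a live page would be fetched
// with file_get_contents() or cURL first.
$html = '<html><body>'
      . '<span id="ctl00_ContentPlaceHolder1_phstats1_pname">C</span>'
      . '<span id="ctl00_ContentPlaceHolder1_phstats2_pname">E</span>'
      . '</body></html>';

$pname1 = span_text_by_id($html, 'ctl00_ContentPlaceHolder1_phstats1_pname');
$pname2 = span_text_by_id($html, 'ctl00_ContentPlaceHolder1_phstats2_pname');
```

The two results can then be stored in $_SESSION as in the question, after calling session_start().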

answered Nov 04 '11 at 04:43

nbk

score 3 · Accepted Answer · answered Nov 04 '11 at 04:52

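One efficient method uses only plain string functions:

$content = file_get_contents('http://www.domain.com/whatever.html');

$list = array();
$on = 0;
$pos = strpos($content, 'id="c');
while ($pos !== false)
 {
 // Skip past id=" (4 characters) so $content starts at the id value.
 $content = substr($content, $pos + 4);
 // The id value ends at the next double quote.
 $pos = strpos($content, '"');
 $list[$on] = substr($content, 0, $pos);
 $on++;
 // Look for the next id that starts with "c".
 $pos = strpos($content, 'id="c');
 }

Then all your values (each matched id) will be in the $list array, the count of which is $on.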

You could also do it in one line with one of the preg functions, but I like the old-school method; it's a nanosecond faster.
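For comparison, here is roughly what the preg route could look like. This is only a sketch: the pattern is an assumption built from the two span ids shown in the question, and it captures the text inside each span rather than the id itself.

```php
<?php
// Sample markup copied from the question (stands in for the fetched page).
$content = '<span id="ctl00_ContentPlaceHolder1_phstats1_pname">C</span>' . "\n"
         . '<span id="ctl00_ContentPlaceHolder1_phstats2_pname">E</span>';

// Group 1 is the number inside the id, group 2 is the text inside the span.
$pattern = '~<span id="ctl00_ContentPlaceHolder1_phstats(\d+)_pname">(.*?)</span>~s';
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

// Index the captured text by the number found in the id.
$pnames = [];
foreach ($matches as $m) {
    $pnames[(int) $m[1]] = trim($m[2]);
}
```

With the sample markup, $pnames[1] holds "C" and $pnames[2] holds "E". A regex like this breaks as soon as the attribute order or quoting changes, so it only suits pages whose markup you know.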

Alasdair

Should it be $pos = strpos($content,''); ? What would be the best way to handle them separately instead of grouping them all in a list array? – acctman Nov 04 '11 at 06:09
No, it should be as it is, otherwise you will only get 1 result. – Alasdair Nov 04 '11 at 07:00
It's best to put them into an array and then just process them individually after, using for($run=0; $run<$on; $run++), and inside that loop $list[$run] will contain each ID. – Alasdair Nov 04 '11 at 07:01

score 0 · Answer 3 · answered Nov 04 '11 at 04:36

I think you can use file_get_contents("http://store.domain.com/1/"); to make the HTTP request.

As for parsing it, depending on how big your project is and how much effort you're willing to put in, you can use an HTML DOM parser such as http://simplehtmldom.sourceforge.net/, or simply search for id="ctl00_ContentPlaceHolder1_phstats1_pname" and take it apart piece by piece (not the recommended way of doing things).
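The "take it apart piece by piece" option might look something like the sketch below. The helper name text_after_id is invented for this example, and it assumes the value always sits in a span whose id attribute is written exactly as in the question.

```php
<?php
// Hypothetical helper (not from the thread): returns the text between the
// opening tag that carries $id and the next "</span>", or null if any of
// the expected markers is missing.
function text_after_id(string $html, string $id): ?string
{
    $marker = 'id="' . $id . '"';
    $start = strpos($html, $marker);
    if ($start === false) {
        return null;
    }
    // Find the end of the opening tag.
    $open = strpos($html, '>', $start);
    if ($open === false) {
        return null;
    }
    // Find the closing tag that follows it.
    $close = strpos($html, '</span>', $open);
    if ($close === false) {
        return null;
    }
    return trim(substr($html, $open + 1, $close - $open - 1));
}

// Sample markup copied from the question.
$html = '<span id="ctl00_ContentPlaceHolder1_phstats1_pname">C</span>';
$pname1 = text_after_id($html, 'ctl00_ContentPlaceHolder1_phstats1_pname');
```

It works for the sample markup, but nested tags or a reordered attribute list would trip it up, which is why a DOM parser is usually the safer choice.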

score 0 · Answer 4 · answered Nov 04 '11 at 04:55

It can be done with cURL, but you can just include the Simple HTML DOM Parser in your project. It's very easy to use and will serve your purpose.

The documentation is here: http://simplehtmldom.sourceforge.net/
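If you do go the cURL route for the download step, a minimal sketch could look like this. The function name fetch_page and the timeout values are assumptions chosen for illustration, and the curl extension has to be enabled.

```php
<?php
// Hypothetical helper (not from the thread): fetches $url with cURL and
// returns the response body, or null on a transport error or HTTP status >= 400.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_CONNECTTIMEOUT => 5,    // seconds; arbitrary choice
        CURLOPT_TIMEOUT        => 10,   // seconds; arbitrary choice
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        return null;
    }
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return $status < 400 ? $body : null;
}
```

Calling fetch_page('http://store.domain.com/1/') would then give you the HTML string to hand to whichever parser you pick.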

Ghost-Man
