How do I get text from a website using PHP?

Question

I'm working on a PHP script, and part of it needs to query a website and then get text from it.

First, I need to query a certain website URL. Then I need to get the text out of that website's response and return it from the function.

How would I query the website and get the text from it?

score 14 · Answer 1 · edited May 23 '17 at 12:16

The easiest way:

file_get_contents()

That will get you the source of the web page.

You probably want something a bit more complete, though, so look into cURL for better error handling, setting the user agent, and so on.

From there, if you want the text only, you are going to have to parse the page. For that, see: How do you parse and process HTML/XML in PHP?
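
As a rough illustration of the parsing step, here is a minimal sketch that reduces an HTML string to its text with DOMDocument. The helper name htmlToText is made up for this example, and it assumes the page source has already been fetched into a string (for instance with file_get_contents()).

```php
<?php
// Hypothetical helper (not from the answer): reduces an HTML string to its text.
function htmlToText(string $html): string
{
    // Real-world markup is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Remove <script> and <style> elements so their code does not end up in the text.
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('//script | //style')) as $node) {
        $node->parentNode->removeChild($node);
    }

    $root = $doc->documentElement;
    if ($root === null) {
        return '';
    }

    // textContent concatenates the remaining text nodes; collapse runs of whitespace.
    return trim(preg_replace('/\s+/', ' ', $root->textContent));
}
```

Usage would look something like $text = htmlToText(file_get_contents($url)); the result is plain text only, so any structure the page had (headings, lists, links) is lost.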

answered Jul 18 '11 at 03:47

Brad

Erick Martinez · Answer 2 · 2011-07-18T04:11:28.930

9

I would do a DOM search. Take a look at http://www.php.net/manual/es/domdocument.load.php. DOMXPath might be very useful too: http://php.net/manual/en/class.domxpath.php

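libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect parse warnings quietly
$doc = new DOMDocument;
$doc->loadHTMLFile("http://mysite.com"); // load() expects well-formed XML; loadHTMLFile() parses HTML
$xpath = new DOMXPath($doc);
$elements = $xpath->query("//div[@id='yourTagIdHere']"); // "//" matches the div anywhere in the document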
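
The snippet above stops at the query, so here is a small sketch of reading the text out of the matched nodes. It runs against an inline sample string (an assumption, so that it works without a network request) and uses a "//" expression to search the whole document; the id is the placeholder from the answer.

```php
<?php
// Inline sample markup (made up for this sketch) so no network request is needed.
$html = '<html><body><div id="yourTagIdHere">Hello <b>world</b></div>'
      . '<div id="other">ignored</div></body></html>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);              // loadHTML() parses a string; loadHTMLFile() takes a path or URL

$xpath = new DOMXPath($doc);
$elements = $xpath->query("//div[@id='yourTagIdHere']");

// query() returns a DOMNodeList; textContent gives each node's text without tags.
$texts = [];
foreach ($elements as $element) {
    $texts[] = trim($element->textContent);
}

print_r($texts);
```

For this sample, $texts ends up holding the single string "Hello world"; on a real page the list is simply empty when nothing matches the id.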

edited Jul 18 '11 at 04:11

answered Jul 18 '11 at 03:57

Erick Martinez

score 0 · Answer 3 · answered Aug 07 '15 at 16:55

Could this be done by fetching all of the page content with the methods already listed above, and then using a regex to remove every character between opening and closing angle brackets?

A page that looks like this:

<html><style> h1 { font-style:... }</style><h1>stuff in here</h1></html>

Would then become this after regex:

h1 { font-style:... }stuff in here

And because we also want to remove the code inside elements such as the [style] tag, we could first use a regex to remove all characters between [style and /style], so that we are left with just:

stuff in here

Would this work? Please reply if you think it would, or if you foresee errors, as I would like to build a tool on this parsing.
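
For what it's worth, the two-step idea described here can be sketched with preg_replace() as below, using the sample page from this answer. Treat it as an illustration only: regexes are fragile on real pages (comments, attribute values containing ">", unclosed tags), which is why the other answers point to a DOM parser.

```php
<?php
// Sample page taken from the example above.
$html = '<html><style> h1 { font-style:... }</style><h1>stuff in here</h1></html>';

// Step 1: drop <style> and <script> elements together with their contents.
$withoutCode = preg_replace('#<(style|script)\b[^>]*>.*?</\1>#is', '', $html);

// Step 2: remove every remaining tag, i.e. everything between < and >.
$text = preg_replace('/<[^>]*>/', '', $withoutCode);

echo trim($text), "\n";   // prints "stuff in here"
```

PHP's built-in strip_tags() covers step 2, but it keeps the text inside style and script elements, so step 1 is still needed before calling it.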

score 0 · Answer 4 · answered Jul 18 '11 at 03:47

You can use file_get_contents or, if you need a little more control (e.g. to submit POST requests, to set the user agent string, ...), you may want to look at cURL.

file_get_contents Example:

$content = file_get_contents('http://www.example.org');

Basic cURL Example:

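$ch = curl_init('http://www.example.org');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3');
// Without CURLOPT_RETURNTRANSFER, curl_exec() prints the page and returns true.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$content = curl_exec($ch);

curl_close($ch);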
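
If you want the cURL version wrapped in a function with some error handling, something along these lines might do. The function name fetchUrl, the timeout default and the user agent string are placeholders chosen for this sketch, not anything cURL requires.

```php
<?php
// Hypothetical helper: returns the response body, or throws on failure.
function fetchUrl(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        CURLOPT_USERAGENT      => 'ExampleFetcher/1.0',   // placeholder user agent
        CURLOPT_FAILONERROR    => true,   // treat HTTP status >= 400 as an error
    ]);

    $body  = curl_exec($ch);
    $error = curl_error($ch);   // empty string when the request succeeded

    if ($body === false) {
        throw new RuntimeException("Request to $url failed: $error");
    }
    return $body;
}
```

A caller would wrap fetchUrl('http://www.example.org') in a try/catch for RuntimeException and decide there whether to retry, log, or give up.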

score 0 · Answer 5 · answered Jul 18 '11 at 03:48

If you have cURL installed, use it. Otherwise:

$website = file_get_contents('http://google.com');

Then you need to search through the string for the text you want. How you do that depends on the website, and the text you're trying to read.
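
As one simple way of searching the string, here is a sketch that pulls out whatever sits between two known markers. The helper textBetween and the sample markup are invented for this example; which markers make sense depends entirely on the page you are reading.

```php
<?php
// Hypothetical helper: returns the text between two markers, or null if either is missing.
function textBetween(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);

    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($haystack, $from, $to - $from);
}

// Stand-in for the string returned by file_get_contents().
$website = '<html><head><title>Example Domain</title></head><body>...</body></html>';
echo textBetween($website, '<title>', '</title>'), "\n";   // prints "Example Domain"
```

This only holds up while the markers stay exactly the same on the page; for anything less predictable, a DOM parser is the safer choice.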

answered Jul 18 '11 at 03:48

Paul

score 0 · Answer 6 · answered Jul 18 '11 at 03:48

You need to use cURL. You can get some samples here

answered Jul 18 '11 at 03:48

TheTechGuy

score 0 · Answer 7 · answered Jul 18 '11 at 03:51

If you want more control, use cURL. Otherwise, use file_get_contents:

$url  = "http://www.example.com/test.php";  // Site URL.
$site = file_get_contents($url);             // Gets site response.
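
file_get_contents() returns false when the request fails, so it is worth checking the result. Here is a minimal sketch that also sets a timeout and a user agent through a stream context. The function name fetchSite and the option values are assumptions made for this example, and fetching URLs this way needs allow_url_fopen to be enabled.

```php
<?php
// Hypothetical helper: fetches a URL (or local path) and returns null on failure.
function fetchSite(string $url, int $timeoutSeconds = 10): ?string
{
    // The "http" options apply to http:// and https:// URLs only.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'ExampleFetcher/1.0',   // placeholder user agent
        ],
    ]);

    // file_get_contents() emits a warning and returns false on failure;
    // @ silences the warning so the caller can check for null instead.
    $site = @file_get_contents($url, false, $context);
    return $site === false ? null : $site;
}
```

The caller can then write $site = fetchSite($url); and handle the null case explicitly instead of passing false on to the parsing code.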

answered Jul 18 '11 at 03:51

Mingle
