
I want to get the source of some website, and search through it to find a string. I did something like this:

$source = file_get_contents('http://website.com');

preg_match('/foobar/', $source, $match);

var_dump($match);

The source does contain the expression I am looking for; dumping the $source variable proves that. But the result is an empty array.

However, everything works and the result is correct when I copy the source and paste it in like this:

$source = <<<EOF
   // paste here
EOF;

preg_match('/foobar/', $source, $match);

var_dump($match);

Now it works perfectly.

What is wrong? Why does this happen? Thanks!

khernik
  • Have you tried `var_dump($sourceFromUrl === $sourceFromPasting);` to check if your input is identical? That should be your first point to check – scrowler Apr 23 '15 at 20:46
  • Double check that the string you are getting back from file_get_contents (`echo htmlentities($source);`) is what you expect. It is possible the site blocks PHP from downloading its source (user agent matching, checking session), or the source you see is generated by JavaScript and not accessible to file_get_contents. – Jonathan Kuhn Apr 23 '15 at 20:50
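
Building on the comparison suggested in the comments, here is a minimal sketch for locating where two strings first diverge when `===` reports that they differ. The function name `firstDifference` and the sample strings are hypothetical and only serve as illustration; in practice the two arguments would be the fetched source and the pasted source.

```php
<?php
// Hypothetical helper (not from the original post): returns the byte offset
// of the first difference between two strings, or null if they are identical.
function firstDifference(string $a, string $b): ?int
{
    $limit = min(strlen($a), strlen($b));
    for ($i = 0; $i < $limit; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    // Same prefix: the strings differ only if one is longer than the other.
    return strlen($a) === strlen($b) ? null : $limit;
}

// Stand-in values; replace them with the real fetched and pasted strings.
$sourceFromUrl     = '<p>Foobar</p>';
$sourceFromPasting = '<p>foobar</p>';

var_dump($sourceFromUrl === $sourceFromPasting);                // bool(false)
var_dump(firstDifference($sourceFromUrl, $sourceFromPasting));  // int(3)
```

Once the offset is known, `bin2hex(substr($sourceFromUrl, $offset, 16))` shows the raw bytes at that position, which helps spot encoding or invisible-character differences.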

1 Answer


Do more debugging.

Print out $source right after getting it through file_get_contents(). What does it give you?

Most likely you are not actually fetching the data you expect from the website, which would explain why the regular expression fails to match.

There may be several reasons. For example, the page may be served over https, which file_get_contents() can only fetch when PHP's openssl extension is enabled, or the server may answer a script's request differently from a browser's (a redirect to another page, or a check on the User-Agent header or the session).
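
As a rough sketch of that debugging step, the helper below summarizes what was actually fetched: whether the fetch succeeded, how long the result is, and whether the search string occurs in it. The name `inspectSource` and the returned array keys are hypothetical and chosen only for this example.

```php
<?php
// Hypothetical helper (not from the original post): reports basic facts
// about a fetched source so a failed or unexpected download is easy to spot.
function inspectSource($source, string $needle): array
{
    // file_get_contents() returns false when the fetch fails.
    if ($source === false) {
        return ['fetched' => false, 'length' => 0, 'found' => false, 'offset' => null];
    }

    $offset = strpos($source, $needle);

    return [
        'fetched' => true,
        'length'  => strlen($source),
        'found'   => $offset !== false,
        'offset'  => $offset === false ? null : $offset,
    ];
}

// Stand-in string; in practice pass the result of file_get_contents().
var_dump(inspectSource('<html><body>foobar</body></html>', 'foobar'));
```

Calling `inspectSource(file_get_contents('http://website.com'), 'foobar')` and then `echo htmlspecialchars($source);` should make it clear whether the page you received is the page you see in the browser.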


A possible solution is to replace file_get_contents() with cURL, which gives you more control over the request (headers, redirects, https options) and reports errors more clearly.
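
A minimal sketch of such a cURL-based fetch follows, assuming the curl extension is installed. The function name `fetchWithCurl`, the timeout, the redirect limit, and the User-Agent string are all illustrative assumptions, not values taken from the question.

```php
<?php
// Hypothetical helper (not from the original post): fetches a URL with cURL
// and returns the body together with the HTTP status and any error message.
function fetchWithCurl(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_MAXREDIRS      => 5,     // assumed limit
        CURLOPT_TIMEOUT        => 10,    // assumed timeout in seconds
        // Assumed User-Agent; some sites reject requests that send none.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
    ]);

    $body = curl_exec($ch);

    return [
        'body'   => $body === false ? null : $body,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'error'  => $body === false ? curl_error($ch) : null,
    ];
}
```

With this in place, `$result = fetchWithCurl('http://website.com');` followed by `preg_match('/foobar/', (string) $result['body'], $match);` mirrors the original code, while `$result['status']` and `$result['error']` show why a fetch went wrong.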

Something similar has been solved here.

FanaticD