I need to load content from a remote URI into a PHP variable locally. The remote page only shows its content when JavaScript is turned on. How can I get around this?

Essentially, how can I use cURL for pages requiring JavaScript loaded content?

  • How can that be possible? I don't think they can actually check for that... – Mihai Iorga Aug 21 '12 at 14:03
  • It is possible if you have an AJAX call or something to report to the server. In that way, the page could hide contents until fetched by the 2nd request. This is often done to prevent scraping though. – Daniel Aug 21 '12 at 14:07
  • Do you think maybe they don't want you scraping the site? – ceejayoz Aug 21 '12 at 14:07

2 Answers

Mink was the only PHP headless browser that I could find. As noted, Selenium is another popular choice. I don't know how well these will perform if you have a lot of scraping to do, though. They seem to be geared more towards testing.

A number of other languages have them, and they are listed in the link below. Since PHP does not process JavaScript, you will need another tool. Headless browsers expose the JavaScript engine and allow you to interact with the browser programmatically.

headless internet browser?
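
To make the point about JavaScript concrete, here is a minimal sketch (in Python, standard library only) of what a client that does not run scripts, such as cURL, actually ends up with. The markup in `RAW_HTML`, the `/static/app.js` and `/api/content` paths, and the `loadContent` function are all made up for illustration; a real site will look different.

```python
from html.parser import HTMLParser

# Hypothetical markup: roughly what a non-JavaScript client might receive.
# The container is empty because a script is expected to fill it in later.
RAW_HTML = """
<html>
  <body>
    <div id="content"></div>
    <script src="/static/app.js"></script>
    <script>loadContent("/api/content");</script>
  </body>
</html>
"""


class StaticView(HTMLParser):
    """Collects visible text and external script URLs without running any script."""

    def __init__(self):
        super().__init__()
        self.visible_text = []
        self.script_sources = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            src = dict(attrs).get("src")
            if src:
                self.script_sources.append(src)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Inline script bodies are code, not page content, so skip them.
        if not self._in_script and data.strip():
            self.visible_text.append(data.strip())


def static_view(html):
    """Return (visible_text, script_sources) for a raw HTML string."""
    parser = StaticView()
    parser.feed(html)
    parser.close()
    return parser.visible_text, parser.script_sources


text, scripts = static_view(RAW_HTML)
print("visible text:", text)
print("external scripts:", scripts)
```

The visible text comes back empty while the script references are still there, which is why a headless browser (or some other tool that executes the scripts) is needed to see the final content.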

dm03514

To do this you have to automate a real browser using a tool such as Selenium. This will involve slightly more than a simple GET request, though.

http://seleniumhq.org/

sean