10

I need to get the HTML source of pinnaclesports.com. The problem is that it detects whether cookies and JS are enabled and, if not, just returns a page saying:

This site requires JavaScript and Cookies to be enabled. Please change your browser settings or upgrade your browser.

Is there any way to spoof JS support when using cURL?

EDIT: I can use a headless browser that runs either as a Perl/Ruby module or is written in PHP

user965748
  • You would need a *headless browser* for that; http://stackoverflow.com/questions/125177/whats-a-good-tool-to-screen-scrape-with-javascript-support – Alex K. Sep 06 '12 at 15:16
  • Do you know of a simple one written as a PHP library that runs on PHP 5.2? – user965748 Sep 06 '12 at 16:16
  • Barebones looks hopeful, but unfortunately it doesn't seem to solve the JS problem. – user965748 Sep 06 '12 at 18:14
  • Simply set headers on your curl request, such as the user agent. – Ibu Sep 06 '12 at 18:30
  • @Ibu: You mean header('Location:...? Could you be more specific? – user965748 Sep 06 '12 at 18:34
  • @user965748 [Here is an example](http://davidwalsh.name/set-user-agent-php-curl-spoof) – Ibu Sep 06 '12 at 18:46
  • Of course I had that set; I also tried Googlebot. – user965748 Sep 06 '12 at 19:04
  • Setting curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246'); to send Google Chrome's user agent string worked for me, as suggested in @Ibu's comment. Thank you. – TLG123 Jul 24 '23 at 11:35

2 Answers

7

Another suggestion is to set the user agent. This worked for me when writing a parser for Google Groups:

curl -L -v "https://groups.google.com/d/forum/<GROUP-NAME>" -A "Mozilla/5.0 (compatible;  MSIE 7.01; Windows NT 5.0)"
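For scripted use, the same idea could be sketched in Python with the standard library. This is only an illustration: the helper names `build_request` and `fetch` are made up for this example, and the user agent string is the one from the curl command above. Whether a given site accepts it is not guaranteed.

```python
import urllib.request

# User agent string copied from the curl command above; any browser-like
# value could be substituted.
USER_AGENT = "Mozilla/5.0 (compatible;  MSIE 7.01; Windows NT 5.0)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request carrying a custom User-Agent header (nothing is sent yet)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT):
    """Send the request and return the body as text.

    urlopen follows HTTP redirects by default, similar to curl -L.
    """
    with urllib.request.urlopen(build_request(url, user_agent)) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `fetch(url)` with the page's address would then return the HTML as a string.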
João Paulo Cercal
3

I figured out that if you make a cookie-less request, the page that comes back uses JavaScript to set a cookie. That is the page you are getting with curl.

Then make another curl call like this:

curl https://www.pinnaclesports.com/ --cookie "YPF8827340282Jdskjhfiw_928937459182JAX666=122.167.231.139"

That is, you have to make two calls: 1) make a cookie-less call, read the response, and use a regex to find the cookie name; 2) make a second request with that cookie set. That will solve your problem.
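The two calls described above could be sketched in Python roughly as follows. This is a minimal sketch under a loud assumption: it supposes the cookie-less response contains a plain `document.cookie = "NAME=VALUE; ..."` assignment. The real page may build the cookie differently, in which case the regex needs adjusting. The function names `extract_cookie`, `fetch` and `fetch_with_js_cookie` are hypothetical and chosen only for this example.

```python
import re
import urllib.request

# ASSUMPTION: the cookie-less page sets its cookie with a statement such as
#   document.cookie = "NAME=VALUE; path=/";
# Adjust this pattern if the real script looks different.
COOKIE_RE = re.compile(r"""document\.cookie\s*=\s*["']([^=;"']+)=([^;"']*)""")


def extract_cookie(html):
    """Return (name, value) from the first document.cookie assignment, or None."""
    match = COOKIE_RE.search(html)
    return (match.group(1), match.group(2)) if match else None


def fetch(url, cookie=None):
    """GET url, optionally sending a 'name=value' string as the Cookie header."""
    headers = {"Cookie": cookie} if cookie else {}
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_with_js_cookie(url):
    """Call 1: cookie-less request. Call 2: repeat with the cookie the script would set."""
    first = fetch(url)
    found = extract_cookie(first)
    if found is None:
        # No cookie-setting script detected; return the page as received.
        return first
    name, value = found
    return fetch(url, cookie=f"{name}={value}")
```

The second request here sends the same header that `curl --cookie "NAME=VALUE"` would send.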

Or just use YQL:

select * from html where url="https://www.pinnaclesports.com/" 

point your curl to here

Markandey Singh
  • Thank you! The method you described works. YQL solution might be useful as well, but I need to further work with the source for making a login request, so it's probably better to use the former way. – user965748 Sep 06 '12 at 21:21
  • I am in the same kind of dilemma. I read your solution above but don't know how to find the cookie name or how to use it in the second curl request. Any assistance would be highly appreciated. – Saad Bashir Aug 14 '13 at 05:18