I've been trying to scrape https://www.worldpadeltour.com/en/tournaments/cupra-vigo-open-2020/2020/?tab=results, but whether I use plain scraping or cURL, I end up on the tournament's General info tab. The Results tab is not loaded through cURL, whereas in a browser the URL above opens the Results tab.

I've tried all of the options below, with the same result each time. I have also looked for XHR requests but haven't found any. What I want in the variable holding the cURL response is, for example, <span>6-0 / 7-6(5) </span>.

CURLOPT_USERAGENT (with Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36)
CURLOPT_FOLLOWLOCATION
CURLOPT_COOKIESESSION, TRUE
CURLOPT_POSTFIELDS, 'tab=results'
CURLOPT_RETURNTRANSFER
CURLOPT_SSL_VERIFYHOST, FALSE
CURLOPT_SSL_VERIFYPEER

What I essentially want is to get the scores from that tab into my variable, so I can scrape names and results. I get the same result as if I had just used file_get_contents.

nib
  • the `tab=results` is not POST data - it forms part of the url/querystring – Professor Abronsius Dec 16 '20 at 13:26
  • I tried both, but I end up on the 'General information' tab with curl, as in having the full url including tab=results in the url – nib Dec 16 '20 at 13:30
  • The content looks to be generated using JavaScript. Try your scrape with either node.js or a headless browser like casper.js etc – Professor Abronsius Dec 16 '20 at 13:33
  • @ProfessorAbronsius thank you. Do you have any small example on that? – nib Dec 16 '20 at 13:39
  • What have you tried so far? Where are you stuck? Usually, it's easier to ask for an API, as this would also imply that the provider allows gathering such data – Nico Haase Dec 16 '20 at 13:41
  • I execute the curl and I end up with whatever you end up with when scraping https://www.worldpadeltour.com/en/tournaments/cupra-vigo-open-2020/2020, but I need the results tab – nib Dec 16 '20 at 13:44
  • In your browser, open the developer tools and look for XHR requests, those are the ones that you probably want to simulate. Try to match the headers as much as possible, especially referrer – Chris Haas Dec 16 '20 at 14:18
  • @ChrisHaas, thanks for the tip, but I end up with 0 requests. What am I missing? – nib Dec 16 '20 at 15:02
  • **[You should not switch off `CURLOPT_SSL_VERIFYHOST` or `CURLOPT_SSL_VERIFYPEER`](https://paragonie.com/blog/2017/10/certainty-automated-cacert-pem-management-for-php-software)**. It could be a security risk! [Here is how to get the certificate bundle if your server is missing one](https://stackoverflow.com/a/32095378/1839439) – Dharman Dec 16 '20 at 23:34

1 Answer

You did miss one XHR. The following gets your desired output.

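// The XHR goes to this endpoint, not to the page URL.
$ch = curl_init('https://www.worldpadeltour.com/info-torneos/cupra-vigo-open-2020/2020/');
// Send the tab selection as POST form fields.
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'lang=en&selected_tab=results&section_data=');
// Mark the request as an XHR.
curl_setopt($ch, CURLOPT_HTTPHEADER, ['x-requested-with: XMLHttpRequest']);
// Without CURLOPT_RETURNTRANSFER, curl_exec() prints the response directly.
curl_exec($ch);
curl_close($ch);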

The header is required; without it, the request returns no output.
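
Since the question asks for the scores in a variable rather than printed output, the same request could be wrapped roughly as in the sketch below. The function names `fetchResultsTab` and `extractScores` are hypothetical, and the parsing step assumes the response body contains the scores as plain `<span>` elements like the one quoted in the question; if the endpoint wraps the markup in another format, it would have to be decoded first. SSL verification is left at cURL's defaults, in line with the comment above.

```php
<?php
declare(strict_types=1);

/**
 * Send the XHR-style POST and return the response body as a string.
 * Returns null if the transfer fails. (Hypothetical helper.)
 */
function fetchResultsTab(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => 'lang=en&selected_tab=results&section_data=',
        CURLOPT_HTTPHEADER     => ['x-requested-with: XMLHttpRequest'],
        // Return the body from curl_exec() instead of printing it.
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $body = curl_exec($ch);

    return is_string($body) ? $body : null;
}

/**
 * Collect score-like strings such as "6-0 / 7-6(5)" from <span> elements.
 * Assumption: each score is the only text inside its <span>.
 *
 * @return string[]
 */
function extractScores(string $html): array
{
    // One set is "digits-digits" with an optional "(tiebreak)"; sets are separated by "/".
    $set     = '\d+-\d+(?:\(\d+\))?';
    $pattern = '~<span[^>]*>\s*(' . $set . '(?:\s*/\s*' . $set . ')*)\s*</span>~';
    preg_match_all($pattern, $html, $matches);

    return array_map('trim', $matches[1]);
}
```

A call would then look like `$html = fetchResultsTab('https://www.worldpadeltour.com/info-torneos/cupra-vigo-open-2020/2020/');` followed by `$scores = extractScores($html ?? '');`. For anything beyond scores, such as player names, a DOM parser would probably be more robust than a regular expression.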

SeaWorld