
I'm running into an issue with cURL while fetching customer review data from Google (without an API). My cURL request used to work fine, but Google now seems to redirect all requests to a cookie consent page.

Below you'll find my current code:

$ch = curl_init('https://www.google.com/maps?cid=4493464801819550785');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);

print_r($result);

$result now just prints "302 Moved. The document has moved here."

I also tried setting curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0); but that didn't help either.

Does anyone have an idea how to overcome this? Can I programmatically deny (or accept) Google's cookies somehow? Or is there a better way of handling this?

DeltaG
  • Maybe [this question](https://stackoverflow.com/questions/43886124/api-to-get-all-the-reviews-and-rating-from-google-for-business) can help you with using the Google Business API. Creating your own scraper will in most cases be a pain in the ass. I also tried it with Facebook and Instagram; every time they change something small, the scraper stops working. – Baracuda078 Aug 01 '22 at 13:28
  • I'm not using the Google Business API for this, or any other API whatsoever. – DeltaG Aug 01 '22 at 13:30
  • I know, but using their API can save you a lot of trouble. – Baracuda078 Aug 01 '22 at 13:31
  • I know using their API would make things a little easier, but as per requirements I can't use the API. Clients just need to enter their CID number and my script should handle the rest (getting the reviews). – DeltaG Aug 01 '22 at 13:35

1 Answer


What you need is the following:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);

The above option tells cURL to follow redirects. However, I am not sure whether what is returned will be of much use for the specific URL you are trying to fetch. With this option set, you will obtain the HTML source of the final page Google redirects to. But that page contains scripts that, when executed, load the map and the other content ultimately displayed in your browser. So if you need data that is subsequently loaded by JavaScript, you will not find it in the returned result. In that case you should look into using a tool like Selenium with PHP (you might take a look at this post).
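For reference, here is a minimal sketch of the question's request with redirect following turned on. The function names build_redirect_options and fetch_following_redirects are hypothetical helpers, not part of cURL, and the CURLOPT_MAXREDIRS cap and the CURLINFO_EFFECTIVE_URL lookup are my own additions to make it easier to see where the request ended up. I left out CURLOPT_SSL_VERIFYPEER so that cURL keeps its default certificate verification.

```php
<?php
// Sketch only: both function names are hypothetical helpers, not cURL APIs.

/**
 * Build the cURL options for a request that follows redirects.
 */
function build_redirect_options(string $userAgent, int $maxRedirects = 10): array
{
    return [
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow 3xx Location headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // assumption: cap to avoid redirect loops
    ];
}

/**
 * Fetch $url and return [body, final URL after redirects].
 * Throws RuntimeException on a transport-level failure.
 */
function fetch_following_redirects(string $url, array $options): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, $options);

    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }

    // The URL cURL actually ended on, which shows whether a redirect happened.
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);

    return [$body, $finalUrl];
}
```

You would call it as [$html, $finalUrl] = fetch_following_redirects($url, build_redirect_options($userAgent)); with the URL and user agent string from the question. Comparing $finalUrl with the URL you requested tells you whether you landed on the page you asked for or somewhere else.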

Booboo
  • Yes, of course CURLOPT_FOLLOWLOCATION needed to be set to true. I completely misunderstood this option. Thanks for the clarification. I don't need to fetch any data loaded by JavaScript, so this was just what I needed. Thanks again! I will award your bounty tomorrow :) – DeltaG Aug 04 '22 at 12:31
  • In that case, please *accept* the answer. – Booboo Aug 04 '22 at 12:42