Curl scraping of Google doesn't work in production

Question

I am working on a small project that uses cURL and PHP to scrape Google Scholar results. It works fine in my development environment, but in production something fails and no results are returned.

Here is my code:

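    // SCRAPING GOOGLE SCHOLAR
    if (isset($_POST['google'])) {
        // $url_subject is the search term, built earlier in the controller
        $googleURL = 'http://scholar.google.com/scholar?hl=fr&q=' . $url_subject;

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $googleURL);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow HTTP redirects
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);  // disables certificate verification
        curl_setopt($ch, CURLOPT_USERAGENT, $random->random_user_agent());
        $result = curl_exec($ch);                     // false if the transfer failed
        curl_close($ch);

        // Parse the fetched page with the simple_html_dom service
        $html = $this->container->get('simple_html_dom');
        $html->load($result);
    }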

Thank you for your help

Check your log files for errors. cURL might not be installed — Nitsan Baleli, Feb 01 '15 at 12:15
It is not that; there is nothing in the log file regarding cURL — nico_lrx, Feb 01 '15 at 12:44
Perhaps this issue is caused by the `safe_mode` or `open_basedir` option being turned on. See [here](http://stackoverflow.com/questions/2511410/curl-follow-location-error) for details — hindmost, Feb 01 '15 at 18:46
Thank you, but maybe it is because Google Scholar detects my server's URL? — nico_lrx, Feb 02 '15 at 20:07

score 0 · Answer 1 · answered Oct 06 '16 at 18:56

Google frowns on scraping Google Scholar content; it is against their terms of service. Command-line curl is helpful for troubleshooting this kind of problem:

$ curl -vv https://scholar.google.com/scholar?hl=en&q=neurotransmitters
> GET /scholar?hl=en HTTP/1.1
> User-Agent: curl/7.35.0
> Host: scholar.google.com
> Accept: */*
> 
< HTTP/1.1 403 Forbidden
...
<html>...<title>Sorry...</title></head><body>
<h1>We're sorry...</h1>
<p>... but your computer or network may be sending automated queries.
To protect our users, we can't process your request right now.</p>
<div style="margin-left: 4em;">See
<a href="https://support.google.com/websearch/answer/86640">Google Help</a>
for more information.</div>
</body></html>
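
The same check can be made from PHP, so that a blocked request shows up as an explicit status instead of an empty result. The following is only a minimal sketch: `fetch_with_diagnostics()` and `looks_blocked()` are hypothetical helper names that do not appear in the question's code, and the block heuristic (status 403, status 429, or the "automated queries" text from the page above) is an assumption, not a documented Google behaviour.

```php
<?php
// Hypothetical helper (not from the original code): fetch a URL and return
// the HTTP status, the body and any cURL error, so failures are visible.
function fetch_with_diagnostics(string $url, string $userAgent): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);

    $body   = curl_exec($ch);                          // false on transport failure
    $error  = curl_errno($ch) ? curl_error($ch) : null; // null when cURL itself succeeded
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return [
        'status' => $status,
        'body'   => $body === false ? '' : $body,
        'error'  => $error,
    ];
}

// Hypothetical heuristic (an assumption): treat 403, 429 ("Too Many Requests"),
// or the "automated queries" notice in the body as a sign of being blocked.
function looks_blocked(int $status, string $body): bool
{
    return $status === 403
        || $status === 429
        || stripos($body, 'automated queries') !== false;
}
```

Calling `fetch_with_diagnostics($googleURL, $userAgent)` on the production server and logging the returned `status` and `error` should show whether the request fails inside cURL or is refused by Google; passing `status` and `body` to `looks_blocked()` distinguishes a block page from a genuinely empty result set.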
