
Whenever I open the URL https://scholar.google.com/citations?user=N7m4vIQAAAAJ&hl=en in a private window of Safari or Google Chrome, Google returns an error.

It happens only on the first request in private browsing mode.

Does anybody know why this happens only in this specific environment? It has been happening for the past 3 days.

-- The error message and a screenshot

Server Error We're sorry but it appears that there has been an internal server error while processing your request. Our engineers have been notified and are working to resolve the issue. Please try again later.

enter image description here


--- added

The HTTP response headers are:

    Cache-Control: no-cache, must-revalidate
    Content-Encoding: gzip
    Content-Type: text/html; charset=UTF-8
    Date: Mon, 16 Nov 2015 19:35:39 GMT
    Expires: Fri, 01 Jan 1990 00:00:00 GMT
    Pragma: no-cache
    Server: citations
    Set-Cookie: NID=73=eF98qod1NpYg7nb03RUToiSiacFgqNoZxQ4CuzqwGlQn53SoR7rHlzO0OExsmYkpRazROCQ3WqKoCsWKFPxp8dZr5pBra6nD1HPcxWUILl9gVAf5Q7GSQc3B0O3TP4gu; expires=Tue, 17-May-2016 19:35:39 GMT; path=/; domain=.google.com; HttpOnly
    X-Firefox-Spdy: h2
    X-Frame-Options: SAMEORIGIN
    X-XSS-Protection: 1; mode=block
    p3p: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
    x-content-type-options: nosniff

Karl
  • This also seems to happen when pulling content from the page with R: `library(httr)` `res=GET('https://scholar.google.com/citations?hl=en&user=qZLGnroAAAAJ')` `content(res)` `res2=GET('https://scholar.google.com/citations?hl=en&user=qZLGnroAAAAJ')` `content(res2)` This bug was reported on the GitHub issue tracker of the R `scholar` package (see https://github.com/jkeirstead/scholar/issues/23). – Bastien Nov 17 '15 at 14:38
  • @Bastien Thanks, I read your report on GitHub. As you know, I can see the content that I want to retrieve when I request it twice in the same tab. We also use a script to retrieve data, and it doesn't work for our script. Calling twice isn't the best answer either. Do you have any other ideas? – Karl Nov 17 '15 at 16:32
  • Yeah, at the moment I have no idea. It seems to be a recent issue on Google's servers. My expertise is very limited, so my only suggestion would be to call twice. Indeed, it's ugly, but hopefully the first call doesn't take much time. If I hear more (i.e., if the devs of the package where I reported the issue and I find a solution), I'll post it here. – Bastien Nov 17 '15 at 16:57
  • @Bastien We figured it out and will close this issue. I will share a few more details in your GitHub issue tracker. – Karl Nov 17 '15 at 23:42
  • cool! Yeah I'm sure it'll be useful to share it. – Bastien Nov 18 '15 at 00:11
  • I'll add that this is, obviously, also happening when pulling content with a simple PHP HTML DOM parser. As the solution is for R, are there any suggestions for double requests with PHP? – timmyg Nov 18 '15 at 17:49
  • For the record, this was fixed in the `scholar` package with version 0.1.4. – jkeirstead Nov 21 '15 at 17:52

1 Answer


I fixed the issue by sending cookies with the requests. Reference: "PHP cURL how to add the User Agent value OR overcome the Servers blocking cURL requests?"

I use PHP scripts to retrieve the page, and I set a few cURL cookie options.

The relevant code snippet is:

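    // $url is the page to fetch, e.g. the Google Scholar profile URL.
    $curl = curl_init($url);

    // One cookie file per client; the cookies/ directory must exist and be writable.
    $dir                   = dirname(__FILE__);
    $config['cookie_file'] = $dir . '/cookies/' . md5($_SERVER['REMOTE_ADDR']) . '.txt';

    // Send the cookies stored in the file, and save cookies set by the server back to it.
    curl_setopt($curl, CURLOPT_COOKIEFILE, $config['cookie_file']);
    curl_setopt($curl, CURLOPT_COOKIEJAR, $config['cookie_file']);

    // Follow redirects and return the response body instead of printing it.
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    $data = curl_exec($curl);
    curl_close($curl);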
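
For the comment asking about double requests with PHP, the cookie options above could be combined with a small retry loop. This is only a minimal sketch and not part of the original script: `fetchWithRetry` and `curlFetcher` are hypothetical helper names, and treating any body that contains the text "Server Error" as a failed attempt is an assumption based on the error page quoted in the question.

```php
<?php
// Hypothetical helper: calls $fetch up to $maxAttempts times and returns the
// first body that looks like a real page, or null if every attempt fails.
function fetchWithRetry(callable $fetch, int $maxAttempts = 2): ?string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $body = $fetch();
        // Assumption: a failed attempt is either a cURL failure (false),
        // an empty body, or the "Server Error" page quoted in the question.
        if (is_string($body) && $body !== '' && stripos($body, 'Server Error') === false) {
            return $body;
        }
    }
    return null;
}

// Hypothetical helper: builds a fetcher that uses the same cookie options as
// the snippet above, so cookies saved by one attempt are sent on the next.
function curlFetcher(string $url, string $cookieFile): callable
{
    return function () use ($url, $cookieFile) {
        $curl = curl_init($url);
        curl_setopt_array($curl, [
            CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from this file
            CURLOPT_COOKIEJAR      => $cookieFile, // write cookies back to it
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_RETURNTRANSFER => true,
        ]);
        $data = curl_exec($curl);
        curl_close($curl);
        return $data;
    };
}
```

A call might look like `fetchWithRetry(curlFetcher($url, __DIR__ . '/cookies/scholar.txt'))`, where the cookie path is only an example. Because both attempts share one cookie file, the second request can send back whatever cookie the first response set.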
Karl
  • Hi, I tried to use your script, but it keeps giving me an empty $data. Did you add anything else in your curl_setopt calls? Is there something I could have missed? – user1868607 Nov 23 '15 at 14:48
  • @user1868607 No, this is all that I use. Can you tell me the value of $url? – Karl Nov 23 '15 at 17:24
  • I don't know if this will help anyone, but I solved it by adding curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); – user1868607 Nov 23 '15 at 22:17