Is it possible to force robots to crawl www.domaine.com and not domaine.com? In my case, I have a web app that serves cached URLs through prerender.io (so that the HTML code can be viewed), but only on www.

So when the robots crawl domaine.com, they get no data.

Nginx redirects automatically (domaine.com > http://www.domaine.com), but this has had no effect.

Note that all the URLs in my sitemap use www.

My Nginx redirect:

server {
  listen                *:80;

  server_name           stephane-richin.fr;

  location / {

    if ($http_host ~ "^([^\.]+)\.([^\.]+)$"){
      rewrite ^/(.*) http://www.stephane-richin.fr/$1 redirect;
    }

  }
}

Do you have any ideas?

Thank you!

Pekka
Stéphane R.
  • Do you have any evidence that search engine crawlers are currently indexing `domaine.com` but not `www.domaine.com`? Have you checked with a tool like Google Webmasters: https://www.google.com/webmasters/#?modal_active=none ? – Pekka Sep 21 '16 at 09:39
  • Yep, I use prerender to check: https://box.everhelper.me/attachment/584507/9694636d-053f-40da-bc73-ee2a0df9ef86/321375-GAbfF2KpkO3gqtCd/screen.png – Stéphane R. Sep 21 '16 at 09:43
  • It’s unlikely to be possible to force the crawler to do anything... are you sure you are recording `www.` hits in that tool? Perhaps they’re not recorded because you’re returning a 404? Have you submitted the sitemap in the Google Webmasters console? – Pekka Sep 21 '16 at 09:58
  • Yes, 1 week ago. My robots.txt contains only this: http://www.stephane-richin.fr/robots.txt – Stéphane R. Sep 21 '16 at 09:59
  • To be clear: you submitted the sitemap to Google 1 week ago, and the crawler was back today on the wrong domain? – Pekka Sep 21 '16 at 10:00
  • Yep, I submitted my sitemap to Google 1 week ago, but the crawler has always crawled the bare domain (without www.), I don't know why :/ – Stéphane R. Sep 21 '16 at 10:02
  • Hmm, that’s strange. What code are you sending in the redirect, 301 or 302? – Pekka Sep 21 '16 at 10:03
  • I see, a 302. Might be worth sending a 301 instead http://stackoverflow.com/questions/1393280/http-redirect-301-permanent-vs-302-temporary – Pekka Sep 21 '16 at 10:04
  • I don't see a redirect code in my config (I've edited my first post) – Stéphane R. Sep 21 '16 at 10:05
  • I’m not familiar with nginx, but try `permanent` instead of `redirect` ([source](https://www.nginx.com/blog/creating-nginx-rewrite-rules/)) – Pekka Sep 21 '16 at 10:07
  • If you need to check whether the change worked, ping me and I’ll check (or you can do it yourself on a *nix terminal using `curl -I http://stephane-richin.fr`) – Pekka Sep 21 '16 at 10:08
  • @pekka I've tested it and now I see "HTTP/1.1 301 Moved Permanently". So the bots will be redirected to crawl www.? – Stéphane R. Sep 21 '16 at 10:45
  • Yup, I can confirm it’s sending a 301 now. I can’t tell you for sure whether it’ll help - you’ll have to wait for the next bot visit/s - but this definitely was a thing that needed fixing. Also make sure you check out all the features in the webmaster console to see whether there’s anything else you can do – Pekka Sep 21 '16 at 10:47
  • Okay, thank you for your help! I will report back on whether this changes anything :) – Stéphane R. Sep 21 '16 at 10:49
  • So :) The crawl is still on stephane-richin.fr (without www.) and the code returned is 301 – Stéphane R. Sep 21 '16 at 13:57
  • It might be worth waiting for the next one! It should now know that it’s supposed to look at the new address. It’s still weird, it should be checking the www. ones right away. Nothing in the webmaster console? – Pekka Sep 21 '16 at 14:05

2 Answers

Could you have a robots.txt file with

User-agent: *
Disallow: /

on domaine.com and a different one with

User-agent: *
Disallow:

on www.domaine.com?
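
One way this might look in Nginx, as a rough sketch only: the bare-domain server block answers `/robots.txt` itself and keeps redirecting everything else. This assumes the bare domain is handled by its own `server` block (as in the question) and that the www host keeps serving its existing, permissive robots.txt; the `location = /robots.txt` block is an illustration and not part of the original config.

```nginx
server {
  listen       *:80;
  server_name  stephane-richin.fr;

  # Hypothetical addition: serve a blocking robots.txt on the bare domain
  # instead of redirecting this one path.
  location = /robots.txt {
    default_type text/plain;
    return 200 "User-agent: *\nDisallow: /\n";
  }

  # Everything else is still sent to the www host.
  location / {
    return 301 http://www.stephane-richin.fr$request_uri;
  }
}
```

Whether crawlers then switch to the www host is not guaranteed; this only tells them not to fetch pages from the bare domain.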

Julien Nioche

If you submitted a sitemap with the correct URLs a week ago, it seems strange that Google keeps requesting the old ones.

Anyway - you’re sending the wrong status code in your non-www to www redirect. You are sending a 302 but should be sending a 301. Philippe explains the difference in this answer:

Status 301 means that the resource (page) is moved permanently to a new location. The client/browser should not attempt to request the original location but use the new location from now on.

Status 302 means that the resource is temporarily located somewhere else, and the client/browser should continue requesting the original url.
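
In Nginx terms, the `redirect` flag on `rewrite` produces a 302 and `permanent` produces a 301, so the smallest change is to swap that flag. As a sketch of an alternative, assuming the `server` block from the question only ever handles the bare domain (so the `if` on `$http_host` is not needed), a plain `return` would do the same job:

```nginx
server {
  listen       *:80;
  server_name  stephane-richin.fr;

  # 301 = moved permanently; $request_uri carries over the original
  # path and query string to the www host.
  return 301 http://www.stephane-richin.fr$request_uri;
}
```

After reloading Nginx, `curl -I http://stephane-richin.fr` should show the status line and a `Location` header pointing at the www host.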

Community
Pekka