
I recently installed an SSL cert on one of my sites. I have noticed that Google has now indexed both the http and https version of each page. I haven't noticed any ranking problems so far, but I am conscious that a duplicate content problem may occur.

How can I overcome this? Only a few of my pages will be using https; most of the pages on the site are best served with just http. In fact, I could get away with not using https pages at all for the time being if necessary.

A few ideas I have come across are: 301 redirects, redirecting all https requests to http with .htaccess (a rough sketch of this is included below).

A separate robots.txt for the ssl pages, again using .htaccess. The problem here is that the https pages have already been indexed, and I would like them to be deindexed. I am not sure robots.txt would be sufficient, because as far as I am aware it only tells a bot not to crawl a page, and these pages have already been indexed.
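
A minimal sketch of the first idea above (redirecting every https request to its http counterpart), assuming Apache with mod_rewrite enabled; this is illustrative only and would need adjusting for any pages that should stay on https:

    # Send all https requests to the http version of the same URL with a 301
    RewriteEngine On
    RewriteCond %{HTTPS} on
    RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1 [R=301,L]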

Are there any other suggestions?

Ben C
Henrick
  • Robots.txt tells a bot both not to crawl and not to index a page (not saying it's the solution you want, but it would de-index the pages). – Ben C Jul 31 '13 at 15:52

2 Answers


Use canonical URLs for this.
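
A minimal illustration of what this answer suggests (the URL is only a placeholder): a rel="canonical" link placed in the head of both the http and https versions of a page, pointing at the preferred http URL, e.g.

    <!-- in the <head> of both http://example.com/page and https://example.com/page -->
    <link rel="canonical" href="http://example.com/page">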

John Conde
  • Any reason you would use canonical URLs over 301s or robots.txt? – Henrick Apr 23 '12 at 15:30
  • Robots.txt isn't suited to handle this, and you don't want to block pages, just one version of them. 301 redirects would work but are more error-prone than canonical URLs. – John Conde Apr 23 '12 at 15:34
  • Thanks John, not disagreeing with you, but just on the robots.txt issue: a few items I read suggested using .htaccess to serve a different robots.txt file depending on which request is being served, e.g. the following in .htaccess: RewriteCond %{SERVER_PORT} 443 [NC] followed by RewriteRule ^robots.txt$ robots_ssl.txt [L] – Henrick Apr 23 '12 at 15:39
  • That's a very hackish way to do it. Especially when there are cleaner alternatives available. – John Conde Apr 23 '12 at 15:41
  • Thanks again John, one last question if I may: why would 301 redirects be more error-prone than canonical URLs? – Henrick Apr 23 '12 at 15:43
  • It requires sending out HTTP headers and the browser/crawler interpreting them properly. It essentially has moving parts. Canonical URLs do not. The search engine reads the page normally and finds out what it needs to know and it has no effect on users in any way. – John Conde Apr 23 '12 at 15:45

As I have already faced this problem: a good solution is a canonical link. Google will remove your indexed https pages after some time (it can take anywhere from a week to a month). For those pages where you can't put a canonical link, give a 301 redirect from https to http.
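
A rough sketch of that fallback, assuming Apache with mod_rewrite; the /secure-checkout path is purely a hypothetical example of a page that should stay on https and is not from the question:

    # Sketch: 301-redirect https to http, except for pages that genuinely need SSL
    # (/secure-checkout is a hypothetical https-only path)
    RewriteEngine On
    RewriteCond %{HTTPS} on
    RewriteCond %{REQUEST_URI} !^/secure-checkout [NC]
    RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1 [R=301,L]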