
I want Google to stop crawling and indexing duplicate and non-existent pages on my website.

Google indexes pages by generating URL parameters from my site on its own, which produces URLs that make no sense or point to non-existent pages, and some of them create duplicate content.

Example:

Google indexes URLs of this type, which do not exist:

http://www.example.com/url-pr1/url-pr2/?keyword=url-pr1&url-pr3=url-pr4

Google indexes URLs of this type, which create duplicate content:

http://www.example.com/page.php?link=url-pr1&url-pr2=url-pr4
// duplicate of a page like http://www.example.com/url-pr1/url-pr4/

I have added the URL parameters in Webmaster Tools as No URLs, but Google still keeps indexing these kinds of URLs.

How can I tell Google that these pages do not exist or contain duplicate content, and that it should not index pages by generating URL parameters on its own?

Should I redirect to a 404 page for URL parameters that do not make sense, and if so, how do I do it using .htaccess?

Please suggest any possible way to do this.

Thanks.

4sha

1 Answer


If the URLs are actually invalid, you should return a 404 response, which should prevent Google from indexing the page.

If the URL leads to duplicate content, then you should make sure the page declares a canonical URL. That will help Google recognize that it is a duplicate.
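As an illustration of the canonical URL advice, here is a minimal PHP sketch that builds a `<link rel="canonical">` tag pointing at the clean URL. The helper names `canonical_url` and `canonical_link_tag` are hypothetical, and the two-segment URL layout is assumed from the example in the question; adapt both to how the site really maps its pages.

```php
<?php
// Hypothetical helper: build the clean two-segment URL for a page.
// The http://www.example.com/<section>/<item>/ layout is assumed from the question.
function canonical_url(string $section, string $item): string
{
    return 'http://www.example.com/' . rawurlencode($section) . '/' . rawurlencode($item) . '/';
}

// Hypothetical helper: render the tag that belongs inside the page's <head>.
function canonical_link_tag(string $url): string
{
    return '<link rel="canonical" href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">';
}

// Usage inside <head>, with the values from the question's example:
// echo canonical_link_tag(canonical_url('url-pr1', 'url-pr4'));
```

With this in place, both `page.php?link=...` and the clean URL would emit the same tag, so the parameterized variant points Google at the clean one.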

Eric Petroelje
  • How do I return a 404 response using .htaccess? I tried this, but it won't work: `RewriteBase / RewriteCond %{REQUEST_URI} !\.(xml|txt|js|css|png|jpg|jpeg|gif|php)$ RewriteRule ^([^/]+)/([^/]+)/([^/]+)/([^/]+)/?$ /error/403.html [L]` –  Jun 20 '13 at 13:59
  • @TallboY - you shouldn't need to. If the URL is invalid, Apache should issue a 404 automatically. – Eric Petroelje Jun 20 '13 at 14:05
  • @Eric Petroelje Yes, it does for most pages, but for pages that exist and are accessed with extra URL parameters it doesn't return a 404 response. Instead it returns the page with no CSS or JavaScript applied, or with some other errors. –  Jun 20 '13 at 14:08
  • @Eric Petroelje I want Google to index my site properly instead of wasting resources and time. –  Jun 20 '13 at 14:11
  • @TallboY - if that's the case, then you would need to check the parameters on the page itself - if they are invalid, then you would issue a 404 from your PHP code. Like this: http://stackoverflow.com/questions/437256/sending-a-404-error-in-php – Eric Petroelje Jun 20 '13 at 14:17
  • Yes that may help, I'll try it. –  Jun 20 '13 at 14:19
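
The approach suggested in the comments, checking the parameters in the page itself and issuing a 404 from PHP when they are invalid, could look roughly like the sketch below. The function name `has_only_allowed_params` and the whitelist contents are assumptions for illustration; the real list depends on which parameters each page accepts.

```php
<?php
// Hypothetical helper: true when every query parameter name is in the whitelist.
function has_only_allowed_params(array $query, array $allowed): bool
{
    // array_diff() returns the parameter names that are NOT in $allowed.
    return array_diff(array_keys($query), $allowed) === [];
}

// Assumed whitelist for this page; replace with the parameters it really uses.
$allowed = ['link', 'page'];

// Must run before any output is sent, otherwise the status code cannot change.
if (!has_only_allowed_params($_GET, $allowed)) {
    http_response_code(404); // available since PHP 5.4
    echo 'Page not found';
    exit;
}

// ... normal page rendering continues here ...
```

This check only rejects unknown parameter names. Validating the values (for example, that `link` refers to an existing page) would need a lookup against the site's own data.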