
I have a Joomla website with over 1000 pages that contain URLs like this:

www.mysite.com/example.html?start=10  
www.mysite.com/example.html?start=20  
www.mysite.com/example.html?limitstart=0  

All these URLs are indexed by Google, and in Google Webmaster Tools I have a huge list of duplicate meta descriptions caused by these paginated pages.

I know it should not be difficult to block them using robots.txt, which is why I'm asking for some help.

BerrKamal

2 Answers


You can create a robots.txt file and use the Disallow directive.

For example, since you mentioned these three URLs:

www.mysite.com/example.html?start=10  
www.mysite.com/example.html?start=20  
www.mysite.com/example.html?limitstart=0

you should use this:

Disallow: /*?start=
Disallow: /*?limitstart=

Each Disallow: line takes a path pattern starting with /. Major crawlers such as Googlebot also support the * wildcard, which matches any sequence of characters, so the patterns above catch the query string anywhere in the site. Patterns can target specific files or folders.

You can also specify which bots the rules apply to, using the User-agent directive:

User-agent: *
Disallow: /*?start=
Disallow: /*?limitstart=

The block above applies to every bot or crawler.

User-agent: googlebot
Disallow: /*?start=
Disallow: /*?limitstart=

This block, for example, applies only to Googlebot.

For reference you can read the material on www.robotstxt.org; Wikipedia also has a reasonably good page: http://en.wikipedia.org/wiki/Robots.txt

A more detailed reference can be found here: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

Fabio
  • You are welcome. If you are satisfied with my answer you may upvote my answer and set my answer as "accepted answer". @NiceOne – Fabio Dec 27 '12 at 21:28
  • Your robots.txt is wrong. The rule `Disallow: /start` *never* blocks a URL like `example.com/example.html?start=10`. It would only block URLs like `example.com/start` or `example.com/startfoobar` etc. – unor Dec 28 '12 at 15:50
  • Ty for your advice unor, if it's wrong what should i put instead of Disallow: /start ? – BerrKamal Dec 28 '12 at 20:16
  • I edited the answer. It should be okay now. Check the last link i added. I based the answer on that. @NiceOne – Fabio Dec 28 '12 at 20:37
  • there is another question on stackoverflow: http://stackoverflow.com/questions/1495363/how-to-disallow-all-dynamic-urls-robots-txt?rq=1 – Fabio Dec 28 '12 at 20:48
  • Ty again Fabio i have rectified my robots.txt – BerrKamal Dec 28 '12 at 21:09

The correct answer would be:

User-agent: *
Disallow: /*?start=
Disallow: /*?limitstart=
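robots.txt matching is prefix-based, and the `*` wildcard (a Googlebot extension, not part of the original standard) matches any run of characters. A small Python sketch of that matching logic shows why `/*?start=` blocks the paginated URLs while a plain `/?start=` does not; `rule_matches` is an illustrative helper written for this answer, not a standard-library API:

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Check whether a robots.txt Disallow pattern matches a URL path.

    A simplified sketch of prefix matching plus Google's '*' wildcard
    and '$' end-anchor extensions -- not a full robots.txt parser.
    """
    # Escape regex metacharacters, then restore '*' as "match anything".
    pattern = re.escape(rule).replace(r"\*", ".*")
    # A trailing '$' in the rule anchors the match to the end of the path.
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    # re.match anchors at the start, giving robots.txt prefix semantics.
    return re.match(pattern, path) is not None

# The wildcard rule catches paginated URLs anywhere on the site...
print(rule_matches("/*?start=", "/example.html?start=10"))   # True
# ...while a plain prefix rule only matches query strings at the root:
print(rule_matches("/?start=", "/example.html?start=10"))    # False
print(rule_matches("/?start=", "/?start=10"))                # True
```

This is the difference the comments above were pointing at: without the `*`, the rule is compared against the path from its very first character, so it can never match `/example.html?start=10`.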