I have links being indexed that shouldn't be, and I need to remove them from Google. What should I enter in robots.txt? Example link: http://sitename.com/wp-content/uploads/2014/02/The-Complete-Program-2014.pdf

unor
Dimitry B
  • What have you tried? Wikipedia has [a decent description of robots.txt with examples](https://en.m.wikipedia.org/wiki/Robots.txt). – Adrian Schönig Aug 20 '15 at 22:39

1 Answer

With robots.txt, you can disallow crawling, not indexing.

With this robots.txt:

User-agent: *
Disallow: /wp-content/uploads/2014/02/The-Complete-Program-2014.pdf

any URL whose path starts with /wp-content/uploads/2014/02/The-Complete-Program-2014.pdf is not allowed to be crawled.
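
The prefix matching described above can be checked locally. This is a minimal sketch using Python's standard urllib.robotparser; the user agent name "ExampleBot" and the two extra URLs are made up for illustration, and real crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the answer, held in a string so nothing is fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/uploads/2014/02/The-Complete-Program-2014.pdf
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "http://sitename.com/wp-content/uploads/2014/02/"

# "ExampleBot" is a hypothetical crawler name; it falls under "User-agent: *".
pdf_allowed = parser.can_fetch("ExampleBot", BASE + "The-Complete-Program-2014.pdf")

# Hypothetical URL whose path starts with the disallowed path.
prefix_allowed = parser.can_fetch("ExampleBot", BASE + "The-Complete-Program-2014.pdf.bak")

# Hypothetical URL in the same directory that does not match the rule.
other_allowed = parser.can_fetch("ExampleBot", BASE + "other-file.pdf")

print(pdf_allowed, prefix_allowed, other_allowed)
```

Note that this only tells you what a well-behaved bot may crawl; it says nothing about whether the URL ends up in an index.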

But if a bot finds this URL in some other way (e.g., because someone else links to it), it might still index the URL without ever crawling or visiting it. The same goes for search engines that have already indexed it: they might keep it in their index, but will no longer visit it.

To disallow indexing, you could use the HTTP header X-Robots-Tag with the noindex parameter. In that case, you should not block crawling of the file in robots.txt, otherwise bots would never be able to see your headers (and so they would never know that you don’t want this file to get indexed).
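
How the header gets sent depends on your server, which the question does not specify. As one illustration only, here is a sketch of a small Python WSGI wrapper that adds X-Robots-Tag: noindex to responses for selected paths. The function name add_noindex_header and the noindex_paths parameter are hypothetical; on a typical WordPress host you would set the same header in the web server configuration instead.

```python
PDF_PATH = "/wp-content/uploads/2014/02/The-Complete-Program-2014.pdf"


def add_noindex_header(app, noindex_paths):
    """Wrap a WSGI app so listed paths are served with X-Robots-Tag: noindex.

    Hypothetical helper for illustration; not part of any library.
    """
    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Only touch responses for paths we want kept out of the index.
            if environ.get("PATH_INFO", "") in noindex_paths:
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start_response)
    return wrapped


def demo_app(environ, start_response):
    # Stand-in for the real application that serves the file.
    start_response("200 OK", [("Content-Type", "application/pdf")])
    return [b""]


application = add_noindex_header(demo_app, {PDF_PATH})
```

Whatever mechanism you use, the file must stay crawlable (not blocked in robots.txt) so that bots can actually receive this header.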

Community
unor