Is there a way to stop Google from indexing a site?

-
Google [obeys](http://www.google.com/support/webmasters/) the [robots.txt](http://en.wikipedia.org/wiki/Robots.txt) file. – tvanfosson Dec 23 '08 at 23:30
-
What is the robots.txt file? – Developer Dec 23 '08 at 23:31
-
Added link to the Wikipedia article on robots.txt – tvanfosson Dec 23 '08 at 23:32
-
Google can still list you in search results regardless of robots.txt – Mark Mar 19 '15 at 21:37
-
@Mark - the question was how to stop Google from indexing a site. Google will obey the robots.txt file and not index the portions of your site that you disallow. – tvanfosson Mar 19 '15 at 22:00
-
@tvanfosson: while the most common process goes from indexing to listing, a site doesn’t have to be indexed to be listed. If a link points at a page, domain or wherever, that link will be followed. If the robots.txt on that domain prevents the search engine from indexing that page, it’ll still show the URL in the results if it can gather from other variables that it might be worth looking at. – edelans Nov 16 '15 at 10:35
-
I voted to close this question because it is not a programming question and it is off-topic on Stack Overflow. Non-programming questions about your website should be asked on [webmasters.se]. In this case the question has already been asked and answered there: [Block Google (and other) from indexing a domain](https://webmasters.stackexchange.com/questions/43234/block-google-and-other-from-indexing-a-domain) – Stephen Ostermiller Feb 17 '23 at 19:01
9 Answers
robots.txt
User-agent: *
Disallow: /
This will block all search bots from indexing.
For more info see: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40360
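As a quick sanity check, the two rules above can be fed to Python's standard-library `urllib.robotparser` to see how a compliant parser reads them. This is only a sketch: the `example.com` URLs and the `ExampleBot` user-agent name are placeholders, and it shows how a well-behaved client interprets the file, not what any particular search engine actually does.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the answer above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a compliant crawler may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Both user agents fall under the "*" group, so both are refused.
    print(is_allowed("Googlebot", "https://example.com/"))
    print(is_allowed("ExampleBot", "https://example.com/some/page.html"))
```

Passing a narrower rule set as `robots_txt` (for example `Disallow: /private/`) is an easy way to test a file before deploying it.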

-
Actually, to be precise, this will block all *legitimate* bots from crawling the site. Malicious ones will still attempt to do so, just in case that matters. – Lawrence Dol Dec 24 '08 at 06:18
-
That is correct; however, if the "spider" does not check robots.txt then it is likely malicious, which in my experience means it will also spoof the user-agent, which makes it ridiculously hard to stop. – UnkwnTech Jan 05 '09 at 10:01
-
One can also use the robots meta tag. I wrote how here: http://ligatures.net/content/expertise/how-to-use-robots-meta-tags.html The benefit of this method is that it gives finer control (i.e., per page) when necessary. – Jérôme Verstrynge Aug 24 '14 at 15:00
-
This answer will result in Google still indexing the page. When I tried it and searched Google, my site still showed up but with "A description for this result is not available because of this site's robots.txt". Please see Carlos's answer. – Justin J Stark Sep 30 '14 at 21:29
-
@JustinJStark My understanding is that this is only true if the page was previously indexed. If a site uses this from day 1, the pages will never make it to Google's (or other legitimate search providers) index. – Joel Coehoorn Jul 29 '15 at 21:45
-
Beware! The robots.txt file will prevent search engines from crawling your site, but not from indexing it... Indexing is the process of downloading a site or a page’s content to the server of the search engine, thereby adding it to its “index”. @Karol's [answer](http://stackoverflow.com/a/21690774/1570104) is much more accurate and complete. – edelans Nov 16 '15 at 10:33
Remember that preventing Google from crawling doesn't mean you can keep your content private.
My answer is based on a few sources: https://developers.google.com/webmasters/control-crawl-index/docs/getting_started https://sites.google.com/site/webmasterhelpforum/en/faq--crawling--indexing---ranking
The robots.txt file controls crawling, but not indexing! Those are two completely different actions, performed separately. Some pages may be crawled but not indexed, and some may even be indexed but never crawled. A link to a non-crawled page may exist on other websites, which can lead Google's indexer to pick it up and try to index it.
The question is about indexing, which is gathering data about the page so it can be made available through search results. It can be blocked by adding a meta tag:
<meta name="robots" content="noindex" />
or by adding an HTTP header to the response:
X-Robots-Tag: noindex
If the question is about crawling, then of course you could create a robots.txt file containing the following lines:
User-agent: *
Disallow: /
Crawling is an action performed to gather information about the structure of one specific website. E.g. you've added the site through Google Webmaster Tools. The crawler will take it into account and visit your website, looking for robots.txt. If it doesn't find one, it will assume that it can crawl anything (it's very important to have a sitemap.xml file as well, to help in this operation, and to specify priorities and define change frequencies). If it finds the file, it will follow the rules. After a successful crawl it will at some point run indexing for the crawled pages, but you can't tell when...
Important: this all means that your page can still be shown in Google search results regardless of robots.txt.
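To check which of the two indexing signals described above a given response carries, something like the following could be used. It is a minimal sketch, not a description of how Google evaluates pages: the function and class names are made up for illustration, it only looks at `<meta name="robots">` (not crawler-specific names such as `googlebot`), and it ignores the user-agent-prefixed form of `X-Robots-Tag`.

```python
from html.parser import HTMLParser


def _split_directives(value):
    """Turn 'NOINDEX, nofollow' into {'noindex', 'nofollow'}."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives |= _split_directives(attr.get("content", ""))


def asks_for_noindex(headers, html):
    """Return True if the header or the meta tag contains 'noindex'."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in _split_directives(value):
            return True
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex" /></head>'
    print(asks_for_noindex({}, page))
    print(asks_for_noindex({"X-Robots-Tag": "noindex"}, ""))
```

Note that a crawler can only see either signal if it is allowed to fetch the page, which is why a `noindex` page should not also be disallowed in robots.txt.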
-
What does Google do if you allow indexing with X-Robots-Tag but have a noindex metatag? (UPDATE: found my answer here: https://stackoverflow.com/q/17930932/1429450 ) – Geremia Mar 16 '18 at 19:36
There are several ways to stop crawlers, including Google, from crawling and indexing your website.
At server level through header
Header set X-Robots-Tag "noindex, nofollow"
At root domain level through robots.txt file
User-agent: *
Disallow: /
At page level through robots meta tag
<meta name="robots" content="noindex, nofollow" />
However, if your website has outdated or no-longer-existing pages/URLs, you may just want to wait a while: Google will automatically deindex those URLs on its next crawl. Read https://support.google.com/webmasters/answer/1663419?hl=en

You can disable indexing server-wide by adding the setting below to the global Apache configuration (it requires mod_headers); the same directive can be placed in a vhost to disable indexing for that vhost only.
Header set X-Robots-Tag "noindex, nofollow"
Once this is done, you can test it by checking the headers Apache returns.
curl -I staging.mywebsite.com
HTTP/1.1 302 Found
Date: Sat, 26 Nov 2016 22:36:33 GMT
Server: Apache/2.4.18 (Ubuntu)
Location: /pages/
X-Robots-Tag: noindex, nofollow
Content-Type: text/html; charset=UTF-8
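The same header check can be scripted. The sketch below is an assumption-laden illustration rather than part of the Apache setup: `NoIndexHandler` is a hypothetical local stand-in for the configured server (so the example runs without Apache), and `fetch_x_robots_tag` sends a HEAD request, as `curl -I` does, and returns the `X-Robots-Tag` value.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoIndexHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the Apache vhost: tags every response."""

    BODY = b"<p>staging</p>"

    def _send_headers(self):
        self.send_response(200)
        self.send_header("X-Robots-Tag", "noindex, nofollow")
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()

    def do_HEAD(self):
        self._send_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(self.BODY)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_x_robots_tag(host, port, path="/"):
    """Send a HEAD request and return the X-Robots-Tag header (or None)."""
    connection = http.client.HTTPConnection(host, port, timeout=5)
    try:
        connection.request("HEAD", path)
        return connection.getresponse().getheader("X-Robots-Tag")
    finally:
        connection.close()


def demo():
    """Start the stand-in server on a free local port and query it once."""
    server = HTTPServer(("127.0.0.1", 0), NoIndexHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_x_robots_tag("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Against a real deployment, only `fetch_x_robots_tag` is needed, pointed at the staging host and port instead of the local stand-in.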

You can also add the robots meta tag this way:
<head>
<title>...</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>
Another, extra layer is to modify .htaccess, but you need to test that thoroughly.

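Use a nofollow meta tag (note that nofollow only stops crawlers from following the links on a page; keeping the page itself out of the index takes noindex):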
<meta name="robots" content="nofollow" />
To specify nofollow at the link level, add the attribute rel with the value nofollow to the link:
<a href="example.html" rel="nofollow">example</a>

Is there a way to stop Google from indexing a site?
To stop Google from indexing the site, simply add the following meta tag to the head of every page:
<meta name="googlebot" content="noindex, nofollow">

Bear in mind that Microsoft's crawler for Bing, despite their claim to obey robots.txt, does not always do so.
Our server stats indicate that they have a number of IPs running crawlers that do not obey robots.txt, as well as a number that do.

I use a simple aspx page that relays results from Google to my browser, using a fake 'Pref' cookie that gets 100 results at a time. I didn't want Google to see this relay page, so I check the IP address, and if it starts with 66.249 I simply do a redirect.
Click my name if you value privacy and would like a copy.
Another trick I use is to have some JavaScript that calls a page to set a flag in the session. Most (NOT ALL) web bots don't execute the JavaScript, so if the flag is missing you know it's either a browser with JavaScript turned off or, more than likely, a bot.
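The IP-prefix check from the first paragraph of this answer might look roughly like the sketch below. The original is an aspx page; this is a Python illustration with made-up names (`ASSUMED_CRAWLER_NETWORK`, `looks_like_crawler`), and the only rule it encodes is the one stated above, "starts with 66.249". It is not an authoritative list of crawler addresses.

```python
import ipaddress

# Assumption taken from the answer above: any address starting with
# "66.249" is treated as Google's crawler. 66.249.0.0/16 is exactly
# the set of IPv4 addresses with that prefix.
ASSUMED_CRAWLER_NETWORK = ipaddress.ip_network("66.249.0.0/16")


def looks_like_crawler(remote_addr):
    """Return True if `remote_addr` falls inside the assumed crawler range."""
    try:
        address = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # not a valid IP address at all
    return address.version == 4 and address in ASSUMED_CRAWLER_NETWORK


if __name__ == "__main__":
    for candidate in ("66.249.66.1", "203.0.113.7"):
        action = "redirect" if looks_like_crawler(candidate) else "serve page"
        print(candidate, action)
```

Parsing the address instead of comparing strings avoids false matches on malformed input, at the cost of treating unparseable values as ordinary visitors.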
