The robots.txt file is the standard way of telling search engines what to index and what not to (not just for Jekyll, but for websites in general). Just create a file called robots.txt in the root of your Jekyll site and list the paths that should not be indexed, e.g.:
```
User-agent: *
Disallow: /2017/02/11/post-that-should-not-be-indexed/
Disallow: /page-that-should-not-be-indexed/
Allow: /
```
Jekyll will automagically copy the robots.txt to the folder where the site gets generated (_site by default).
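If you want to generate parts of the file dynamically, you can also let Jekyll process it instead of copying it verbatim. This is only a sketch: adding a front matter block makes Jekyll run the file through Liquid, and the Sitemap line assumes you have url set in _config.yml and generate a sitemap.xml (for example with the jekyll-sitemap plugin):

```
---
---
User-agent: *
Disallow: /page-that-should-not-be-indexed/
Allow: /

Sitemap: {{ site.url }}/sitemap.xml
```

The empty front matter block is what tells Jekyll to process the file rather than copy it as a static file.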
You can also test your robots.txt to make sure it is working the way you expect: https://support.google.com/webmasters/answer/6062598?hl=en
Update 2021-08-02 - Google-specific settings:
You can prevent a page from appearing in Google Search by including a noindex meta tag in the page's HTML code, or by returning a noindex header in the HTTP response. There are two ways to implement noindex: as a meta tag and as an HTTP response header. They have the same effect; choose the method that is more convenient for your site.
<meta> tag

To prevent most search engine web crawlers from indexing a page on your site, place the following meta tag into the <head> section of your page:
<meta name="robots" content="noindex">
To prevent only Google web crawlers from indexing a page:
<meta name="googlebot" content="noindex">
HTTP response header
Instead of a meta tag, you can also return an X-Robots-Tag header with a value of either noindex or none in your response. Here's an example of an HTTP response with an X-Robots-Tag instructing crawlers not to index a page:
```
HTTP/1.1 200 OK
(...)
X-Robots-Tag: noindex
(...)
```
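How you add that header depends on where the generated site is hosted; GitHub Pages, for example, doesn't let you set custom response headers, but if you serve the _site folder from your own web server you can configure it there. A minimal nginx sketch (the path is just an example and would need to match your setup):

```nginx
# Serve this path normally, but tell crawlers not to index it
location /page-that-should-not-be-indexed/ {
    add_header X-Robots-Tag "noindex";
}
```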
More details: https://developers.google.com/search/docs/advanced/crawling/block-indexing