The robots.txt file is the standard way of telling search engines what to index and what not to (not just for Jekyll, but for websites in general). Just create a file called robots.txt in the root of your Jekyll site and list the paths that should not be indexed, e.g.:
```
User-agent: *
Disallow: /2017/02/11/post-that-should-not-be-indexed/
Disallow: /page-that-should-not-be-indexed/
Allow: /
```
Jekyll will automagically copy the robots.txt to the folder where the site gets generated (_site by default).
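If you want to generate parts of the file dynamically, you can also let Jekyll process it instead of copying it verbatim. This is only a sketch: adding a front matter block makes Jekyll run the file through Liquid, and the Sitemap line assumes you have url set in _config.yml and generate a sitemap.xml (for example with the jekyll-sitemap plugin):

```
---
---
User-agent: *
Disallow: /page-that-should-not-be-indexed/
Allow: /

Sitemap: {{ site.url }}/sitemap.xml
```

The empty front matter block is what tells Jekyll to process the file rather than copy it as a static file.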
You can also test your robots.txt to make sure it is working the way you expect: https://support.google.com/webmasters/answer/6062598?hl=en
Update 2021-08-02 - Google-specific settings:
You can prevent a page from appearing in Google Search by including a noindex meta tag in the page's HTML code, or by returning a noindex header in the HTTP response. There are two ways to implement noindex: as a meta tag and as an HTTP response header. They have the same effect; choose the method that is more convenient for your site.
<meta> tag

To prevent most search engine web crawlers from indexing a page on your site, place the following meta tag into the <head> section of your page:
<meta name="robots" content="noindex">
To prevent only Google web crawlers from indexing a page:
<meta name="googlebot" content="noindex">
HTTP response header
Instead of a meta tag, you can also return an X-Robots-Tag header with a value of either noindex or none in your response. Here's an example of an HTTP response with an X-Robots-Tag instructing crawlers not to index a page:
```
HTTP/1.1 200 OK
(...)
X-Robots-Tag: noindex
(...)
```
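How you add that header depends on where the generated site is hosted; GitHub Pages, for example, doesn't let you set custom response headers, but if you serve the _site folder from your own web server you can configure it there. A minimal nginx sketch (the path is just an example and would need to match your setup):

```nginx
# Serve this path normally, but tell crawlers not to index it
location /page-that-should-not-be-indexed/ {
    add_header X-Robots-Tag "noindex";
}
```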
More details: https://developers.google.com/search/docs/advanced/crawling/block-indexing