Prevent Search Engines From Crawling Specific Webpage

Question

I have a web page where users can fill in some data, and they need to be logged in to do so. When I generated sitemap.xml using xml-sitemaps.com, it produced several loc entries that redirect to the login page first. Something like:

<loc> https://www.example.com/login/?next=fill-form/ </loc>

This page doesn't have any content either, so I thought it would be a good idea to prevent search engines from crawling it.

I was wondering which is the right way to prevent search engines from crawling it:

adding the tag below to the head section,

<meta name="robots" content="noindex, nofollow">

or disallowing the web page by adding its URL to the robots.txt file?

Also, what's the difference between the two?

score 0 · Answer 1 · answered Apr 15 '18 at 16:52

You may try both. As far as I can tell, the only difference between the two methods is that the <META> tag contains "NOFOLLOW", which tells a robot not to follow the links on that page.

Note that robots may choose not to respect either method, since neither is a fully developed standard.

For more information, see robotstxt.org. It describes both methods in depth and also provides a robots.txt checker.

Mustafa Al Ameen


Sir, one more question: how can I auto-update the sitemap.xml daily? – Apr 15 '18 at 17:10
There are many ways, depending on which web server you're using. Can you specify that so I can suggest a few? – Mustafa Al Ameen Apr 15 '18 at 17:49

score 0 · Answer 2 · answered Apr 16 '18 at 17:26

robots.txt disallows crawling.
noindex disallows indexing.
You can’t do both for the same URL: a bot that is not allowed to crawl the page never gets to see its noindex.

If you Disallow the URL in your robots.txt, conforming bots won’t visit this URL. If they find the link somehow, search engines might decide to index the URL (without ever visiting it).
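
As a rough illustration of the Disallow case, here is a minimal Python sketch that uses the standard library's urllib.robotparser to check a URL against a set of rules. The rules themselves are an assumption on my part (a single Disallow for the /login/ path from the question's example URL), and is_crawl_allowed is a hypothetical helper name, not something from robots.txt tooling.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: block every bot from the /login/ path
# (path taken from the example URL in the question).
ROBOTS_TXT = """\
User-agent: *
Disallow: /login/
"""


def is_crawl_allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if a conforming bot may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/login/?next=fill-form/",
        "https://www.example.com/",
    ):
        print(url, "->", "allowed" if is_crawl_allowed(url) else "disallowed")
```

This only tells you what a conforming crawler would do. It says nothing about whether the URL ends up in an index, which is the point of the paragraph above.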

If you noindex the URL, conforming search engines won’t index the URL, but bots may still visit it (otherwise they wouldn’t be able to learn that noindex is applied in the first place).
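
To make the "bots have to visit the page to learn about noindex" point concrete, here is a small Python sketch that reads the robots meta tag out of a page's HTML, roughly the way a crawler would have to. RobotsMetaParser and robots_directives are hypothetical names for this example, and real search engines certainly handle more cases than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in already-fetched HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    found = robots_directives(page)
    print("noindex" in found, "nofollow" in found)
```

Note that the function needs the page's HTML as input. If robots.txt blocks the fetch, there is nothing to parse, so the noindex is never seen.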
