Is it possible in robots.txt to give one instruction to multiple bots without repeatedly having to mention it?
Example:
User-agent: googlebot yahoobot microsoftbot
Disallow: /boringstuff/
Note: since this answer was originally written, Google's description has been substantially rewritten and no longer has any ambiguity on this topic. Moreover, there is finally a formal standard in the form of RFC 9309.
The conclusion below stands: there is a recognised way of grouping user agents, but you might wish to use the simplest format possible in case of unsophisticated crawlers.
Original answer follows.
It's actually pretty hard to give a definitive answer to this, as there isn't a very well-defined standard for robots.txt, and a lot of the documentation out there is vague or contradictory.
The description of the format understood by Google's bots is quite comprehensive, and includes this slightly garbled sentence:
Muiltiple start-of-group lines directly after each other will follow the group-member records following the final start-of-group line.
Which seems to be groping at something shown in the following example:
user-agent: e
user-agent: f
disallow: /g
According to the explanation below it, this constitutes a single "group", disallowing the same URL for two different User Agents.
So the correct syntax for what you want (with regards to any bot working the same way as Google's) would then be:
User-agent: googlebot
User-agent: yahoobot
User-agent: microsoftbot
Disallow: /boringstuff/
However, as Jim Mischel points out, there is no point in a robots.txt file which some bots will interpret correctly, but others may choke on, so it may be best to go with the "lowest common denominator" of repeating the blocks, perhaps by dynamically generating the file with a simple "recipe" and update script.
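If you do go the generation route, a minimal sketch of such a script might look like the following (generate_robots.py is a hypothetical name, and the bot and path lists are just placeholders for your own "recipe"):

# generate_robots.py -- hypothetical helper script, not part of any standard tooling.
# Emits one block per bot, so even unsophisticated parsers understand the result.
bots = ["googlebot", "yahoobot", "microsoftbot"]
disallowed = ["/boringstuff/"]

blocks = []
for bot in bots:
    lines = ["User-agent: " + bot]
    lines += ["Disallow: " + path for path in disallowed]
    blocks.append("\n".join(lines))

with open("robots.txt", "w") as f:
    f.write("\n\n".join(blocks) + "\n")

Re-running it whenever the lists change regenerates the repeated, lowest-common-denominator blocks, so you only ever maintain the bot list and the rule list.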
I think the original robots.txt specification defines it unambiguously: one User-agent line can only have one value.
A record (aka. a block, a group) consists of lines. Each line has the form
<field>:<optionalspace><value><optionalspace>
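In code, that grammar is simple to express. A rough sketch (a simplification that ignores comments and blank lines) could be:

import re

# <field>:<optionalspace><value><optionalspace> -- one field and one value per line.
LINE = re.compile(r"^(?P<field>[^:]+):\s*(?P<value>.*?)\s*$")

m = LINE.match("User-agent: googlebot")
print(m.group("field"), "->", m.group("value"))  # prints: User-agent -> googlebot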
User-agent is a field. Its value:
The value of this field is the name of the robot the record is describing access policy for.
It’s singular ("name of the robot"), not plural ("the names of the robots").
The robot should be liberal in interpreting this field. A case insensitive substring match of the name without version information is recommended.
If several values were allowed, how could parsers possibly be liberal? Whatever the delimiting character would be (",", ";", …), it could be part of the robot name.
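That liberal, case-insensitive substring match is easy to picture with one name per value; a tiny sketch of the recommended behaviour (purely illustrative, not any particular crawler's code) could be:

def agent_matches(record_value, user_agent):
    # Spec recommendation: case-insensitive substring match of the robot's name,
    # ignoring version information in the crawler's own User-Agent string.
    return record_value.lower() in user_agent.lower()

print(agent_matches("googlebot", "Googlebot/2.1 (+http://www.google.com/bot.html)"))  # True
print(agent_matches("yahoobot", "Googlebot/2.1 (+http://www.google.com/bot.html)"))   # False

With several names crammed into one value, that simple containment test no longer works, which is another hint that the field was meant to hold a single name.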
The record starts with one or more User-agent lines
Why should you use several User-agent lines if you could provide several values in one line?
In addition, a Disallow line likewise carries a single value.
So instead of
User-agent: googlebot yahoobot microsoftbot
Disallow: /boringstuff/
you should use
User-agent: googlebot
User-agent: yahoobot
User-agent: microsoftbot
Disallow: /boringstuff/
or (probably safer, as you can’t be sure if all relevant parsers support the not-so-common way of having several User-agent lines for a record)
User-agent: googlebot
Disallow: /boringstuff/

User-agent: yahoobot
Disallow: /boringstuff/

User-agent: microsoftbot
Disallow: /boringstuff/
(or, of course, just User-agent: *)
According to the original robots.txt exclusion protocol:
User-agent
The value of this field is the name of the robot the record is describing access policy for.
If more than one User-agent field is present the record describes an identical access policy for more than one robot. At least one field needs to be present per record.
The robot should be liberal in interpreting this field. A case insensitive substring match of the name without version information is recommended.
If the value is '*', the record describes the default access policy for any robot that has not matched any of the other records. It is not allowed to have multiple such records in the "/robots.txt" file.
I have never seen multiple bots listed in a single line. And it's likely that my web crawler would not have correctly handled such a thing. But according to the spec above, it should be legal.
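To make the selection rule concrete, here is a rough sketch of how a crawler might pick the record that applies to it, per the quoted spec (parsing is omitted and the record structure is assumed, purely for illustration):

def select_record(records, user_agent):
    # records: list of (agent_names, rules) tuples already parsed from robots.txt.
    # Return the rules of the first record naming this crawler; fall back to the
    # '*' record only if no other record matched, as the spec describes.
    default = None
    for agents, rules in records:
        if any(a == "*" for a in agents):
            default = rules
        elif any(a.lower() in user_agent.lower() for a in agents):
            return rules
    return default if default is not None else []

records = [
    (["googlebot", "yahoobot", "microsoftbot"], ["Disallow: /boringstuff/"]),
    (["*"], ["Disallow: /private/"]),
]
print(select_record(records, "Googlebot/2.1"))     # ['Disallow: /boringstuff/']
print(select_record(records, "SomeOtherBot/1.0"))  # ['Disallow: /private/']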
Note also that even if Google were to support multiple user agents in a single directive, or the multiple User-agent lines described in IMSoP's answer (interesting find, by the way ... I didn't know that one), not all other crawlers will. You need to decide if you want to use the convenient syntax that very possibly only the Google and Bing bots will support, or use the more cumbersome, but simpler, syntax that all polite bots support.
You have to put each bot on a different line.
As mentioned in the accepted answer, the safest approach is to add a new entry for each bot.
This repo has a good robots.txt file for blocking a lot of bad bots: https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/blob/master/robots.txt/robots.txt