Most answers seem to be outdated. As of 2022, Google specifies the robots.txt
format as follows (source):
> **File format**
>
> The robots.txt file must be a UTF-8 encoded plain text file and the lines must be separated by `CR`, `CR/LF`, or `LF`.
>
> Google ignores invalid lines in robots.txt files, including the Unicode Byte Order Mark (BOM) at the beginning of the robots.txt file, and uses only valid lines. For example, if the content downloaded is HTML instead of robots.txt rules, Google will try to parse the content and extract rules, and ignore everything else.
>
> Similarly, if the character encoding of the robots.txt file isn't UTF-8, Google may ignore characters that are not part of the UTF-8 range, potentially rendering robots.txt rules invalid.
>
> Google currently enforces a robots.txt file size limit of 500 kibibytes (KiB). Content which is after the maximum file size is ignored. You can reduce the size of the robots.txt file by consolidating directives that would result in an oversized robots.txt file. For example, place excluded material in a separate directory.
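To illustrate that last suggestion, a minimal robots.txt might look like the sketch below; the directory name `/private/` and the sitemap URL are made up. One directory-level rule replaces what would otherwise be many per-file rules:

```
# One Disallow for the whole directory instead of one rule per file.
User-agent: *
Disallow: /private/

# Field names are case-insensitive, so "sitemap:" would work just as well.
Sitemap: https://example.com/sitemap.xml
```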
TL;DR to answer the question:
- You can use Notepad to save a `robots.txt` file. Just use UTF-8 encoding (see the sketch after this list).
- It may or may not contain a BOM; it will be ignored anyway.
- The file has to be named `robots.txt` exactly. No capital "R".
- Field names are not case sensitive (source). Therefore, both `sitemap` and `Sitemap` are fine.
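If you would rather script this than use Notepad, here is a rough Python sketch of the same points; the rule content is a placeholder, and the 500 KiB figure is the Google limit quoted above:

```python
from pathlib import Path

rules = "User-agent: *\nDisallow: /private/\n"

# Exact lowercase file name; save as plain UTF-8.
# encoding="utf-8-sig" would add a BOM, which Google ignores anyway.
path = Path("robots.txt")
path.write_text(rules, encoding="utf-8")

# Google only processes the first 500 KiB, so warn if the file is larger.
if path.stat().st_size > 500 * 1024:
    print("Warning: content beyond 500 KiB will be ignored by Google")
```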
Keep in mind that `robots.txt` is just a de-facto standard. There is no guarantee that any crawler will read the file the way Google proposes to, nor is any crawler forced to respect the rules it defines.
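If you want to check how a rule-abiding client would interpret your file, Python's standard library includes a robots.txt parser. This is only a sketch; the bot name and URLs are invented, and other crawlers may of course behave differently:

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
"""

# Parse the rules, then ask whether a given URL may be fetched.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("MyBot", "https://example.com/private/page.html"))  # False
print(parser.can_fetch("MyBot", "https://example.com/public/index.html"))   # True
```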