Short answer:
You can stop search engines from indexing your user's GitHub Pages by adding a robots.txt to your User Page. That robots.txt is the active robots.txt for all your Project Pages as well, because project pages are served as subdirectories (username.github.io/project) of your subdomain (username.github.io).
Longer answer:
You get your own subdomain for GitHub Pages (username.github.io). According to this question on MOZ and Google's reference, each subdomain has/needs its own robots.txt.
This means that the valid/active robots.txt for project projectname by user username lives at username.github.io/robots.txt. You can put a robots.txt file there by creating a GitHub Pages site for your user. This is done by creating a new project/repository named username.github.io, where username is your username. You can then create a robots.txt file in the master branch of that project/repository, and it will be served at username.github.io/robots.txt. More information about Project, User and Organization Pages can be found here.
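A quick way to double-check that the file is actually being served from your User Page is to fetch it over HTTP. The following is a minimal sketch, assuming a hypothetical username myusername and that the username.github.io site has already been published:

from urllib.request import urlopen

# Hypothetical username; replace with your own GitHub username.
USERNAME = "myusername"

# Fetch the robots.txt served from the User Page subdomain and print it,
# to confirm it matches the file committed to the username.github.io repository.
url = f"https://{USERNAME}.github.io/robots.txt"
with urlopen(url) as response:
    print(response.read().decode("utf-8"))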
I have tested this with Google by confirming ownership of myusername.github.io (placing an HTML verification file in my project/repository https://github.com/myusername/myusername.github.io/tree/master), creating a robots.txt file there, and then verifying that it works with Google's Search Console webmaster tools (googlebot-fetch). Google does indeed list the test URL as blocked, and the Search Console robots testing tool (robots-testing-tool) confirms it.
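You can also run a similar check locally with Python's standard library instead of the Search Console tools. This is only a sketch, assuming a hypothetical username myusername and project projectname; urllib.robotparser downloads the live robots.txt and reports whether a given URL may be fetched by a crawler:

from urllib.robotparser import RobotFileParser

# Hypothetical username and project name; replace with your own.
USERNAME = "myusername"
PROJECT = "projectname"

# Parse the live robots.txt served from the User Page subdomain.
parser = RobotFileParser(f"https://{USERNAME}.github.io/robots.txt")
parser.read()

# Ask whether a generic crawler ("*") may fetch the project page.
project_url = f"https://{USERNAME}.github.io/{PROJECT}/"
print(parser.can_fetch("*", project_url))  # False if the path is disallowed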
To block robots for one project's GitHub Page:
User-agent: *
Disallow: /projectname/
To block robots for all GitHub Pages for your user (User Page and all Project Pages):
User-agent: *
Disallow: /
Other options