I can think of two solutions that work at the present time:
- Rename your repo to start with `tags`. So for example, instead of `my-repo`, rename it to `tags-my-repo`. OR:
- Create a new branch, but don't make it the default. Then, on the default branch, delete all files (see the sketch after this list). This has the side effect of a) making the default branch useless for anything beyond hiding from crawlers while the repo remains public, and b) forcing you to use the new branch as the de facto master. You can still rename the now-useless default branch and the de facto new branch to whatever you want.
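For solution 2, here is a minimal sketch of the branch shuffle, assuming the default branch is called `main` and using the placeholder branch name `real-content` (plain git commands run from the repo root work just as well; Python is only used as a wrapper here):

```python
import subprocess

def git(*args):
    """Run one git command and stop at the first failure."""
    subprocess.run(["git", *args], check=True)

# Keep everything on a new, non-default branch ("real-content" is a placeholder).
git("checkout", "-b", "real-content")
git("push", "-u", "origin", "real-content")

# Back on the default branch ("main" is an assumption; yours may differ),
# delete every tracked file so the crawlable repo root shows nothing.
git("checkout", "main")
git("rm", "-r", ".")
git("commit", "-m", "Empty the default branch to keep crawlers away")
git("push", "origin", "main")
```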
Why I think the older solutions in this thread no longer work: https://github.com/robots.txt has changed since then. At the time of the original question in 2013, `robots.txt` looked like this:
    User-agent: Googlebot
    Allow: /*/*/tree/master
    Allow: /*/*/blob/master
    Disallow: /ekansa/Open-Context-Data
    Disallow: /ekansa/opencontext-*
    Disallow: /*/*/pulse
    Disallow: /*/*/tree/*
    ...
whereas now there are no `Allow`s, only `Disallow`s:
    User-agent: *
    Disallow: /*/pulse
    Disallow: /*/tree/
    Disallow: /gist/
    Disallow: /*/forks
    ...
    Disallow: /*/branches
    Disallow: /*/tags
    ...
If you simply create a new branch, make that the default, and delete the old one, the URL https://github.com/user-name/repo-name will simply show your new default branch and remain crawlable under the current robots.txt.
How my solutions above work (based on how Google currently interprets robots.txt):
Solution 1 makes your repo's URL match `Disallow: /*/tags`, thereby excluding it from crawling. In fact, you can prefix your repo name with any single word taken from a disallowed path of the form `/*/word` with no trailing slash (so `tree` doesn't work, since `Disallow: /*/tree/` ends with a slash).
Solution 2 simply ensures that the default branch, which is the only branch crawled, doesn't contain anything you don't want crawled. In other words, it "moves" all relevant content to another branch, so it lives under https://github.com/user-name/repo-name/tree/branch-name, which won't be crawled due to `Disallow: /*/tree/`.
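To make the matching concrete, here is a minimal sketch of Google-style wildcard matching (the helper names are mine, not any library's; the rules are copied from the current robots.txt excerpt above):

```python
import re

# Disallow rules copied from the current GitHub robots.txt excerpt above.
DISALLOW = ["/*/pulse", "/*/tree/", "/gist/", "/*/forks", "/*/branches", "/*/tags"]

def rule_matches(pattern: str, path: str) -> bool:
    """Google-style matching: '*' matches any characters, '$' anchors the end."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\$", "$")
    return re.match(regex, path) is not None

def is_disallowed(path: str) -> bool:
    return any(rule_matches(rule, path) for rule in DISALLOW)

# Solution 1: the repo URL itself matches Disallow: /*/tags
print(is_disallowed("/user-name/tags-my-repo"))           # True  -> excluded from crawling
print(is_disallowed("/user-name/my-repo"))                # False -> still crawled

# Solution 2: content moved to a branch matches Disallow: /*/tree/
print(is_disallowed("/user-name/repo-name/tree/branch"))  # True  -> excluded from crawling
print(is_disallowed("/user-name/repo-name"))              # False -> repo root still crawled
```

(Google's real matcher is more involved, e.g. it also weighs `Allow` rules by the most specific match, but since the current file has no `Allow`s, checking the `Disallow`s is enough for this illustration.)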
Disclaimers
- Obviously, my solutions depend heavily on what `robots.txt` looks like at any given point in time.
- This doesn't guarantee your repo won't show up in search results; robots.txt controls crawling, not indexing.
- This should be obvious: Since your repo is public, people who already know your user name can always navigate to your stuff. This fact has no bearing on the problem at hand, but I thought I should put this out there.