
I was just wondering if it's possible to count the total number of empty repositories on GitHub.

If not for all users, can it be done for yourself?

Edit

I have tried the size:0 search, but it seems to return a lot of repositories which do contain data. Using a range like size:0..1 didn't help either.

I also tried searching for the keyword "empty", but that doesn't catch every empty repository.

Update

I got a response from Brian Levine (GitHub):

That would be an interesting statistic. We don't have a simple way to do that right now. However, you might be able to use the GitHub API to get close. You could look through public repositories and compare "pushed_at" and "created_at" dates to see if there has been any activity. Additionally, you could find repositories with a "size" of 0. There's more information on how to find this information, and much more, right here:

http://developer.github.com/v3/repos/
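Here is a minimal sketch of the heuristic GitHub suggests, in Java (the language used elsewhere on this page). It assumes you already have a repository's `created_at`, `pushed_at` and `size` values from the repos API response. The class name `EmptyRepoHeuristic` and the one-minute `CREATION_WINDOW` tolerance are hypothetical choices, not anything GitHub defines.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: flags a repository as "possibly empty" using three
// fields from the GitHub repos API (created_at, pushed_at, size).
// This is a heuristic, not an official GitHub check.
public class EmptyRepoHeuristic {

    // Assumed tolerance: a push this soon after creation is treated as part
    // of creating the repo (e.g. the initial README commit), not real activity.
    static final Duration CREATION_WINDOW = Duration.ofMinutes(1);

    public static boolean possiblyEmpty(String createdAt, String pushedAt, long size) {
        // A size of 0 is a strong hint, although size alone is unreliable
        if (size == 0) {
            return true;
        }
        // Defensive: treat a missing (null) pushed_at as "no activity"
        if (pushedAt == null) {
            return true;
        }
        Instant created = Instant.parse(createdAt);
        Instant pushed = Instant.parse(pushedAt);
        // No push noticeably later than creation: probably no real activity
        return !pushed.isAfter(created.plus(CREATION_WINDOW));
    }

    public static void main(String[] args) {
        System.out.println(possiblyEmpty("2013-11-10T12:00:00Z", "2013-11-10T12:00:05Z", 48));
        System.out.println(possiblyEmpty("2013-11-10T12:00:00Z", "2013-11-12T08:30:00Z", 120));
    }
}
```

With the sample values in `main`, the first call prints `true` (pushed five seconds after creation) and the second prints `false` (pushed two days later).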

Aniket
  • What do you mean by "empty"? A repository with no files and no commits? I've never seen such a repository on GitHub! – Robin Green Nov 10 '13 at 12:51
    @RobinGreen Yes! Repo with no files. That's very much possible. Many people create repos but never push code. – Aniket Nov 10 '13 at 12:52

3 Answers


You could:

Note that an "empty" repo could still have at least one commit if it was created with the default README.md file.
Actually, as the OP Aniket comments:

I explained the meaning of empty as: 0-1 commits, max 3 files:

.gitignore
README.md
LICENSE 

(Note: README is different from README.md)
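That definition can be written as a small predicate. This is a sketch that assumes you have already obtained the commit count and the list of top-level file names (for example from the API); `EmptyRepoDefinition` and `isEmptyByDefinition` are hypothetical names. The match is case-sensitive and exact, so `README` does not count as `README.md`, per the note above.

```java
import java.util.List;
import java.util.Set;

// Hypothetical encoding of the OP's definition of an "empty" repository:
// 0-1 commits, and at most 3 files, all from .gitignore, README.md, LICENSE.
public class EmptyRepoDefinition {

    private static final Set<String> ALLOWED_FILES =
            Set.of(".gitignore", "README.md", "LICENSE");

    public static boolean isEmptyByDefinition(int commitCount, List<String> fileNames) {
        // More than one commit means someone pushed real work
        if (commitCount > 1) {
            return false;
        }
        // Every file must be one of the allowed boilerplate files
        return fileNames.size() <= 3 && ALLOWED_FILES.containsAll(fileNames);
    }

    public static void main(String[] args) {
        System.out.println(isEmptyByDefinition(0, List.of()));                       // true
        System.out.println(isEmptyByDefinition(1, List.of("README.md", "LICENSE"))); // true
        System.out.println(isEmptyByDefinition(1, List.of("README")));               // false
        System.out.println(isEmptyByDefinition(2, List.of("README.md")));            // false
    }
}
```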

Another way is to look at the number of commits in each repo:
0 or 1 commit probably means an empty repo.


Update: GitHub confirms there is no current way to determine if a repo is "empty".
The closest way to do that would be:

You could look through public repositories and compare "pushed_at" and "created_at" dates to see if there has been any activity

VonC
  • I have tried this, but 0kb repos also have some data: https://github.com/search?q=size%3A0&ref=simplesearch This means that GitHub doesn't respond to the 0kb query. – Aniket Nov 10 '13 at 12:55
  • @Aniket yes, as I mentioned in my edited answer. I suspect you need to sort by size to detect "small" repos, which are probably "empty" repos. – VonC Nov 10 '13 at 12:57
  • The `commit:0..1` logic seems to make sense, but GitHub doesn't accept a parameter like that. – Aniket Nov 10 '13 at 13:03
  • @Aniket yes, I agree, and the API only reflects the commits from the last n days, so it isn't even valid for *all* repos. I would still sort repos by size. – VonC Nov 10 '13 at 13:04
  • I think I should get in touch with GitHub about this, if it's even possible. – Aniket Nov 10 '13 at 13:06
  • @Aniket I fully agree. – VonC Nov 10 '13 at 13:07
  • Sent the query. Will update here after I get a response. I explained the meaning of empty as: 0-1 commits, max 3 files - `.gitignore`, `README`, `LICENSE` – Aniket Nov 10 '13 at 13:08
  • @Aniket Great! I have included your definition of "empty" repo in my answer, for more visibility. – VonC Nov 10 '13 at 13:35
  • @Aniket interesting. Thank you for the update (and thanks to GitHub support). I have added the relevant section in the answer. – VonC Nov 19 '13 at 18:00
  • I will try working with the idea they gave me and get back to you if I find that method workable. – Aniket Nov 20 '13 at 05:18

To check if a repository is empty, look to see if it has any commits.

https://api.github.com/repos/:owner/:repo/commits?per_page=1

An empty repository returns a non-successful HTTP status (409 Conflict) and the content...

{
  "message": "Git Repository is empty.",
  "documentation_url": "https://developer.github.com/v3"
}

If it doesn't exist, you'll get a 404 and...

{
  "message": "Not Found",
  "documentation_url": "https://developer.github.com/v3"
}

If it does exist, you'll get an HTTP 200 and one commit.
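Here is a sketch of the same check in Java, using the JDK's built-in `java.net.http.HttpClient` (Java 11+). The class and enum names are hypothetical. `classify` matches on the "Git Repository is empty." message quoted above rather than on a specific status code, so it does not depend on exactly which error status GitHub uses. Unauthenticated requests are rate-limited, and private repositories you cannot access also come back as 404.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper around GET /repos/:owner/:repo/commits?per_page=1
public class RepoEmptinessCheck {

    public enum Status { HAS_COMMITS, EMPTY, NOT_FOUND, UNKNOWN }

    // Interprets the HTTP status and body returned by the commits endpoint
    public static Status classify(int httpStatus, String body) {
        if (httpStatus == 200) {
            return Status.HAS_COMMITS; // at least one commit came back
        }
        if (httpStatus == 404) {
            return Status.NOT_FOUND;   // no such repo, or not visible to you
        }
        // Empty repositories answer with an error status and this message
        if (body != null && body.contains("Git Repository is empty.")) {
            return Status.EMPTY;
        }
        return Status.UNKNOWN;         // rate limiting, server errors, etc.
    }

    public static Status check(HttpClient client, String owner, String repo)
            throws IOException, InterruptedException {
        URI uri = URI.create("https://api.github.com/repos/" + owner + "/" + repo
                + "/commits?per_page=1");
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/vnd.github+json")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return classify(response.statusCode(), response.body());
    }

    public static void main(String[] args) throws Exception {
        // Performs a real network request against the GitHub API
        System.out.println(check(HttpClient.newHttpClient(), "errfree", "test"));
    }
}
```

Keeping `classify` separate from the network call makes the decision logic easy to test without hitting the API.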

Schwern

As other posters here have mentioned, the "size" attribute from the API will not help.

An example is this repository: https://api.github.com/repos/errfree/test

Note that it reports a size of 48 despite being empty.

Disclaimer: This approach is a hack. It is neither efficient nor officially supported by GitHub, but it works well enough for me.

Basically, I download the zip version of the repository. When the repository is empty, GitHub does not return a zip file; instead it returns an HTML page saying "This repository is empty.".

After downloading, I check whether the file is smaller than 30,000 bytes. If it is, I look inside its contents for the string "This repository is empty." to confirm that the repository is empty.

Here is an example of a direct zip download URL that, in this case, displays the empty-repository page instead of a zip: https://github.com/errfree/test/zipball/master/

My pseudo-code in Java:

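        // the "empty repository" response is a small HTML page, so only
        // inspect small downloads (fileZip is a java.io.File)
        if (fileZip.length() < 30000) {
            // read the downloaded bytes as text; invalid UTF-8 in a real zip
            // is replaced rather than thrown (enclosing method handles IOException)
            final String content = new String(
                    java.nio.file.Files.readAllBytes(fileZip.toPath()),
                    java.nio.charset.StandardCharsets.UTF_8);
            // is this an HTML page with the "repository is empty" message?
            if (content.contains("This repository is empty.")) {
                return null;
            }
        }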

Hope this helps.

Max Brito