
I am trying to count commits for many large GitHub repos using the API, so I would like to avoid fetching the entire list of commits (e.g. api.github.com/repos/jasonrudolph/keyboard/commits) and counting them.

If I had the hash of the first (initial) commit, I could use this technique to compare the first commit to the latest, and the API would report the total_commits between them (so I'd need to add one). Unfortunately, I cannot see how to elegantly get the first commit using the API.
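For reference, that compare-based approach can be sketched like this (a minimal stdlib-only sketch; the owner/repo names and SHAs you pass in are placeholders, and the +1 accounts for the base commit itself, which total_commits excludes):

```python
import json
import urllib.request


def commit_count_from_compare(payload):
    # The compare endpoint's total_commits counts commits *after* the base
    # commit, so add 1 to include the initial commit itself.
    return payload["total_commits"] + 1


def count_commits(owner, repo, first_sha, latest_sha):
    # Hypothetical helper: compares the initial commit to the latest one.
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           f"/compare/{first_sha}...{latest_sha}")
    with urllib.request.urlopen(url) as resp:
        return commit_count_from_compare(json.load(resp))
```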

The base repo URL does give me the created_at date (example: api.github.com/repos/jasonrudolph/keyboard), so I could get a reduced commit set by limiting the commits to those before the creation date (example: api.github.com/repos/jasonrudolph/keyboard/commits?until=2013-03-30T16:01:43Z) and using the earliest one (is it always listed last?), or maybe the one with an empty parent (though I'm not sure whether forked projects have parentless initial commits).

Any better way to get the first commit hash for a repo?

Better yet, this whole thing seems convoluted for a simple statistic, and I wonder if I'm missing something. Any better ideas for using the API to get the repo commit count?

Edit: This somewhat similar question is trying to narrow the count "to specific files", so it has a different answer.

SteveCoffman

9 Answers

You can use the GraphQL API v4 to fetch commit counts for multiple repositories at the same time using aliases. The following fetches the commit count for all branches of 3 distinct repositories (up to 100 branches per repo):

{
  gson: repository(owner: "google", name: "gson") {
    ...RepoFragment
  }
  martian: repository(owner: "google", name: "martian") {
    ...RepoFragment
  }
  keyboard: repository(owner: "jasonrudolph", name: "keyboard") {
    ...RepoFragment
  }
}

fragment RepoFragment on Repository {
  name
  refs(first: 100, refPrefix: "refs/heads/") {
    edges {
      node {
        name
        target {
          ... on Commit {
            id
            history(first: 0) {
              totalCount
            }
          }
        }
      }
    }
  }
}

Try it in the explorer

RepoFragment is a fragment that avoids duplicating the query fields for each of those repos.

If you only need the commit count on the default branch, it's more straightforward:

{
  gson: repository(owner: "google", name: "gson") {
    ...RepoFragment
  }
  martian: repository(owner: "google", name: "martian") {
    ...RepoFragment
  }
  keyboard: repository(owner: "jasonrudolph", name: "keyboard") {
    ...RepoFragment
  }
}

fragment RepoFragment on Repository {
  name
  defaultBranchRef {
    name
    target {
      ... on Commit {
        id
        history(first: 0) {
          totalCount
        }
      }
    }
  }
}

Try it in the explorer

Bertrand Martel
  • Bertrand, does the GraphQL API allow querying without a token? This seems to be a bit of a problem for public repos, since the earlier API works without a token. – Mahesh Feb 04 '19 at 07:13
  • @Mahesh Yes, this is a big caveat of the GraphQL API if you just want to request public content or use the API from a web client. Using the GraphQL API is only possible in an environment where you can safely store the access token; otherwise, just stick to REST API v3. – Bertrand Martel Feb 04 '19 at 08:28
  • `defaultBranchRef` is the way to go for the number of commits on the main branch. Thanks! – Anton Nov 30 '22 at 04:23

If you're looking for the total number of commits in the default branch, you might consider a different approach.

Use the Repo Contributors API to fetch a list of all contributors:

https://developer.github.com/v3/repos/#list-contributors

Each item in the list will contain a contributions field which tells you how many commits the user authored in the default branch. Sum those fields across all contributors and you should get the total number of commits in the default branch.

The list of contributors is often much shorter than the list of commits, so it should take fewer requests to compute the total number of commits in the default branch.
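The approach above can be sketched as follows (stdlib only, unauthenticated; `anon=true` includes anonymous contributors, and the loop follows pagination by incrementing `page` until an empty page comes back):

```python
import json
import urllib.request


def sum_contributions(contributors):
    # Each contributor entry carries a `contributions` field: the number of
    # commits that contributor authored in the default branch.
    return sum(c["contributions"] for c in contributors)


def total_commits(owner, repo):
    total, page = 0, 1
    while True:
        url = (f"https://api.github.com/repos/{owner}/{repo}/contributors"
               f"?anon=true&per_page=100&page={page}")
        with urllib.request.urlopen(url) as resp:
            contributors = json.load(resp)
        if not contributors:  # past the last page
            return total
        total += sum_contributions(contributors)
        page += 1
```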

Ivan Zuzak
    Thanks. When I used [a link like this](https://api.github.com/repos/jquery/jquery/contributors?anon=true) it appeared to be limited to 30 items. I found that requests that return multiple items will be paginated to 30 items by default. You can specify further pages with the `?page` parameter. So if you get 30, you need to check if there are more pages, and add them to the initial results. – SteveCoffman Jan 15 '15 at 18:31
  • @SteveCoffman Yep, that's the expected behavior: https://developer.github.com/v3/#pagination – Ivan Zuzak Jan 15 '15 at 18:54
  • It looks like either of the two approaches (yours and mine) are viable, and neither is elegant. I'm going to accept yours as the answer unless someone else comes up with something we've both overlooked. Thanks. – SteveCoffman Jan 16 '15 at 19:56
  • Why wouldn't GitHub just include the commit count in the API response? Disappointing that one has to needlessly traverse the list of contributors. – Dan Dascalescu Mar 21 '15 at 04:34
  • Note that this approach returns the wrong number if any of the users have been removed from your repo/organization/whatever, such as when an employee leaves the company. – snowe Apr 11 '17 at 21:11
  • This answer talks about getting # of commits for default branch. See the bottom of @Bertrand Martel's answer for exactly this (using GraphQL API)! – Anton Nov 30 '22 at 04:26

Make a request to https://api.github.com/repos/{username}/{repo}/commits?sha={branch}&per_page=1&page=1

Now take the Link header of the response and grab the page count that appears just before rel="last".

That page count is equal to the total number of commits in the branch!

The trick is to use &per_page=1&page=1, which puts one commit on each page, so the total number of pages equals the total number of commits.
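A minimal sketch of that header parsing (pure string handling; the helper name is mine, and the regex assumes GitHub's usual `<url>; rel="last"` format):

```python
import re


def last_page_from_link_header(link_header):
    # With per_page=1, the page number inside the rel="last" URL equals the
    # total number of commits. A missing Link header means 0 or 1 commits.
    match = re.search(r'[?&]page=(\d+)>; rel="last"', link_header)
    return int(match.group(1)) if match else 1
```

You would feed this the `Link` header from the response to `.../commits?sha={branch}&per_page=1&page=1`.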

Shashi

Simple solution: look at the page numbers; GitHub paginates for you. You can compute the number of commits by getting the last page number from the Link header, subtracting one, multiplying by the page size, then fetching the last page of results and adding its length to that product. That's a maximum of two API calls!

Here is my implementation of grabbing the total number of commits for an entire organization using the octokit gem in ruby:

@github = Octokit::Client.new access_token: key, auto_traversal: true, per_page: 100

Octokit.auto_paginate = true
repos = @github.org_repos('my_company', per_page: 100)

# * take the pagination number
# * get the last page
# * see how many items are on it
# * multiply the number of pages - 1 by the page size
# * and add the two together. Boom. Commit count in 2 api calls
def calc_total_commits(repos)
    total_sum_commits = 0

    repos.each do |e| 
        repo = Octokit::Repository.from_url(e.url)
        number_of_commits_in_first_page = @github.commits(repo).size
        repo_sum = 0
        if number_of_commits_in_first_page >= 100
            links = @github.last_response.rels

            unless links.empty?
                last_page_url = links[:last].href

                /.*page=(?<page_num>\d+)/ =~ last_page_url
                repo_sum += (page_num.to_i - 1) * 100 # we add the last page manually
                repo_sum += links[:last].get.data.size
            end
        else
            repo_sum += number_of_commits_in_first_page
        end
        puts "Commits for #{e.name} : #{repo_sum}"
        total_sum_commits += repo_sum
    end
    puts "TOTAL COMMITS #{total_sum_commits}"
end

And yes, I know the code is dirty; it was just thrown together in a few minutes.

snowe
  • Didn't use your code but the idea of looking at the page numbers in the header links saved me many API calls. Thanks – Marcino May 17 '18 at 16:33

Using the GraphQL API v4 is probably the way to handle this if you're starting a new project, but if you're still using the REST API v3, you can get around the pagination issue by limiting the request to just one result per page. With that limit, the page number in the last link will be equal to the total.

For example, using Python 3 and the requests library:

import os
import urllib.parse

import requests


def commit_count(project, sha='master', token=None):
    """
    Return the number of commits to a project
    """
    token = token or os.environ.get('GITHUB_API_TOKEN')
    url = f'https://api.github.com/repos/{project}/commits'
    headers = {
        'Accept': 'application/json',
        'Content-Type': 'application/json',
        'Authorization': f'token {token}',
    }
    params = {
        'sha': sha,
        'per_page': 1,
    }
    resp = requests.request('GET', url, params=params, headers=headers)
    if (resp.status_code // 100) != 2:
        raise Exception(f'invalid github response: {resp.content}')
    # check the resp count, just in case there are 0 commits
    commit_count = len(resp.json())
    last_page = resp.links.get('last')
    # if there are no more pages, the count must be 0 or 1
    if last_page:
        # extract the query string from the last page url
        qs = urllib.parse.urlparse(last_page['url']).query
        # extract the page number from the query string
        commit_count = int(dict(urllib.parse.parse_qsl(qs))['page'])
    return commit_count
buckley

I just made a little script to do this. It may not work with large repositories since it does not handle GitHub's rate limits. Also, it requires the Python requests package.

#!/usr/bin/env python3
import requests

GITHUB_API_BRANCHES = 'https://%(token)s@api.github.com/repos/%(namespace)s/%(repository)s/branches'
GITHUB_API_COMMITS = 'https://%(token)s@api.github.com/repos/%(namespace)s/%(repository)s/commits?sha=%(sha)s&page=%(page)i'


def github_commit_counter(namespace, repository, access_token=''):
    commit_store = list()

    branches = requests.get(GITHUB_API_BRANCHES % {
        'token': access_token,
        'namespace': namespace,
        'repository': repository,
    }).json()

    print('Branch'.ljust(47), 'Commits')
    print('-' * 55)

    for branch in branches:
        page = 1
        branch_commits = 0

        while True:
            commits = requests.get(GITHUB_API_COMMITS % {
                'token': access_token,
                'namespace': namespace,
                'repository': repository,
                'sha': branch['name'],
                'page': page
            }).json()

            page_commits = len(commits)

            for commit in commits:
                commit_store.append(commit['sha'])

            branch_commits += page_commits

            if page_commits == 0:
                break

            page += 1

        print(branch['name'].ljust(45), str(branch_commits).rjust(9))

    commit_store = set(commit_store)
    print('-' * 55)
    print('Total'.ljust(42), str(len(commit_store)).rjust(12))

# for private repositories, get your own token from
# https://github.com/settings/tokens
# github_commit_counter('github', 'gitignore', access_token='fnkr:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
github_commit_counter('github', 'gitignore')
fnkr

Here is a JavaScript example using Fetch, based on snowe's approach.

Fetch example

/**
 * @param {string} owner Owner of repo
 * @param {string} repo Name of repo
 * @returns {number} Total number of commits the repo contains on the master branch
 */
export const getTotalCommits = (owner, repo) => {
  let url = `https://api.github.com/repos/${owner}/${repo}/commits?per_page=100`;
  let pages = 0;

  return fetch(url, {
    headers: {
      Accept: "application/vnd.github.v3+json",
    },
  })
    .then((data) => data.headers)
    .then(
      (result) =>
        result
          .get("link")
          .split(",")[1]
          .match(/.*page=(?<page_num>\d+)/).groups.page_num
    )
    .then((numberOfPages) => {
      pages = numberOfPages;
      return fetch(url + `&page=${numberOfPages}`, {
        headers: {
          Accept: "application/vnd.github.v3+json",
        },
      }).then((data) => data.json());
    })
    .then((data) => {
      return data.length + (pages - 1) * 100;
    })
    .catch((err) => {
      console.log(`ERROR: calling: ${url}`);
      console.log("See below for more info:");
      console.log(err);
    });
};

Usage

getTotalCommits('facebook', 'react').then(commits => {
    console.log(commits);
});
  • Nice answer. However, this can be done in just 1 request instead of 2. You can check my answer here: https://stackoverflow.com/a/70610670/10266115 – Shashi Jan 06 '22 at 17:01
  • This example can be reduced to one request, as in stackoverflow.com/a/70610670/10266115 suggested by @Shashi, by removing the last two then blocks and replacing the URL query string with "?per_page=1&page=1". – Damien Golding Feb 02 '23 at 23:52

I used Python to create a generator that returns a list of contributors, sums up the total commit count, and then checks it against a maximum: it returns True if the repo has fewer commits than the maximum, and False if it has the same number or more. The only thing you have to fill in is the requests session that uses your credentials. Here's what I wrote for you:

from requests import session
def login():
    sess = session()

    # login here and return session with valid creds
    return sess

def generateList(link):
    # you need to login before you do anything
    sess = login()

    # because of the way that requests works, you must start out by creating an object to
    # imitate the response object. This will help you to cleanly while-loop through
    # github's pagination
    class response_imitator:
        links = {'next': {'url': link}}
    response = response_imitator()
    while 'next' in response.links:
        response = sess.get(response.links['next']['url'])
        for repo in response.json():
            yield repo

def check_commit_count(baseurl, user_name, repo_name, max_commit_count=None):
    # login first
    sess = login()
    if max_commit_count != None:
        totalcommits = 0

        # construct url to paginate
        url = baseurl+"repos/" + user_name + '/' + repo_name + "/stats/contributors"
        for stats in generateList(url):
            totalcommits+=stats['total']

        if totalcommits >= max_commit_count:
            return False
        else:
            return True

def main():
    # what user do you want to check for commits
    user_name = "arcsector"

    # what repo do you want to check for commits
    repo_name = "EyeWitness"

    # github's base api url
    baseurl = "https://api.github.com/"

    # call function
    check_commit_count(baseurl, user_name, repo_name, 30)

if __name__ == "__main__":
    main()
Arcsector

Works with GitHub Enterprise:

gh api https://github.myenterprise.com/api/v3/repos/myorg/myrepo/commits --paginate | jq length | datamash sum 1

And if you're a Unix pipeline advocate, you can combine this with a repository list to get all commits in the organization.

Setup notes

For macOS:

brew install gh
brew install datamash
gh auth login
Sridhar Sarnobat