GitHub v3 API: Get full commit list for large comparison

Question

I'm trying to use the GitHub v3 API to get the full list of commits between two SHAs, using the comparison API (/repos/:owner/:repo/compare/:base...:head), but it only returns the first 250 commits and I need to get all of them.

I found the API pagination docs, but the compare API doesn't appear to support either the page or per_page parameters, either with counts or SHAs (EDIT: the last_sha parameter doesn't work either). And unlike the commits API, the compare API doesn't seem to return a Link HTTP header.

Is there any way to either increase the commit count limit on the compare API or to fetch a second page of commits?

I've contacted GitHub support for you. As an author of an API wrapper, I'm curious about this myself. I'll post back with what they answer if they don't answer themselves. — Ian Stapleton Cordasco, Jan 20 '13 at 04:00

score 2 · Answer 1 · answered Oct 09 '14 at 07:34

Try using the parameter sha, for example:

https://api.github.com/repos/junit-team/junit/commits?sha=XXX, where the XXX is the SHA of the last returned commit in the current round of the query. Then iterate this process until you reach the ending SHA.

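Sample Python code:

import requests

startSHA = ''  # SHA to start from (the newest commit)
endSHA = ''    # SHA to stop at
done = False
while not done:
    url = 'https://api.github.com/repos/junit-team/junit/commits?sha=' + startSHA
    r = requests.get(url)
    r.raise_for_status()
    data = r.json()
    for item in data:
        commit = item['sha']
        if commit == endSHA:
            # reached the ending SHA, stop here
            done = True
            break
        startSHA = commit
    if len(data) <= 1:
        done = True  # no commits beyond startSHA: history exhausted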

answered Oct 09 '14 at 07:34

Ida

Using paging is better. Git history might contain merges where just taking the last sha will only follow one parent of the merge. With paging, this does not happen as commits from all parents are returned. – Nakedible Dec 18 '14 at 09:23
This won't return the answer that the OP is asking for. As a straightforward example, endSHA might not be an ancestor of startSHA. You can get around this somewhat by stopping at the common ancestor of startSHA and endSHA, but I don't think you can get around the larger problem that results are returned in date order, and some of the commits in endSHA..startSHA may have earlier dates than endSHA. – sasmith Oct 17 '18 at 20:47

galuszkak · Answer 2 · 2013-09-07T10:30:32.647

1

It's relatively easy. Here is an example:

import requests
next_url = 'https://api.github.com/repos/pydanny/django-admin2/commits'
while next_url:
    response = requests.get(next_url)
    # DO something with response
    # ...
    # ...
    if 'next' in response.links:
        next_url = response.links['next']['url']
    else:
        next_url = ''

UPDATE:

Keep in mind that the next URLs are different from the initial one. For example, the initial URL:

https://api.github.com/repos/pydanny/django-admin2/commits

next url:

https://api.github.com/repositories/10054295/commits?top=develop&last_sha=eb204104bd40d2eaaf983a5a556e38dc9134f74e

So it's a totally new URL structure.
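For reference, the next URL comes from the Link response header, which requests exposes as response.links. Below is a minimal sketch of that parsing using only the standard library. The helper name parse_link_header is hypothetical, and the regular expression assumes the simple <url>; rel="name" form rather than every variant the header syntax allows.

```python
import re

# Matches one entry of the form: <https://example/...>; rel="next"
_LINK_ENTRY = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_link_header(value):
    """Return a dict mapping each rel name (e.g. "next") to its URL.

    Hypothetical helper for illustration; requests already does this
    for you via response.links.
    """
    return {rel: url for url, rel in _LINK_ENTRY.findall(value or "")}
```

Because the next URL is taken verbatim from the header, the loop above never has to construct the /repositories/... form itself.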

edited Sep 07 '13 at 10:30

answered Sep 07 '13 at 10:20

galuszkak

Thanks to this answer, Django Packages is now able to restore a bunch of commits that got lost after the last GitHub API change. Thanks @galuszkak! – pydanny Sep 07 '13 at 11:57
I would note that this is not "relatively easy". The documentation is rather unclear on how this works. While the implementation is "easy", as with many challenges in programming, determining the implementation is hard. – pydanny Sep 07 '13 at 16:58
@pydanny which documentation is unclear? GitHub's API docs, requests docs, both? – Ian Stapleton Cordasco Sep 07 '13 at 19:50
Also you're not answering the question really. They're trying to perform a comparison, you're simply iterating over all of the commits. I do believe there is a difference. – Ian Stapleton Cordasco Sep 07 '13 at 20:02
Unfortunately as @sigmavirus24 indicates, that approach does not work for the comparison API. Thanks though! – etlovett Sep 26 '13 at 01:02
This approach only gets the first 250 commits of the comparison. Does not answer the question. – Nakedible Dec 18 '14 at 09:17

Nakedible · Answer 3 · 2014-12-26T14:05:18.563

I tried solving this again. My notes:

The compare (or pull request commits) list only shows 250 entries. For the pull request one, you can paginate, but you will only get a maximum of 250 commits, no matter what you do.
The commit list API can traverse the entire commit chain with paging, all the way to the beginning of the repository.
For a pull request, the "base" commit is not necessarily in the history reachable from the pull request "head" commit. The same holds for a comparison: the "base_commit" is not necessarily a part of the history of the current head.
The "merge_base_commit" is, however, a part of the history, so the correct approach is to start from the "head" commit and iterate commit list queries until you reach the "merge_base_commit". For a pull request, this means that it is mandatory to make a separate compare between the "head" and "base" of the pull request.
An alternative approach is to use the "total_commits" returned by compare and just iterate backwards until reaching the desired number of commits. This seems to work, but I am not 100% certain that it is correct in all corner cases with merges and such.

So the commit list API, pagination, and "merge_base_commit" together solve this dilemma.
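A minimal Python sketch of the approach in these notes: page through the commit list starting at the head commit and stop at the merge base. The function name and the fetch_page callable are hypothetical. fetch_page is assumed to wrap something like GET /repos/:owner/:repo/commits?sha=<head>&page=<n> and return the decoded JSON list. As a comment on another answer points out, the commit list is returned in date order, so histories with merges may need the "total_commits" value from compare as a sanity check on the result.

```python
def commits_until_merge_base(fetch_page, head_sha, merge_base_sha, max_pages=1000):
    """Collect commits listed from head_sha until merge_base_sha is seen.

    fetch_page(sha, page) is a hypothetical callable returning one page of
    the commit list as a list of dicts that each have a "sha" key.
    The merge base itself is not included in the result.
    """
    collected = []
    for page in range(1, max_pages + 1):
        commits = fetch_page(head_sha, page)
        if not commits:
            break  # ran out of history without meeting the merge base
        for commit in commits:
            if commit["sha"] == merge_base_sha:
                return collected
            collected.append(commit)
    return collected
```

The head and merge base SHAs would come from a single compare call made beforehand (its "merge_base_commit" field), so only one compare request is needed no matter how large the range is.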

score 0 · Answer 4 · answered Jan 24 '13 at 11:02

Try using the last_sha parameter. The commits API seems to use that for pagination rather than page

answered Jan 24 '13 at 11:02

matt

Unfortunately, the commit comparison API appears to just ignore the last_sha parameter; the output is identical with and without it. – etlovett Jan 26 '13 at 02:39
Does not work, and `last_sha` is anyway deprecated these days. – Nakedible Dec 18 '14 at 09:16

score 0 · Answer 5 · answered Sep 09 '15 at 08:02

Here's a sample that gets ALL commits for a pull request, written using Octokit.NET (https://github.com/octokit/octokit.net)

        var owner = "...";
        var repository = "...";
        var gitHubClient = new GitHubClient(
                new ProductHeaderValue("MyApp"),
                new InMemoryCredentialStore(new Credentials("GitHubToken")));
        var pullRequest = await gitHubClient.PullRequest.Get(owner, repository, pullRequestNumber);
        Console.WriteLine("Summarising Pull Request #{0} - {1}", pullRequest.Number, pullRequest.Title);
        var commits = new List<GitHubCommit>();
        var moreToGet = true;
        var headSha = pullRequest.Head.Sha;
        while (moreToGet)
        {
            var comparison =
                await
                gitHubClient.Repository.Commits.Compare(
                    owner,
                    repository,
                    pullRequest.Base.Sha,
                    headSha);

            // Because we're working backwards from the head towards the base, but the oldest commits are at the start of the list
            commits.InsertRange(0, comparison.Commits);
            moreToGet = comparison.Commits.Count == 250;
            if (moreToGet)
            {
                headSha = commits.First().Sha;
            }
        }

I originally tried driving moreToGet off whether a commit with the base SHA had been found, but that commit is never included in the list of commits (not sure why), so I just assume there is more to get whenever the comparison hits the limit of 250.

score 0 · Answer 6 · answered Jun 15 '17 at 09:26

/commits?per_page=* will give you all commits

answered Jun 15 '17 at 09:26

Myron Keurntjes

score 0 · Answer 7 · answered Dec 14 '17 at 16:03

This is my solution using Octokit.Net

private async Task<IReadOnlyList<GitHubCommit>> GetCommits(string branch, string baseBranch)
{
    // compare branches and get all commits returned
    var result = await this.gitHub.Repository.Commit.Compare(this.repoSettings.Owner, this.repoSettings.Name, baseBranch, branch);
    var commits = result.Commits.ToList();

    // the commits property on the result only has the first 250 commits
    if (result.TotalCommits > 250)
    {
        var baseCommitId = result.MergeBaseCommit.Sha;
        var lastCommitLoadedId = commits.First().Sha;
        var allCommitsLoaded = false;
        var page = 1;

        while (!allCommitsLoaded)
        {
            var missingCommits = await this.gitHub.Repository.Commit.GetAll(this.repoSettings.Owner, this.repoSettings.Name, new CommitRequest
            {
                Sha = lastCommitLoadedId // start from the oldest commit returned by compare
            },
            new ApiOptions
            {
                PageCount = 1,
                PageSize = 100, // arbitrary page size - not sure what the limit is here so set it to a reasonably large number
                StartPage = page
            });

            foreach (var missingCommit in missingCommits)
            {
                if (missingCommit.Sha == lastCommitLoadedId)
                {
                    // this is the oldest commit in the compare result so we already have it
                    continue; 
                }

                if (missingCommit.Sha == baseCommitId)
                {
                    // we don't want to include this commit - it's the most recent one on the base branch
                    // we've found all the commits now we can break out of both loops
                    allCommitsLoaded = true;
                    break;
                }

                commits.Add(missingCommit);
            }

            page++;
        }
    }

    return commits;
}

score 0 · Answer 8 · answered Oct 18 '18 at 07:01

I have a solution for this, but it's not savory. It amounts to building the graph yourself. The general strategy is to recursively ask for more comparison objects between BASE and BRANCH until you've found the right number of commits. Without optimization, this is pretty untenable for large comparisons. With optimization, I've found this to require about 1 comparison call per 50 unique commits in the comparison.

from github import Github  # PyGithub
repo = Github(MY_PAT).get_repo(MY_REPO)

def compare(base_commit, branch_commit):
  comparison = repo.compare(base_commit, branch_commit)
  result = set()  # SHAs known to be in the comparison
  unexplored_commits = set()
  for commit in comparison.commits:
    result.add(commit.sha)
    unexplored_commits.add(commit.sha)
    for parent in commit.parents:
      # It's possible that we'll need to explore a commit's parents directly. E.g., if it's
      # a merge of a large (> 250 commits) recent branch with an older branch.
      unexplored_commits.add(parent.sha)
  while len(result) < comparison.total_commits:
    commit_to_explore = unexplored_commits.pop()
    result.update(compare(base_commit, commit_to_explore))
  return result

If you actually want to implement this, optimizations I've found useful are all around picking which commit to explore. For example:

Pick the commit to explore randomly, rather than with .pop(). This avoids a class of worst-case scenarios. I put this first mostly because it's trivial to do.
Track commits whose full list of ancestors you already have, so you know not to explore these commits unnecessarily. This is the "building the graph yourself" part.
If you find ancestors of base_commit in the range, use these as bisection points.
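As a rough illustration of the first two optimizations, here is a minimal sketch of a selection helper. The name pick_commit_to_explore and the fully_explored set are hypothetical, not part of the code above. It picks a random SHA while skipping commits whose ancestors are already known. The bisection idea is not covered.

```python
import random


def pick_commit_to_explore(unexplored, fully_explored, rng=random):
    """Remove and return a random SHA from unexplored, or None if none is left.

    unexplored: set of candidate SHAs (mutated in place).
    fully_explored: set of SHAs whose full ancestor list is already known,
    so exploring them again would waste a comparison call.
    """
    candidates = sorted(unexplored - fully_explored)
    if not candidates:
        return None
    choice = rng.choice(candidates)
    unexplored.discard(choice)
    return choice
```

In the recursive compare function, this would stand in for unexplored_commits.pop(), with each SHA added to fully_explored once its own comparison has returned every commit.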

zigomir · Answer 9 · 2021-12-28T16:29:12.620

As of March 22, 2021, the compare endpoint of GitHub's v3 REST API supports pagination: https://github.blog/changelog/2021-03-22-compare-rest-api-now-supports-pagination/

You simply append ?per_page=100&page=1 to your /compare/ URL.

For example:

page 1 with first 10 commits: https://api.github.com/repos/zigomir/cntdys/compare/324126600e37c2c8026d62218d69ea068fee70e4...87ad9134ee8e02abee351255e2161844cf69e36a?per_page=10&page=1
page 2 with next 10 commits: https://api.github.com/repos/zigomir/cntdys/compare/324126600e37c2c8026d62218d69ea068fee70e4...87ad9134ee8e02abee351255e2161844cf69e36a?per_page=10&page=2
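A minimal Python sketch of looping over those pages until the comparison is exhausted. The function name and the fetch_json callable are hypothetical. fetch_json is assumed to perform the GET and return the decoded JSON body, for example lambda url: requests.get(url).json(). The sketch relies on the "commits" list in the compare response and stops at the first short page.

```python
def compare_all_commits(fetch_json, compare_url, per_page=100, max_pages=1000):
    """Collect the commits of a paginated compare response.

    fetch_json(url) is a hypothetical callable returning the decoded JSON
    body of a compare request. compare_url is the /compare/base...head URL
    without a query string.
    """
    commits = []
    for page in range(1, max_pages + 1):
        data = fetch_json(f"{compare_url}?per_page={per_page}&page={page}")
        batch = data.get("commits", [])
        commits.extend(batch)
        if len(batch) < per_page:
            break  # a short (or empty) page means there is nothing left
    return commits
```

Comparing the length of the result with the "total_commits" field of the response is a cheap way to confirm that nothing was missed.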

score -1 · Answer 10 · answered Aug 15 '14 at 22:02

From: https://developer.github.com/v3/repos/commits/#working-with-large-comparisons

Working with large comparisons

The response will include a comparison of up to 250 commits. If you are working with a larger commit range, you can use the Commit List API to enumerate all commits in the range.

For comparisons with extremely large diffs, you may receive an error response indicating that the diff took too long to generate. You can typically resolve this error by using a smaller commit range

Does not actually tell you how to do it, because the usage of the commit list API is definitely not clear! — Nakedible, Dec 18 '14 at 09:01
