Using the following GitHub API endpoint, it is possible to get the metadata for the commits in a repository, ordered from the latest to the oldest:

https://api.github.com/repos/git/git/commits

Is there a way to obtain similar metadata but in chronological order, that is, starting with the oldest commits in the repository?

NOTE: I want to obtain such metadata without having to download the full repository.

Thanks

mljrg
  • That would be hard. Git doesn't maintain forward links, only backward ones. So you have to start from the head of a branch and traverse back to the root, collecting the links between commits. After that you can invert the list of links. – phd Feb 23 '18 at 14:51
  • @phd So that means there is no direct access to the "first commit", and I must start from somewhere (i.e., any commit) and walk back in time through the commit graph. – mljrg Feb 24 '18 at 15:03
  • Yep, exactly. Git doesn't maintain forward links because they can change; in case of such a change the previous commit would have to be updated with the new forward links, but that would mean modified history, force-pushes, all kinds of problems. – phd Feb 24 '18 at 15:06
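
The walk described in the comments above can be sketched in a few lines. This is only a toy illustration, not how Git itself is implemented: it assumes a hypothetical `parents` dictionary mapping each commit ID to its first parent (`None` for the root commit) and ignores the extra parents of merge commits.

```python
def oldest_first(head, parents):
    """Walk from `head` back to the root via parent links, then invert.

    `parents` is a hypothetical mapping {commit_id: parent_id or None};
    real histories with merges have several parents per commit.
    """
    chain = []
    commit = head
    while commit is not None:
        chain.append(commit)
        commit = parents[commit]
    chain.reverse()  # the walk yields newest first, so flip it
    return chain


# Toy history: a <- b <- c, with c as the branch head.
toy_parents = {"c": "b", "b": "a", "a": None}
print(oldest_first("c", toy_parents))  # ['a', 'b', 'c']
```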

1 Answer

This is possible with a workaround using the GraphQL API. The method is essentially the same as the one for getting the first commit in a repo:

Get the latest commit, and return the totalCount and the endCursor:

{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}

It returns something like this for the totalCount and the pageInfo object:

"totalCount": 950329,
"pageInfo": {
  "endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 0"
}

I don't have any source for the cursor string format b961f8dc8976c091180839f4483d67b7c2ca2578 0, but I've tested it on some other repositories with more than 1000 commits, and it seems to always be formatted like this:

<static hash> <incremented_number>
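
Since the cursor is rewritten by hand in what follows, it can help to isolate that step. The sketch below assumes the observed, undocumented `<hash> <offset>` format still holds; `split_cursor` and `make_cursor` are hypothetical helper names, not part of any GitHub library.

```python
def split_cursor(cursor):
    """Split an observed '<hash> <offset>' cursor into (hash, offset)."""
    commit_hash, offset = cursor.rsplit(" ", 1)
    return commit_hash, int(offset)


def make_cursor(commit_hash, offset):
    """Rebuild a cursor string in the same observed format."""
    return f"{commit_hash} {offset}"


head, offset = split_cursor("b961f8dc8976c091180839f4483d67b7c2ca2578 0")
print(head, offset)
print(make_cursor(head, 41))
```

If GitHub ever changes the cursor format, `split_cursor` raises a `ValueError` instead of silently producing a wrong offset.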

To iterate from the first commit to the newest, set the number in the cursor to totalCount - 1 - <number_perpage>*<page>, with <page> starting from 1. Each page then holds the next <number_perpage> commits, counted from the oldest:

For example, to get the first 20 commits of the linux repository (with the totalCount of 950329 shown above, the number is 950329 - 1 - 20 = 950308):

{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 20, after: "fc4f28bb3daf3265d6bc5f73b497306985bb23ab 950308") {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}

Note that the total commit count of this repo changes over time, so you need to get the current totalCount value before running the query.
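
The offset arithmetic can be checked separately from any API call. The following is a small sketch of the formula above; `page_offset` is a hypothetical helper name, and it assumes offsets count from the newest commit (0), as the cursor of the first query suggests.

```python
def page_offset(total_count, per_page, page):
    """Offset to put in the `after` cursor for `page` (1 = oldest commits).

    Offsets count from the newest commit (0), so the oldest commit sits at
    total_count - 1, and a cursor at offset N returns commits N+1 onward.
    """
    return total_count - 1 - per_page * page


# Worked example from the text: 950329 commits, 20 per page, first page.
print(page_offset(950329, 20, 1))  # 950308
```

A negative result means the requested page reaches past the newest commit, so the formula should only be used while the offset stays at 0 or above.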

Here is an example that iterates over the first 300 commits of the Linux repository (starting from the oldest):

import requests

token = "YOUR_ACCESS_TOKEN"

name = "linux"
owner = "torvalds"
branch = "master"

iteration = 3
per_page = 100
commits = []

query = """
query ($name: String!, $owner: String!, $branch: String!){
    repository(name: $name, owner: $owner) {
        ref(qualifiedName: $branch) {
            target {
                ... on Commit {
                    history(first: %s, after: %s) {
                        nodes {
                            message
                            committedDate
                            authoredDate
                            oid
                            author {
                                email
                                name
                            }
                        }
                        totalCount
                        pageInfo {
                            endCursor
                        }
                    }
                }
            }
        }
    }
}
"""

def getHistory(cursor):
    r = requests.post("https://api.github.com/graphql",
        headers = {
            "Authorization": f"Bearer {token}"
        },
        json = {
            "query": query % (per_page, cursor),
            "variables": {
                "name": name,
                "owner": owner,
                "branch": branch
            }
        })
    return r.json()["data"]["repository"]["ref"]["target"]["history"]

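# In the first request the cursor is null; the response is only used to read
# totalCount and the hash part of the cursor.
history = getHistory("null")
totalCount = history["totalCount"]
if totalCount > 1:
    cursor = history["pageInfo"]["endCursor"].split(" ")
    for i in range(1, iteration + 1):
        # Offset of the commit just before the wanted page (0 = newest commit).
        offset = totalCount - 1 - i * per_page
        if offset < 0:
            # Fewer commits than requested: stop rather than send a negative
            # offset. The newest few commits are not fetched in that case.
            break
        cursor[1] = str(offset)
        history = getHistory(f"\"{' '.join(cursor)}\"")
        # Each page comes back newest first, so reverse it before appending.
        commits += history["nodes"][::-1]
else:
    commits = history["nodes"]

print(commits)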
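
The script above indexes straight into the response, so a failed request surfaces as a `KeyError` or `TypeError`. As a hedged sketch, a hypothetical helper such as `extract_history` below could check the decoded response first. It relies only on the top-level `errors` list that GraphQL responses use to report failures, and on `ref` being null when the branch does not exist.

```python
def extract_history(payload):
    """Return the `history` object from a decoded GraphQL response.

    Raises RuntimeError if the response carries an `errors` list or if
    the branch could not be resolved (`ref` is null).
    """
    if payload.get("errors"):
        messages = "; ".join(
            e.get("message", "unknown error") for e in payload["errors"]
        )
        raise RuntimeError(f"GraphQL request failed: {messages}")
    ref = payload["data"]["repository"]["ref"]
    if ref is None:
        raise RuntimeError("branch not found")
    return ref["target"]["history"]


# Hand-written payloads standing in for r.json(); not real API output.
ok = {"data": {"repository": {"ref": {"target": {"history": {"totalCount": 3}}}}}}
print(extract_history(ok))

bad = {"data": None, "errors": [{"message": "example failure"}]}
try:
    extract_history(bad)
except RuntimeError as exc:
    print(exc)
```

In the script, `getHistory` could return `extract_history(r.json())` in place of the direct indexing.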
Bertrand Martel