11

What is the most efficient way to determine when the initial commit in a GitHub repository was made? Repositories have a created_at property, but for repositories that contain imported history the oldest commit may be significantly older.

On the command line, something like this would work:

git rev-list --max-parents=0 HEAD

However, I don't see an equivalent in the GitHub API.

Mihai Parparita
  • I don't think there's a way to get that in a constant number of API requests -- currently, there's no equivalent to the Git command you listed. So, you'd need to go through the list of commits to find the last page (e.g. using binary search), and then get the last commit on that page. Also (and I'm pretty sure you're aware of this), notice that the oldest commit (by timestamp) doesn't need to be the last one (no parents) -- rewriting history and setting timestamps manually would allow the oldest commit to be in other places in the commit tree. – Ivan Zuzak Aug 04 '14 at 20:43
  • I found this question and was interested in the created_at of the repository, which can be extracted with a single line: curl -s https://api.github.com/users/WDScholia | jq .created_at (this prints "2020-05-18T17:45:47Z") – Wolfgang Fahl Dec 18 '22 at 13:47

6 Answers

8

Using the GraphQL API, there is a workaround for getting the oldest commit (initial commit) in a specific branch.

First, get the latest commit and return the totalCount and the endCursor:

{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}

It returns something like this for totalCount and the pageInfo object:

"totalCount": 931886,
"pageInfo": {
  "endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 0"
}

I haven't found any documentation for the cursor string format b961f8dc8976c091180839f4483d67b7c2ca2578 0, but I tested other repositories with more than 1000 commits and it always appears to be formatted like:

<static hash> <incremented_number>

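The number is zero-based, so the last commit sits at totalCount - 1 and the cursor just before it is totalCount - 2. So you just subtract 2 from totalCount (if totalCount is > 1), use the result as the cursor's number, and request the one commit after that cursor, which is the oldest commit (or initial commit, if you prefer):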

{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1, after: "b961f8dc8976c091180839f4483d67b7c2ca2578 931884") {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}

which gives the following output (the initial commit by Linus Torvalds):

{
  "data": {
    "repository": {
      "ref": {
        "target": {
          "history": {
            "nodes": [
              {
                "message": "Linux-2.6.12-rc2\n\nInitial git repository build. I'm not bothering with the full history,\neven though we have it. We can create a separate \"historical\" git\narchive of that later if we want to, and in the meantime it's about\n3.2GB when imported into git - space that would just make the early\ngit days unnecessarily complicated, when we don't have a lot of good\ninfrastructure for it.\n\nLet it rip!",
                "committedDate": "2005-04-16T22:20:36Z",
                "authoredDate": "2005-04-16T22:20:36Z",
                "oid": "1da177e4c3f41524e886b7f1b8a0c1fc7321cac2",
                "author": {
                  "email": "torvalds@ppc970.osdl.org",
                  "name": "Linus Torvalds"
                }
              }
            ],
            "totalCount": 931886,
            "pageInfo": {
              "endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 931885"
            }
          }
        }
      }
    }
  }
}

A simple implementation in Python to get the first commit using this method:

import requests

token = "YOUR_TOKEN"

name = "linux"
owner = "torvalds"
branch = "master"

query = """
query ($name: String!, $owner: String!, $branch: String!){
  repository(name: $name, owner: $owner) {
    ref(qualifiedName: $branch) {
      target {
        ... on Commit {
          history(first: 1, after: %s) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}
"""

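def getHistory(cursor):
    # `cursor` is spliced into the query text, so pass either the literal
    # string "null" or a cursor already wrapped in double quotes.
    r = requests.post("https://api.github.com/graphql",
        headers = {
            "Authorization": f"Bearer {token}"
        },
        json = {
            "query": query % cursor,
            "variables": {
                "name": name,
                "owner": owner,
                "branch": branch
            }
        })
    # Fail early on HTTP errors (e.g. a bad token) instead of on a missing key
    r.raise_for_status()
    return r.json()["data"]["repository"]["ref"]["target"]["history"]

# In the first request the cursor is null, which returns the newest commit
history = getHistory("null")
totalCount = history["totalCount"]
if totalCount > 1:
    # endCursor looks like "<hash> <index>"; point it just before the last commit
    cursor = history["pageInfo"]["endCursor"].split(" ")
    cursor[1] = str(totalCount - 2)
    history = getHistory(f"\"{' '.join(cursor)}\"")
    print(history["nodes"][0])
else:
    # Only one commit exists, so the newest commit is also the oldest
    print("got oldest commit (initial commit)")
    print(history["nodes"][0])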

You can find an example in this post.

Bertrand Martel
  • It can also be used to fetch the first commit of a given subdirectory or file through `path` parameter (present in the `history(...)` resolver) https://docs.github.com/en/rest/commits/commits#list-commits--parameters. – Alex Rintt Nov 25 '22 at 13:53
3

This can be done in as few as two requests, if data is already cached (on GitHub's side) and depending on your precision requirements.

First check to see if there are in fact commits before the creation time by doing a GET for /repos/:owner/:repo/commits with the until parameter set to the creation time (as suggested by VonC's answer) and limiting the number returned to 1 commit (via the per_page parameter).

If there are commits before the creation time, then the contributors statistics endpoint (/repos/:owner/:repo/stats/contributors) can be invoked. The response has a weeks list per contributor, and the smallest w value in it (a Unix timestamp marking the start of a week) identifies the week that contains the oldest commit.

If you need a precise timestamp, you can then use the commits listing endpoint again, with since set to that oldest week value and until set to 7 days after it.

Note that the statistics endpoint may return a 202 indicating that statistics are not available, in which case a retry in a few seconds is required.
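
As a rough sketch of the last two steps in Python: the helper names oldest_week_start and week_window are hypothetical, and the code assumes the statistics response has already been fetched and parsed, with each weeks entry carrying a w (week start) and c (commit count) field. It only does the date arithmetic; the HTTP calls and the 202 retry are left out.

```python
from datetime import datetime, timedelta, timezone


def oldest_week_start(contributor_stats):
    """Return the smallest `w` (Unix timestamp of a week start) that has commits."""
    starts = [
        week["w"]
        for contributor in contributor_stats
        for week in contributor["weeks"]
        if week["c"] > 0  # assumption: skip weeks where this contributor has no commits
    ]
    return min(starts)


def week_window(week_start):
    """Return (since, until) ISO 8601 strings covering the 7 days from week_start."""
    since = datetime.fromtimestamp(week_start, tz=timezone.utc)
    until = since + timedelta(days=7)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return since.strftime(fmt), until.strftime(fmt)
```

The two strings returned by week_window would then be passed as the since and until parameters of the commits listing endpoint, and the oldest commit in that window is the one you are after.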

Mihai Parparita
2

One suggestion would be to list the commits on a repo (see the GitHub API v3 section), using the until parameter set to the creation date of the repo (plus one day, for instance).

GET /repos/:owner/:repo/commits

That way, you would list only the commits dated at or before the creation of the repo, which shortens the list by excluding every commit made after the repo was created.
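
A minimal sketch of how that until value might be computed in Python, assuming created_at arrives in the YYYY-MM-DDTHH:MM:SSZ form. The helper name commits_until_params and its slack_days argument are hypothetical.

```python
from datetime import datetime, timedelta

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def commits_until_params(created_at, slack_days=1):
    """Build query parameters that list commits up to repo creation plus some slack."""
    created = datetime.strptime(created_at, ISO_FORMAT)
    until = created + timedelta(days=slack_days)
    # per_page=100 is an assumption made here to keep the number of pages small
    return {"until": until.strftime(ISO_FORMAT), "per_page": 100}
```

The resulting dictionary can be passed as the query string of GET /repos/:owner/:repo/commits, for example through the params argument of requests.get.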

VonC
  • I think the idea is that they don't necessarily know which commit could be the oldest and as Ivan pointed out, the oldest commit may indeed have parents. – Ian Stapleton Cordasco Aug 05 '14 at 13:08
  • Thanks for the suggestion to first do a commit list using the creation date as the upper bound. I ended up using that as an initial filter and then using the statistics API to not have to paginate through all commits (see my answer, it's implemented at https://github.com/mihaip/githop/commit/6cc162c91b25bd26da379da2c1656fff6c199a1a). – Mihai Parparita Aug 27 '14 at 06:57
  • @MihaiParparita nice implementation, more complete than my answer. +1 – VonC Aug 27 '14 at 06:58
  • @MihaiParparita I assume you renamed your repo from `githop` to `retrogit`? https://github.com/mihaip/retrogit/commit/6cc162c91b25bd26da379da2c1656fff6c199a1a – testworks Sep 21 '21 at 02:29
  • @testworks: Correct, there was a rename later. – Mihai Parparita Sep 21 '21 at 21:55
0

Posting my solution, since none of the others worked for me.

The following script retrieves the list of commits for a given REPO ("owner/repo"), traverses to the last page if necessary, and outputs the JSON object of the last (oldest) commit.

    REPO="owner/repo"
    URL="https://api.github.com/repos/$REPO/commits"
    # Keep the headers in an array so each one stays a single argument
    H=(-H "Accept: application/vnd.github+json"
       -H "X-GitHub-Api-Version: 2022-11-28")
    
    # Fetch the first page with response headers, dropping the status line
    response=$(curl -s -L --include "${H[@]}" "$URL" | awk 'NR > 1')
    
    # Split the output into header and json
    header=$(echo "$response" | awk 'BEGIN{RS="\r\n";ORS="\r\n"} /^[a-zA-Z0-9-]+:/')
    commits=$(echo "$response" | awk '!/^[a-zA-Z0-9-]+:/')
    
    # If paginated, get last page
    if [[ $header == *"link"* ]]; then
      # Extract the last page value
      link_line=$(echo "$header" | grep -i "^link:")
      last_page=$(echo "$link_line" | sed -n 's/.*page=\([0-9]\+\)[^0-9].*rel="last".*/\1/p')
    
      # Get last-page commits
      commits=$(curl -s -L "${H[@]}" "$URL?page=$last_page")
    fi
    
    # Print first commit (the last entry of the last page)
    echo "$commits" | jq '.[-1].commit'
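
For comparison, the page-number extraction could be sketched in Python roughly as follows. The function name last_page_from_link is hypothetical, and splitting on commas is a simplification that assumes the URLs in the Link header contain no commas.

```python
import re


def last_page_from_link(link_header):
    """Return the page number of the rel="last" URL in a Link header, or None."""
    for part in link_header.split(","):
        if 'rel="last"' in part:
            # [?&] keeps per_page=... from being mistaken for page=...
            match = re.search(r"[?&]page=(\d+)", part)
            if match:
                return int(match.group(1))
    return None
```

If you use the requests library, its response.links attribute already exposes the parsed Link header, so a regex like this may not be needed there.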
FedFranz
-2

Use trial and error on the page number, for example:

https://github.com/fatfreecrm/fat_free_crm/commits/master?page=126

Looking at the Git history (with gitk, for instance) could make your trial and error more efficient.
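
One way to make the trial and error systematic is a binary search over page numbers, as suggested in a comment on the question. The sketch below is only an illustration: find_last_page and the page_has_commits callback are hypothetical names, and the callback is assumed to answer whether a given page still lists any commits (for instance by requesting that page and checking for a non-empty result).

```python
def find_last_page(page_has_commits):
    """Return the highest page number for which page_has_commits(page) is true.

    Assumes page 1 has commits, and that pages have commits up to some
    last page and none after it.
    """
    # Grow the upper bound exponentially until we reach an empty page.
    low, high = 1, 2
    while page_has_commits(high):
        low, high = high, high * 2
    # Invariant: page `low` has commits, page `high` does not.
    while high - low > 1:
        mid = (low + high) // 2
        if page_has_commits(mid):
            low = mid
        else:
            high = mid
    return low
```

This needs a number of page checks that grows with the logarithm of the last page number, rather than with the page number itself.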

-2

This isn't via API, but on GitHub.com: if you have the latest commit SHA and the commit count, you can build the URL to find it:

https://github.com/USER/REPO/commits?after=LAST_COMMIT_SHA+COMMIT_COUNT_MINUS_2

# Example. Commit count in this case was 1573
https://github.com/sindresorhus/refined-github/commits/master
  ?after=a76ed868a84cd0078d8423999faaba7380b0df1b+1571
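
As a small illustration, the URL could be assembled in Python like this. The function name oldest_commits_url is hypothetical, and the branch segment follows the example above rather than the shorter template.

```python
def oldest_commits_url(user, repo, branch, last_commit_sha, commit_count):
    """Build the GitHub.com commits URL that starts at the oldest commit."""
    if commit_count < 2:
        raise ValueError("with fewer than 2 commits, the first page already shows the oldest one")
    offset = commit_count - 2  # COMMIT_COUNT_MINUS_2 from the template above
    return (
        f"https://github.com/{user}/{repo}/commits/{branch}"
        f"?after={last_commit_sha}+{offset}"
    )
```

Calling it with the values from the example (commit count 1573) reproduces the URL shown there.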
fregante