
I'm trying to build a continuous integration system. Each push to GitHub will trigger a build.

Each build will need to check out or download the repository at the commit it's processing. I'm looking for a way to do that which does not take minutes on large repositories (the build itself takes only a few seconds).

Please note that I do not want to store data between builds (that removes the possibility of caching).

The solutions I've explored:

Every other solution I see on GitHub assumes either a recent git version on the server or that it's fine to clone the repository once, but in my case it's not: I'm starting from scratch on every build (that's a constraint).

So I'm asking in the specific case of GitHub: how can I download (in any way) the code at a specific commit to be able to run continuous integration tools on that commit?

Matthieu Napoli

1 Answer


You can download an archive of a particular commit from GitHub using a URL of the form:

https://github.com/PROJECT/REPO/archive/COMMITID.zip

For example, if I have a project named "dockerize" and I want to download commit id 169532e I can run:

curl -OL https://github.com/larsks/dockerize/archive/169532e.zip

I've used a short commit ID here, but you can use a long commit ID, or a branch, or a tag, etc.

This will give me a .zip archive with the files from that particular commit. The top-level directory will be named REPO-LONGCOMMITID. For example, the above command would result in an archive in which the top-level directory is dockerize-169532eba46757aca8002e1c9bb257079a739f75, so the README is at dockerize-169532eba46757aca8002e1c9bb257079a739f75/README.md.

This gets you only the files in that particular commit; it does not fetch the .git directory or any repository history.
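For a CI script, the URL pattern above could be wrapped in a couple of small shell functions. This is only a sketch: the names `archive_url` and `download_commit` are made up for illustration, and it assumes the ref contains no slash (so it can double as a file name) and that `curl` is installed.

```shell
#!/bin/sh
# Hypothetical helper: build the archive URL for OWNER/REPO ($1)
# and a commit ID, branch, or tag ($2).
archive_url() {
    printf 'https://github.com/%s/archive/%s.zip\n' "$1" "$2"
}

# Hypothetical helper: download the archive into the current
# directory as REF.zip.
# -f fails on HTTP errors instead of saving an error page,
# -sS hides the progress meter but still prints errors,
# -L follows redirects, as in the curl command above.
download_commit() {
    curl -fsSL -o "$2.zip" "$(archive_url "$1" "$2")"
}

# Example (needs network access):
# download_commit larsks/dockerize 169532e && unzip -q 169532e.zip
```

The `-f` flag is an addition to the original command; without it, a wrong commit ID would leave an HTML error page saved under a .zip name.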

larsks
  • Thanks that's perfect! For reference [here is the API documentation for that](https://developer.github.com/v3/repos/contents/#get-archive-link), and here is the full command I'm using: `curl -sS -L -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/repos/$REPOSITORY_NAME/tarball/$COMMIT_ID | tar --strip-components=1 -C /tmp/code -xz` (it works with private repositories). – Matthieu Napoli Aug 12 '17 at 22:17
  • For public repositories it could be: `curl -sS -L https://api.github.com/repos/$REPOSITORY_NAME/tarball/$COMMIT_ID | tar --strip-components=1 -C /tmp/code -xz` – Matthieu Napoli Aug 12 '17 at 22:17
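The two commands in the comments above can be folded into one script that handles both the public and the private case. This is a rough sketch, not a tested CI recipe: the function names `extract_stripped` and `fetch_commit` are hypothetical, `GITHUB_TOKEN` is treated as optional, and the target directory is created first because `tar -C` expects it to exist.

```shell
#!/bin/sh
# Hypothetical helper: read a gzipped tarball on stdin and unpack it
# into the directory $1, dropping the single top-level directory that
# wraps the archive contents.
extract_stripped() {
    dest=$1
    mkdir -p "$dest" && tar -xz --strip-components=1 -C "$dest"
}

# Hypothetical helper: fetch the tarball for OWNER/REPO ($1) at
# commit $2 and unpack it into $3. The Authorization header is sent
# only when GITHUB_TOKEN is set (needed for private repositories).
fetch_commit() {
    if [ -n "${GITHUB_TOKEN:-}" ]; then
        curl -fsSL -H "Authorization: token $GITHUB_TOKEN" \
            "https://api.github.com/repos/$1/tarball/$2"
    else
        curl -fsSL "https://api.github.com/repos/$1/tarball/$2"
    fi | extract_stripped "$3"
}

# Example (needs network access):
# fetch_commit "$REPOSITORY_NAME" "$COMMIT_ID" /tmp/code
```

Keeping the extraction step in its own function means it can be checked against a local tarball without touching the network.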