7

I want to download a particular file from a private GitHub repository every day, on Linux. Files under 1 MB are no problem: I use the Get contents API with the following curl command.

curl -H "Content-Type: application/json" -H "Authorization: token $TOKEN" -H 'Accept: application/vnd.github.v3.raw' -O $FILEPATH

The file has now grown beyond 1 MB, and I no longer know how to download it.

GitHub tells me to use the Git Data API to get the blob (up to 100 MB, which is more than enough for me).

Although I've been looking for a way to get the SHA-1 of this frequently updated file, I haven't come across any applicable method yet. Any suggestions?

Or is there a method that doesn't use the GitHub API at all?

Thanks in advance.

H. Jiang

2 Answers

5

If the file's path in the repository is known, you can get its SHA using the Contents API. For example:

~ λ curl -H "Content-Type: application/json" \
    -H "Authorization: token $TOKEN" \
    -H "Accept: application/vnd.github.v3" \
    https://api.github.com/repos/smt116/dotfiles/contents/README.md

{
  "name": "README.md",
  "path": "README.md",
  "sha": "36bba4cf1f8fd3cbbdf81d4cc2291b54a4e56a63",
  "size": 16,
  "url": "https://api.github.com/repos/smt116/dotfiles/contents/README.md?ref=master",
  "html_url": "https://github.com/smt116/dotfiles/blob/master/README.md",
  "git_url": "https://api.github.com/repos/smt116/dotfiles/git/blobs/36bba4cf1f8fd3cbbdf81d4cc2291b54a4e56a63",
  "download_url": "https://raw.githubusercontent.com/smt116/dotfiles/master/README.md",
  "type": "file",
  "content": "IyMgTXkgZG90ZmlsZXMuCg==\n",
  "encoding": "base64",
  "_links": {
    "self": "https://api.github.com/repos/smt116/dotfiles/contents/README.md?ref=master",
    "git": "https://api.github.com/repos/smt116/dotfiles/git/blobs/36bba4cf1f8fd3cbbdf81d4cc2291b54a4e56a63",
    "html": "https://github.com/smt116/dotfiles/blob/master/README.md"
  }
}

Now you can download the file with the Git Data API, using the git_url link included in the JSON response.
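A minimal shell sketch of this two-step flow, assuming `jq` is installed and `$TOKEN` holds a token with access to the repository. The helper names `git_url_for_name` and `fetch_blob` are hypothetical. Since the question reports that requesting the file itself failed once it exceeded 1 MB, the sketch reads git_url from the Contents API listing of the file's parent directory, which carries the same sha and git_url fields for each entry.

```shell
# Hypothetical helpers; the names are illustrative, not part of any GitHub tooling.

# Read a Contents API directory listing (a JSON array) on stdin and print
# the git_url of the file entry whose name equals $1. Requires jq.
git_url_for_name() {
  jq -r --arg name "$1" \
    '.[] | select(.type == "file" and .name == $name) | .git_url'
}

# Download the blob behind a git_url ($1) into the local file $2.
# The raw media type asks the Git Data API for the file bytes instead of
# a JSON document with base64-encoded content.
fetch_blob() {
  curl -fsSL \
    -H "Authorization: token $TOKEN" \
    -H "Accept: application/vnd.github.v3.raw" \
    -o "$2" "$1"
}

# Example (OWNER, REPO, DIR and big.bin are placeholders):
#   url=$(curl -fsS -H "Authorization: token $TOKEN" \
#           "https://api.github.com/repos/OWNER/REPO/contents/DIR" |
#         git_url_for_name big.bin)
#   fetch_blob "$url" big.bin
```

Because git_url is looked up on every run, a daily job picks up the new SHA whenever the file changes.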

However, if you want to download all blobs from a given repository, you can use Git Trees to fetch the list first. You need to specify a commit SHA, but you can use HEAD if the most recent commit is okay. For example:

~ λ curl -H "Content-Type: application/json" \
      -H "Authorization: token $TOKEN" \
      -H "Accept: application/vnd.github.v3.raw" \
      https://api.github.com/repos/smt116/dotfiles/git/trees/HEAD

{
  "sha": "0fc96d75ff4182913cec229978bb10ad338012fd",
  "url": "https://api.github.com/repos/smt116/dotfiles/git/trees/0fc96d75ff4182913cec229978bb10ad338012fd",
  "tree": [
    {
      "path": ".agignore",
      "mode": "100644",
      "type": "blob",
      "sha": "e2ca571728887bce8255ab3f66061dde53ffae4f",
      "size": 21,
      "url": "https://api.github.com/repos/smt116/dotfiles/git/blobs/e2ca571728887bce8255ab3f66061dde53ffae4f"
    },
    {
      "path": ".bundle",
      "mode": "040000",
      "type": "tree",
      "sha": "4148d567286de6aa47047672b1f2f73d7bea349b",
      "url": "https://api.github.com/repos/smt116/dotfiles/git/trees/4148d567286de6aa47047672b1f2f73d7bea349b"
    },
    ...

To get details of all files, including those in subdirectories, add the recursive=1 query parameter to the URL.

Then parse the JSON response, keep the items whose type is blob, and download each file using its url attribute.
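The parse-and-filter step could be sketched in shell as follows, assuming `jq` is available. The function names `tree_url`, `list_blobs` and `blob_sha_for_path` are hypothetical, and OWNER, REPO and REF are placeholders supplied by the caller.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print the Git Trees URL for owner $1, repo $2 and ref or SHA $3,
# with recursive=1 so that subdirectories are included.
tree_url() {
  printf 'https://api.github.com/repos/%s/%s/git/trees/%s?recursive=1\n' \
    "$1" "$2" "$3"
}

# Read a Git Trees response on stdin and print "path<TAB>url" for every
# entry of type blob; tree (directory) entries are skipped.
list_blobs() {
  jq -r '.tree[] | select(.type == "blob") | [.path, .url] | @tsv'
}

# Read a Git Trees response on stdin and print the SHA of the blob at path $1.
blob_sha_for_path() {
  jq -r --arg path "$1" \
    '.tree[] | select(.type == "blob" and .path == $path) | .sha'
}

# Example: download every blob of the most recent commit.
#   curl -fsS -H "Authorization: token $TOKEN" "$(tree_url OWNER REPO HEAD)" |
#     list_blobs |
#     while IFS="$(printf '\t')" read -r path url; do
#       mkdir -p "$(dirname "$path")"
#       curl -fsSL -H "Authorization: token $TOKEN" \
#            -H "Accept: application/vnd.github.v3.raw" -o "$path" "$url"
#     done
```

For a single frequently updated file, `blob_sha_for_path` alone is enough to find the current SHA. Note that a recursive response for a very large repository may be marked as truncated, in which case the list is incomplete.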

Maciej Małecki
  • What is `~ λ`? Your shell prompt? – das-g Aug 12 '16 at 12:02
  • Yes, it is my shell prompt. – Maciej Małecki Aug 12 '16 at 12:44
  • When indicating the shell prompt, consider replacing it by a more canonical one like `> `, `$ ` or `$> `, so that it'll be more widely recognized as what it is. – das-g Aug 12 '16 at 13:05
  • Thanks for the reply. Though I don't think the Contents API will work in my case due to the file size, I should be able to work it out with the Tree API. I didn't know that you can use HEAD instead of a SHA for the Tree API; thanks for the hint. – H. Jiang Aug 15 '16 at 01:31
  • @H.Jiang as I said in the response, you can use the Contents API just to fetch the URL for the blob and then use it for the download. It will work. – Maciej Małecki Aug 15 '16 at 07:14
  • @H.Jiang is it working for you, or is there something that I should explain further? Does it answer your question? – Maciej Małecki Aug 19 '16 at 17:36
  • @smefju I'm so sorry for the terribly late reply. I should definitely check stackoverflow more often. Thanks to your suggestion, I'm using Tree API to identify SHA and get the file with Blob API! As for content API, I still don't see how you could get url for a blob larger than 1MB with the same command you used above. I could only get a response that tells me to use Git Data API. – H. Jiang Oct 19 '16 at 05:20
  • Oh, did you mean that I should use content API to fetch information for the directory? – H. Jiang Oct 19 '16 at 05:36
  • Yes. You can fetch information using Contents API or Git Trees and then download files using Git Data API. – Maciej Małecki Oct 19 '16 at 07:30
1

This should be easier now (May 2022) using just the file path, since the Get repository content API finally supports raw content up to 100 MB instead of 1 MB.

Increased file size limit when retrieving file contents via REST API

Previously, the Get repository content REST API endpoint had a file size limit of 1 MB.
That didn’t correspond to the Create or update file contents endpoint which has a file size limit of 100 MB.

Now, both endpoints have a file size limit of 100 MB.

However, requests for file contents larger than 1 MB must include the .raw custom media type in the Accept HTTP header, as shown here:

 Accept: application/vnd.github.v3.raw

Read more about GitHub's REST API endpoints for repository contents.

curl -H "Accept: application/vnd.github.v3+json" \
     https://api.github.com/repos/OWNER/REPO/contents/PATH

Between 1-100 MB: Only the raw or object custom media types are supported.
Both will work as normal, except that when using the object media type, the content field will be an empty string and the encoding field will be "none".
To get the contents of these larger files, use the raw media type.

Greater than 100 MB: This endpoint is not supported.
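For the private-repository case in the question, a request with the raw media type might look like the sketch below. The function name `fetch_contents_raw` is hypothetical, and `$TOKEN` is assumed to hold a token that can read the repository.

```shell
# Hypothetical helper; the name is illustrative only.

# Download one file through the repository contents endpoint.
#   $1 = owner, $2 = repo, $3 = path inside the repo, $4 = local output file
# The raw media type is what allows files between 1 MB and 100 MB.
# A path containing spaces or other special characters would need to be
# URL-encoded by the caller first.
fetch_contents_raw() {
  curl -fsSL \
    -H "Authorization: token $TOKEN" \
    -H "Accept: application/vnd.github.v3.raw" \
    -o "$4" \
    "https://api.github.com/repos/$1/$2/contents/$3"
}

# Example (OWNER, REPO and the paths are placeholders):
#   fetch_contents_raw OWNER REPO data/big.bin big.bin
```

With this approach no SHA lookup is needed, since the file is addressed by its path alone.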

VonC