
Is it possible to get all the file names from repository using the GitHub API?

I'm currently trying to tinker this using PyGithub, but I'm totally ok with manually doing the request as long as it works.

My algorithm so far is:

  1. Get the user repo names
  2. Get the user repo that matches a certain description
  3. ??? get repo file names?
ChrisGPT was on strike
Anton Antonov

5 Answers


This will have to be relative to a particular commit, as some files may be present in some commits and absent in others, so before you can look at files you'll need to use something like List commits on a repository:

GET /repos/:owner/:repo/commits

If you're just interested in the latest commit on a branch you can set the sha parameter to the branch name:

sha (string): SHA or branch to start listing commits from.

Once you have a commit hash, you can inspect that commit:

GET /repos/:owner/:repo/git/commits/:sha

which should return something like this (truncated from GitHub's documentation):

{
  "sha": "...",
  "...": "...",
  "tree": {
    "url": "https://api.github.com/repos/octocat/Hello-World/git/trees/691272480426f78a0138979dd3ce63b77f706feb",
    "sha": "691272480426f78a0138979dd3ce63b77f706feb"
  },
  "...": "..."
}

Look at the hash of its tree, which is essentially its directory contents. In this case, 691272480426f78a0138979dd3ce63b77f706feb. Now we can finally request the contents of that tree:

GET /repos/:owner/:repo/git/trees/:sha

The output from GitHub's example is

{
  "sha": "9fb037999f264ba9a7fc6274d15fa3ae2ab98312",
  "url": "https://api.github.com/repos/octocat/Hello-World/trees/9fb037999f264ba9a7fc6274d15fa3ae2ab98312",
  "tree": [
    {
      "path": "file.rb",
      "mode": "100644",
      "type": "blob",
      "size": 30,
      "sha": "44b4fc6d56897b048c772eb4087f854f46256132",
      "url": "https://api.github.com/repos/octocat/Hello-World/git/blobs/44b4fc6d56897b048c772eb4087f854f46256132"
    },
    {
      "path": "subdir",
      "mode": "040000",
      "type": "tree",
      "sha": "f484d249c660418515fb01c2b9662073663c242e",
      "url": "https://api.github.com/repos/octocat/Hello-World/git/blobs/f484d249c660418515fb01c2b9662073663c242e"
    },
    {
      "path": "exec_file",
      "mode": "100755",
      "type": "blob",
      "size": 75,
      "sha": "45b983be36b73c0788dc9cbcb76cbb80fc7bb057",
      "url": "https://api.github.com/repos/octocat/Hello-World/git/blobs/45b983be36b73c0788dc9cbcb76cbb80fc7bb057"
    }
  ]
}

As you can see, we have some blobs, which correspond to files, and some additional trees, which correspond to subdirectories. You may want to do this recursively.
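
The three requests above, plus the recursion into subdirectories, can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names (`fetch_json`, `latest_tree_sha`, `list_files`) and the injectable `get` parameter are my own, the default fetcher is unauthenticated (so it is rate limited), and submodule entries are simply skipped.

```python
import json
from urllib.request import Request, urlopen

API = "https://api.github.com"


def fetch_json(url):
    """GET a GitHub API URL and decode the JSON body (hypothetical helper)."""
    request = Request(url, headers={"Accept": "application/vnd.github+json"})
    with urlopen(request) as response:
        return json.load(response)


def latest_tree_sha(owner, repo, branch, get=fetch_json):
    """Follow branch -> newest commit -> tree SHA, as in the steps above."""
    commits = get(f"{API}/repos/{owner}/{repo}/commits?sha={branch}&per_page=1")
    commit = get(f"{API}/repos/{owner}/{repo}/git/commits/{commits[0]['sha']}")
    return commit["tree"]["sha"]


def list_files(owner, repo, tree_sha, get=fetch_json, prefix=""):
    """Return the paths of all blobs reachable from tree_sha, depth first."""
    tree = get(f"{API}/repos/{owner}/{repo}/git/trees/{tree_sha}")
    paths = []
    for entry in tree["tree"]:
        path = prefix + entry["path"]
        if entry["type"] == "blob":
            paths.append(path)
        elif entry["type"] == "tree":
            # A subdirectory: request its tree by SHA and prefix its entries.
            paths.extend(list_files(owner, repo, entry["sha"], get, path + "/"))
        # Any other entry type (for example a submodule) is skipped.
    return paths
```

Calling `list_files("octocat", "Hello-World", latest_tree_sha("octocat", "Hello-World", "master"))` would issue one request per directory, so for large repositories a single recursive tree request is likely cheaper.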

ChrisGPT was on strike
  • The [Repository Contents API](https://developer.github.com/v3/repos/contents/#get-contents) may also be worth a look. It provides a familiar directory-style navigation structure, exposing a tree of directories and files. – jasonrudolph Jul 31 '14 at 19:30
  • `:sha` can be a label, such as `master`. – Peter Krauss Aug 31 '15 at 20:17
  • This seems to have trouble listing a directory with over 100,000 files; downloading the whole repo and processing it locally would be an option: [link](https://stackoverflow.com/questions/51529060/python-requests-get-all-files-from-a-git-folder) – Xianxing Sep 12 '22 at 15:45

You can use the GitHub Git Trees API:

https://api.github.com/repos/[USER]/[REPO]/git/trees/[BRANCH]?recursive=1

Repo:

https://github.com/deeja/bing-maps-loader

API call:

https://api.github.com/repos/deeja/bing-maps-loader/git/trees/master?recursive=1

which returns:

{
  "sha": "55382e87889ccb4c173bc99a42cc738358fc253a",
  "url": "https://api.github.com/repos/deeja/bing-maps-loader/git/trees/55382e87889ccb4c173bc99a42cc738358fc253a",
  "tree": [
    {
      "path": "README.md",
      "mode": "100644",
      "type": "blob",
      "sha": "41ceefc1262bb80a25529342ee3ec2ec7add7063",
      "size": 3196,
      "url": "https://api.github.com/repos/deeja/bing-maps-loader/git/blobs/41ceefc1262bb80a25529342ee3ec2ec7add7063"
    },
    {
      "path": "index.js",
      "mode": "100644",
      "type": "blob",
      "sha": "a81c94f70d1ca2a0df02bae36eb2aa920c7fb20e",
      "size": 1581,
      "url": "https://api.github.com/repos/deeja/bing-maps-loader/git/blobs/a81c94f70d1ca2a0df02bae36eb2aa920c7fb20e"
    },
    {
      "path": "package.json",
      "mode": "100644",
      "type": "blob",
      "sha": "45f24dcb7a457b14fede4cb907e957600882b340",
      "size": 595,
      "url": "https://api.github.com/repos/deeja/bing-maps-loader/git/blobs/45f24dcb7a457b14fede4cb907e957600882b340"
    }
  ],
  "truncated": false
}
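
A response like the one above can be reduced to file names with a few lines of Python. This is a small sketch: the function name `file_paths` is my own, and it assumes the response has already been decoded into a dict. With `recursive=1` the `tree` list also contains entries of type `tree` for subdirectories, so only blobs are kept, and the `truncated` flag is checked because a truncated listing would silently miss files.

```python
def file_paths(tree_response):
    """Return the blob paths from a decoded git/trees?recursive=1 response."""
    if tree_response.get("truncated"):
        # GitHub cut the listing short; the result would be incomplete.
        raise ValueError("tree listing was truncated; fetch subtrees one by one")
    return [
        entry["path"]
        for entry in tree_response["tree"]
        if entry["type"] == "blob"  # skip directories and other entry types
    ]
```

Raising on truncation is a deliberate choice here; falling back to a per-directory walk would be the alternative.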
Dan

Much easier now with the GraphQL API: you can get it all in a single query.

First, you get your repo:

query {
  repository(name: "MyRepo" owner: "mylogin"){

  }
}

Then you get its defaultBranchRef to make life easy:

    defaultBranchRef{

    }

A branch ref is really just a pointer to a commit, and since GraphQL is strongly typed (and refs can point to different kinds of objects), we need to tell it that the target is a commit:

   target{
      ...on Commit {

      }
   }

So target is what our ref is pointing to, and we say "if it's a commit, do this".

And what should it do? It should get the most recent commit (since that will have the latest files in the repo).

To do that, we query history:

        history(first: 1 until: "2019-10-08T00:00:00"){
            nodes{

            }
        }

Inside nodes we are inside our commit, and now we can see the files. A commit's files are really just a pointer to a tree, and a tree just has entries, which can be objects of either type Tree or type Blob.

Entries that represent files are known as blobs, but since we don't do anything with them except list their names, you don't even need to know that.

It is important to know that trees are also entries, so if you find a tree you need to dig in deeper. GraphQL has no recursive queries, though, so you can only go as many levels deep as you spell out in the query.

       tree{
           entries {
             name
             object {
               ...on Tree{
                 entries{
                   name
                   object {
                      ...on Tree{
                        entries{
                          name
                        }
                      }
                   }
                 }
               }
             }
           } 
       }
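
Because the depth has to be spelled out by hand, the nested `entries` selection can also be generated instead of typed. The following is only a sketch under that assumption: `entries_selection` is a hypothetical helper of mine, and it builds nothing more than the selection text for a chosen depth.

```python
def entries_selection(depth, indent=0):
    """Build a nested `entries { name object { ... on Tree { ... } } }` block.

    depth=1 lists one level of names; each extra level adds one more
    `object { ... on Tree { entries { ... } } }` wrapper.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    pad = "  " * indent
    lines = [f"{pad}entries {{", f"{pad}  name"]
    if depth > 1:
        lines.append(f"{pad}  object {{")
        lines.append(f"{pad}    ... on Tree {{")
        lines.append(entries_selection(depth - 1, indent + 3))
        lines.append(f"{pad}    }}")
        lines.append(f"{pad}  }}")
    lines.append(f"{pad}}}")
    return "\n".join(lines)
```

`entries_selection(3)` produces the same three-level shape as the fragment above; the result would be placed inside `tree { ... }`.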

Now to put it all together:

query{
  repository(owner: "MyLogin", name: "MyRepo") {
    defaultBranchRef {
      target {
        ... on Commit {
          history(first: 1 until: "2019-10-08T00:00:00") {
            nodes {
              tree {
                entries {
                  name
                  object {
                    ... on Tree {
                      entries {
                        name
                        object{
                          ...on Tree{
                            entries{
                              name
                              object{
                                ...on Tree{
                                  entries{
                                    name
                                  }                                  
                                }
                              }
                            }   
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
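
The nested result of a query like this still has to be flattened into paths. Here is a rough sketch of that step in Python. The function names are mine, and the response shape is an assumption derived from the query: the payload is wrapped in `data`, and `object` comes back without an `entries` key for anything that is not a Tree. Directories sitting at the deepest queried level have no `object` at all, so they are indistinguishable from files and are listed by name only.

```python
def root_entries(payload):
    """Dig the top-level entries out of the decoded GraphQL response."""
    target = payload["data"]["repository"]["defaultBranchRef"]["target"]
    return target["history"]["nodes"][0]["tree"]["entries"]


def flatten_entries(entries, prefix=""):
    """Turn nested {name, object: {entries}} records into slash-joined paths."""
    paths = []
    for entry in entries:
        path = prefix + entry["name"]
        children = (entry.get("object") or {}).get("entries")
        if children is None:
            # A blob, or an entry at the deepest level the query asked for.
            paths.append(path)
        else:
            paths.extend(flatten_entries(children, path + "/"))
    return paths
```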
Kyle Roux

As Dan mentioned, use GitHub git trees.

See the working example below:

import requests

user = "grumbach"
repo = "ft_ping"

url = "https://api.github.com/repos/{}/{}/git/trees/master?recursive=1".format(user, repo)
r = requests.get(url)
res = r.json()

for file in res["tree"]:
    print(file["path"])

For the sake of simplicity I omitted error management, velociraptors are extinct anyway…
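
For anyone who does want the error handling, a variant might look like the sketch below. It makes two assumptions of my own: the function name `list_repo_files`, and a `session` argument that behaves like `requests.Session()` (its `get` returns an object with `raise_for_status()` and `json()`). It also looks up the repository's `default_branch` rather than hardcoding `master`.

```python
API = "https://api.github.com"


def list_repo_files(user, repo, session):
    """List every path in the default branch; raises on HTTP errors."""
    # Ask the repository for its default branch instead of assuming "master".
    info = session.get(f"{API}/repos/{user}/{repo}")
    info.raise_for_status()
    branch = info.json()["default_branch"]

    # One recursive tree request returns files and directories alike.
    tree = session.get(f"{API}/repos/{user}/{repo}/git/trees/{branch}?recursive=1")
    tree.raise_for_status()
    return [entry["path"] for entry in tree.json()["tree"]]
```

With `requests` installed, `list_repo_files("grumbach", "ft_ping", requests.Session())` would be the equivalent of the example above.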

Anselme

Use gh api to make authenticated HTTP requests to the GitHub API. Note that the commits endpoint used here reports the files changed by each commit it is asked about, so the result is the set of file names touched by those commits rather than a snapshot of the whole tree.

In one line:
gh api -X GET /repos/octocat/Hello-World/commits | grep -E -o ".{0,0}\[{\"sha\":\".{0,40}" | sed 's/\[{\"sha\":\"//' | xargs -I {} gh api -X GET /repos/octocat/Hello-World/commits/{} | grep -E -o "\"filename\":\".*?\""

Or in two steps.
Get the commit SHAs:
gh api -X GET /repos/octocat/Hello-World/commits | grep -E -o ".{0,0}\[{\"sha\":\".{0,40}" | sed 's/\[{\"sha\":\"//' >> ~/commits
List the file names:
xargs < ~/commits -I {} gh api -X GET /repos/octocat/Hello-World/commits/{} | grep -E -o "\"filename\":\".*?\""
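
Matching JSON with grep is brittle, so the last step could instead parse the response. The snippet below is a hedged sketch of that idea: `changed_filenames` is a name I made up, and it expects the JSON text of one `/repos/:owner/:repo/commits/:sha` response (for example, the captured output of the `gh api` call). As with the commands above, it yields the files changed by that commit, not the full tree.

```python
import json


def changed_filenames(commit_json):
    """Extract the `filename` of each changed file from one commit response."""
    commit = json.loads(commit_json)
    # `files` may be absent (e.g. an error payload); treat that as no files.
    return [changed["filename"] for changed in commit.get("files", [])]
```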
Ax_