0

Given a list of repositories on GitHub with 'repoName' and 'userName', I need to find all the '.java' files, and get the content of the java files. Tried to use RestAPI first, but ran into the rate limit very easily, now I'm switching to GraphQL API, but don't know how to achieve that.

Joyce
  • 1
  • 1

1 Answers1

1

Here is how I would do it:

Algo Identify_java_files
Entry: path the folder
Out: java files in the folder

Identify all files in the folder of the repository
For each file
    if the type of the file is "blob":
        if ".java" is the end of the name of the file
            get its content
    else if the type of the file is "tree":
        Identify_java_files(path of the tree file)

You can easily implement this pseudo code using Python. My pseudo code makes the assumption to use recursion, but it can be done otherwise, it's just for the example. You will need the requests and json libraries.

Here are the queries you might need, and you can use the explorer to test them.

{
  repository(name: "checkout", owner: "actions") {
    defaultBranchRef {
      name
    }
  }
}

This query allows you to check the name of the default branch of the repository. You will need it for the following queries, or you can use a specific branch but you will have to know its name. I don't know (and don't believe) if you can get all the branches names for a repository.

{
  repository(name: "checkout", owner: "actions") {
    object(expression: "main:") {
      ... on Tree {
        entries {
          path
          type
        }
      }
    }
  }
}

This query gets the content of the root folder for a specific repository. The expression: "main:" refers to the branch of the repository along with the path. Here the branch is main and the path is empty (it comes after the ":"), meaning we are looking at the root folder. Some repositories are using "master" as default branch, so be sure of which branch to use.

To check the content of a file, you can use this accepted answer. I updated the example in order for you to be able to try it.

{
  repository(name: "checkout", owner: "actions") {
    object(expression: "main:.github/workflows/test.yml") {
      ... on Blob {
        text
      }
    }
  }
}

You send your requests to the API using requests or alike, and store the responses as JSON or alike for treatment.

As a side note, I do not think it is possible to achieve this without issuing multiple queries. I recently had to do something similar, and this is my first SO answer, so let me know if anything is unclear.

Edit: You can use this answer to list all files in a repository.

Herostere
  • 36
  • 3