
I am looking for a way to extract the GitHub repositories that contain files with a certain code string. I can do this manually using the GitHub search bar. For instance, if I'm looking for usages of the library pymc3, I can search for it in the search bar and then click on Code.

[Screenshot: GitHub search results for pymc3 with the Code tab selected]

How does one do this programmatically?

I tried going over the GitHub Search API documentation. The Search Code functionality allows searching within code, but it seems to only search within a user, organization, or repository. The Search Repositories functionality only looks at the description, title, and README.

Update 1:

While browsing this post, I believe I found a way to identify some of the repositories that contain a code string.

If I run the following code:

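```python
import requests

# Passing the query via `params` lets requests URL-encode it
url = "https://api.github.com/search/code"
params = {"q": "pymc3 in:file"}

headers = {
    "Authorization": "Token xxxxxxxxxxxxxxxxx"
}

response = requests.get(url, params=params, headers=headers)

print(response.text)
```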

I get the following result:

"total_count":43642,"incomplete_results":false,"items":[{"name":"pymc3_stoch_vol ...

The result contains a lot of information, such as the Git URL, the HTML URL, and some of the repositories that contain this string. However, I need a way to extract all the repositories that contain this string.
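One way to pull the repository names out of a single page of results is sketched below. It assumes the response body has already been decoded into a dict (with requests, `response.json()`) and relies on each entry in `items` carrying a nested `repository` object with a `full_name` field. The helper name `repositories_from_page` and the sample values are hypothetical, for illustration only.

```python
import json


def repositories_from_page(payload):
    """Return unique repository names from one decoded Search Code response.

    Each entry in payload["items"] is a matching file; several files may
    belong to the same repository, so duplicates are dropped while the
    original order is kept.
    """
    names = (item["repository"]["full_name"] for item in payload.get("items", []))
    return list(dict.fromkeys(names))


# A trimmed-down response body with hypothetical values
sample = json.loads("""
{
  "total_count": 3,
  "incomplete_results": false,
  "items": [
    {"name": "model.py", "repository": {"full_name": "alice/pymc3_stoch_vol"}},
    {"name": "utils.py", "repository": {"full_name": "alice/pymc3_stoch_vol"}},
    {"name": "fit.py", "repository": {"full_name": "bob/bayes-demo"}}
  ]
}
""")

print(repositories_from_page(sample))  # ['alice/pymc3_stoch_vol', 'bob/bayes-demo']
```

This only covers one page; the repositories on later pages have to be fetched separately.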

Update 2:

I now understand that GitHub limits results to 100 per page and 1000 results overall.
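A paging loop that stays within those limits could look like the sketch below. `fetch_page` and `collect_repositories` are hypothetical helper names; `per_page` and `page` are the REST API's pagination query parameters. It uses only the standard library (`urllib`) so it runs without extra packages, and the page-fetching function is passed in so the loop can be exercised offline. With these caps, at most 1,000 of the 43,642 matches reported in `total_count` above can be retrieved for a single query.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.github.com/search/code"
PER_PAGE = 100      # largest page size the search endpoints accept
MAX_RESULTS = 1000  # results beyond this are not returned for one query


def fetch_page(query, page, token):
    """Request one page of Search Code results and return the decoded JSON."""
    params = urllib.parse.urlencode({"q": query, "per_page": PER_PAGE, "page": page})
    request = urllib.request.Request(
        f"{SEARCH_URL}?{params}",
        headers={"Authorization": f"Token {token}"},
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def collect_repositories(query, fetch):
    """Page through the results and return unique repository names in order.

    `fetch(query, page)` must return one decoded page; passing it in keeps
    the loop testable without network access.
    """
    repos = {}  # dict keys keep insertion order and drop duplicates
    last_page = MAX_RESULTS // PER_PAGE  # 10 pages of 100 results
    for page in range(1, last_page + 1):
        items = fetch(query, page).get("items", [])
        for item in items:
            repos.setdefault(item["repository"]["full_name"], None)
        if len(items) < PER_PAGE:  # a short (or empty) page is the last one
            break
    return list(repos)


# Usage against the live API (needs a real token):
# repos = collect_repositories(
#     "pymc3 in:file", lambda q, p: fetch_page(q, p, "xxxxxxxxxxxxxxxxx")
# )
```

Stopping at page 10 avoids requesting results past the 1,000-result cap; stopping on a short page avoids requests once the matches run out earlier.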

The only remaining question is why I didn't find this information in the GitHub Search API documentation. Please do let me know if my understanding or the linked answer is wrong.

desert_ranger
  • What about `/search/code`? See https://docs.github.com/en/rest/search?apiVersion=2022-11-28#search-code – Matt Dec 20 '22 at 00:09
  • I tried that, and I have explained the shortcomings of that :) – desert_ranger Dec 20 '22 at 00:11
  • I assumed you meant something different with *Search Code* since this API is not restricted to search based on user/org/repo as far as I know. Why does `/search/code?q=pymc3` not give you the output you need? – Matt Dec 20 '22 at 00:26
  • When I try `https://api.github.com/search/code?q=pymc3`, I get the following error - `Must include at least one user, organization, or repository` – desert_ranger Dec 20 '22 at 01:02
  • @Matt, I think I found a workaround for the above problem. I have included the information in Update 1. – desert_ranger Dec 20 '22 at 19:05
  • It is written [here](https://docs.github.com/en/rest/search?apiVersion=2022-11-28#about-search) that *«[...] the GitHub REST API provides up to 1,000 results for each search.»* The aspect concerning pagination (only 100 results per page) can be found [here](https://docs.github.com/en/rest/guides/using-pagination-in-the-rest-api?apiVersion=2022-11-28). – Matt Dec 21 '22 at 00:47

1 Answer


This kind of query would be better suited to the GraphQL API, but code search is still not supported there.

Only the new code-search (presented here) might be able to provide that, but:

  • it is still in beta
  • its API is not yet public.

So for now, code search in all GitHub repositories is not supported.

VonC
  • I see, but then why does the process work manually? As shown in the screenshot above, I was able to search for a string and it returned repositories that contained files using that string. – desert_ranger Dec 20 '22 at 16:43
  • @desert_ranger Basically because *code* search on *all* repositories is not yet supported (probably because of the cost of such query). Its result, as you have seen, can only be incomplete, as [explained here](https://stackoverflow.com/a/62885607/6309). – VonC Dec 20 '22 at 19:17
  • Thank you for your answer. I should have clarified that I wanted the code search within the constraints given here - https://docs.github.com/en/rest/search?apiVersion=2022-11-28#search-code and not across all repositories. Please have a look at updates 1 and 2 of my question. Everything is almost figured out :) – desert_ranger Dec 20 '22 at 22:51
  • @desert_ranger Then the missing part of [pagination](https://docs.github.com/en/rest/guides/using-pagination-in-the-rest-api?apiVersion=2022-11-28) – VonC Dec 20 '22 at 23:06
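The pagination guide linked in the comment above describes following the `Link` response header, whose `rel="next"` entry points at the following page. The sketch below parses that header with a small regular expression (a simplification that assumes well-formed `<URL>; rel="name"` entries) and follows `next` using only the standard library. `parse_link_header` and `iter_pages` are hypothetical helper names. With requests, the same information is already parsed into `response.links`.

```python
import json
import re
import urllib.request

# Matches one entry of a Link header: <URL>; rel="name"
LINK_ENTRY = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(value):
    """Map each rel name ("next", "last", ...) in a Link header to its URL."""
    return {rel: url for url, rel in LINK_ENTRY.findall(value or "")}


def iter_pages(url, token):
    """Yield decoded pages, following rel="next" until there is none."""
    while url:
        request = urllib.request.Request(url, headers={"Authorization": f"Token {token}"})
        with urllib.request.urlopen(request, timeout=30) as response:
            payload = json.load(response)
            links = parse_link_header(response.headers.get("Link"))
        yield payload
        url = links.get("next")


header = (
    '<https://api.github.com/search/code?q=pymc3&page=2>; rel="next", '
    '<https://api.github.com/search/code?q=pymc3&page=10>; rel="last"'
)
print(parse_link_header(header)["next"])  # https://api.github.com/search/code?q=pymc3&page=2
```

Following `next` instead of counting pages means the loop ends on its own when the server stops advertising another page.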