I want to obtain the list of files in a GitHub repo. I followed this answer, but I noticed that I sometimes get an HTTP 403 error. For example, if I run the following code:

library(httr)

for (i in 1:10) {
  req <- GET("https://api.github.com/repos/etiennebacher/tidytuesday/git/trees/master?recursive=1")
  stop_for_status(req)
}

> Error: Forbidden (HTTP 403).

(Note that I don't actually want to run this GET request 10 times; it's just the easiest way I found to reproduce my problem.)

Searching a bit online, I found this answer, which explains that the GitHub API requires the GitHub username, or the name of the application, as the User-Agent header value of the request.

But adding user_agent("etiennebacher") in GET() doesn't change anything. How should I specify the user agent in this case?

Also asked on RStudio Community

bretauv
  • Maybe the user-agent is something else: https://rdrr.io/cran/httr/man/user_agent.html . I seem to remember the agent being the name of softwares. E.g. Mozilla or chrome. – IRTFM Aug 24 '21 at 15:27
  • Not necessarily, e.g. `GET(url = "http://httpbin.org/user-agent")` gives `"user-agent": "libcurl/7.64.1 r-curl/4.3.2 httr/1.4.2"`. But we can also customize it, e.g. `GET(url = "http://httpbin.org/user-agent", user_agent("hello"))` gives `"user-agent": "hello"`. But for some reason I can't make the GitHub API work with that – bretauv Aug 24 '21 at 15:40
  • I don’t know the answer but I would have tried that libcurl specification or one of the other two. – IRTFM Aug 24 '21 at 15:42
  • Is the error you are getting only "Forbidden"? Or does it actually return a specific message about the "user agent"? The GET request does send a default user agent and the `httr::user_agent()` function does change that when included in the `GET("http:///", user_agent("whatever"))`. Have you done any form of authentication? Unauthenticated requests are limited to a few requests per hour. – MrFlick Aug 24 '21 at 17:33
  • @MrFlick Indeed the message is `API rate limit exceeded...`, which could be solved by authenticating. I also found the package `gh` that makes it a bit easier to authenticate (via `gh::gh_whoami()`). That solves it locally, but I don't know how to authenticate in GitHub Actions (which is where the code above is supposed to run). I read [this page](https://docs.github.com/en/actions/reference/authentication-in-a-workflow) but I don't know how to pass `GITHUB_TOKEN` as an input in an R function yet. I'll open another question for that if I can't find a way. – bretauv Aug 24 '21 at 19:01
  • Well, your question didn't mention a GitHub Actions context, so that wasn't clear. But basically, yeah, you need to use the secrets stuff. There's an example at the bottom of the [encrypting secrets page](https://docs.github.com/en/actions/reference/encrypted-secrets) that shows how you can pass values as environment variables to your step. Then you can use `Sys.getenv()` in your R script to access that environment variable. – MrFlick Aug 24 '21 at 19:27
  • Yes, I know I forgot to mention it in the post. I'm going to write an answer to the original question based on your comments, and then add the GitHub Actions part as a "bonus" as soon as I find out how to do it – bretauv Aug 24 '21 at 19:47

1 Answer

As commented by @MrFlick, the HTTP 403 here means that the API rate limit was exceeded. One way to solve this is to authenticate when making the request, which raises the rate limit from 60 to 5,000 requests per hour. Authenticating is easier with the package gh (and its eponymous function). Locally, this can be done by specifying .token = <PAT> (where <PAT> is your GitHub Personal Access Token) in gh().
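
To confirm that a 403 really comes from the rate limit, it can help to look at the rate-limit headers that GitHub sends with each response. The following is only a sketch: `rate_limit_summary()` is a hypothetical helper (not part of httr or gh), and the header values are made up for illustration rather than captured from a real response.

```r
# Hypothetical helper: summarise GitHub's rate-limit headers.
# `hdrs` is assumed to be a named list like the one httr::headers(req) returns.
rate_limit_summary <- function(hdrs) {
  names(hdrs) <- tolower(names(hdrs))  # header names are case-insensitive
  remaining <- as.integer(hdrs[["x-ratelimit-remaining"]])
  list(
    limit     = as.integer(hdrs[["x-ratelimit-limit"]]),
    remaining = remaining,
    # The reset header is a Unix timestamp in seconds
    reset     = as.POSIXct(as.numeric(hdrs[["x-ratelimit-reset"]]),
                           origin = "1970-01-01", tz = "UTC"),
    exhausted = remaining == 0L
  )
}

# Illustrative values only
example_headers <- list(
  "X-RateLimit-Limit"     = "60",
  "X-RateLimit-Remaining" = "0",
  "X-RateLimit-Reset"     = "1629831600"
)

info <- rate_limit_summary(example_headers)
info$exhausted
```

With httr, the same helper could presumably be called as `rate_limit_summary(httr::headers(req))` before `stop_for_status(req)`.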

To obtain the list of files in a particular repo, you can save the output of gh() in a JSON file, and then read it with jsonlite::fromJSON(). The file paths are then available in the path column of the tree element.

Bottom line, this works:

library(gh)
library(jsonlite)

tmp_file <- tempfile(fileext = ".json")

# Replace "<PAT>" with your GitHub Personal Access Token
gh::gh(
  "https://api.github.com/repos/etiennebacher/tidytuesday/git/trees/master?recursive=1",
  .destfile = tmp_file,
  .token = "<PAT>"
)

jsonlite::fromJSON(tmp_file)$tree$path
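
Note that the tree returned by this endpoint lists directories as well as files. Here is a minimal sketch of keeping only the files, assuming the response has the shape documented for the Git Trees API (entries with type "blob" for files and "tree" for directories). The JSON below is a made-up sample, not the real content of the repository.

```r
library(jsonlite)

# Made-up sample mimicking the shape of a Git Trees API response
sample_json <- '{
  "sha": "abc123",
  "tree": [
    {"path": "README.md", "mode": "100644", "type": "blob"},
    {"path": "R", "mode": "040000", "type": "tree"},
    {"path": "R/plot.R", "mode": "100644", "type": "blob"}
  ],
  "truncated": false
}'

# fromJSON() simplifies the array of entries into a data frame
tree <- fromJSON(sample_json)$tree

# Keep only entries that are files ("blob"), dropping directories ("tree")
file_paths <- tree$path[tree$type == "blob"]
file_paths
```

The same filter should apply to the real output, e.g. on `jsonlite::fromJSON(tmp_file)$tree`.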

Bonus: how to authenticate in GitHub Actions

I didn't say it in the original post, but this GET request was supposed to be made in a GitHub Actions workflow. Since you can't manually provide the PAT there, you need to define the token in the .yaml file, and then use its value in the .R file.

test.R

library(gh)
library(jsonlite)

tmp_file <- tempfile(fileext = ".json")

# github_token is defined in the workflow step that sources this file
gh::gh(
  "https://api.github.com/repos/etiennebacher/tidytuesday/git/trees/master?recursive=1",
  .destfile = tmp_file,
  .token = github_token
)

jsonlite::fromJSON(tmp_file)$tree$path

GitHub Actions workflow:

on:
  push:
    branches: master

jobs:
  build:
    runs-on: macOS-latest
    steps:
      - uses: actions/checkout@v2
      - uses: r-lib/actions/setup-r@master
      - run: |
          github_token <- "${{ secrets.GITHUB_TOKEN }}"
          install.packages(c("gh", "jsonlite"))
          source("test.R")
        shell: Rscript {0}
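
An alternative, along the lines of what @MrFlick suggested in the comments, is to pass the token as an environment variable and read it with Sys.getenv(). The sketch below assumes that the workflow step has an `env:` entry mapping `GITHUB_TOKEN` to `${{ secrets.GITHUB_TOKEN }}`. `get_github_token()` is a hypothetical helper name, not a function from gh.

```r
# Hypothetical helper: read the token from an environment variable and
# fail early with a clear message if it is missing.
get_github_token <- function(var = "GITHUB_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set or is empty.")
  }
  token
}

# Intended usage in test.R (not run here, needs network access and a token):
# gh::gh(
#   "https://api.github.com/repos/etiennebacher/tidytuesday/git/trees/master?recursive=1",
#   .destfile = tmp_file,
#   .token = get_github_token()
# )
```

This keeps test.R independent of a variable defined in the workflow step, so the same script can also run locally if the environment variable is set.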
bretauv