Hello, I am using a GET request against the GitHub API:

commitsPublic <- GET("https://<host-name>/api/v3/search/commits?q=is:public+org:<ORG-NAME>",
                     add_headers(Authorization = "token <your-Token>",
                                 Accept = "application/vnd.github.cloak-preview"))

commitsPublic

And I get:

156 Items (Total Count)

 Content-Type: application/json; charset=utf-8
  Size: 291 kB
{
  "total_count": 156,
  "incomplete_results": false,
  "items": [
    {
  (I removed the items since they are not important)

But when I extract the response as text, it shows raw data that is not structured or readable:

jsonRespText <- content(commitsPublic, as = "text")
jsonRespText

Then I parse the JSON:

library(jsonlite)
toJson <- fromJSON(jsonRespText)
toJson

Returns:

$items[[30]]$score
[1] 1

It returns a list with items only up to `items[[30]]`,

and `items[[31]]` is NULL.

So what I am asking is: how can I get all 156 listed items from the JSON text? I have another GET request that reports a total count of 10,000 commits, but after converting from JSON the list still has only 30 items. Any help would be appreciated, thanks!

Alex Rika

1 Answer

The GitHub API paginates results, returning 30 per page by default. Pagination information is in the response `Link` header.

library(httr)

commitsPublic <- GET("https://api.github.com/search/commits?q=is:public+org:rstudio",
                     add_headers(Accept = 'application/vnd.github.cloak-preview'))

headers(commitsPublic)$link
#> [1] "<https://api.github.com/search/commits?q=is%3Apublic+org%3Arstudio&page=2>; 
#> rel=\"next\", 
#> <https://api.github.com/search/commits?q=is%3Apublic+org%3Arstudio&page=34>; 
#> rel=\"last\""

Created on 2019-03-22 by the reprex package (v0.2.1)

This tells us where the next page is located, and that there are 34 pages in total in this instance.
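To collect every page, you can follow the `rel="next"` URL from the `Link` header until it disappears. Below is a minimal sketch: `next_link` is a hypothetical helper (not part of httr) that parses the header, and the commented usage loop shows how it would drive repeated `GET` calls (URL and organisation are placeholders):

```r
# Hypothetical helper: pull the URL tagged rel="next" out of a GitHub
# Link header string; returns NULL when there is no next page.
next_link <- function(link_header) {
  if (is.null(link_header)) return(NULL)
  parts <- strsplit(link_header, ",\\s*")[[1]]
  nxt <- parts[grepl('rel="next"', parts, fixed = TRUE)]
  if (length(nxt) == 0) return(NULL)
  sub("^\\s*<([^>]*)>.*$", "\\1", nxt[1])
}

# Usage sketch (requires httr and jsonlite; needs network access):
# library(httr); library(jsonlite)
# url <- "https://api.github.com/search/commits?q=is:public+org:rstudio&per_page=100"
# items <- list()
# while (!is.null(url)) {
#   resp  <- GET(url, add_headers(Accept = "application/vnd.github.cloak-preview"))
#   page  <- fromJSON(content(resp, as = "text"), simplifyVector = FALSE)
#   items <- c(items, page$items)
#   url   <- next_link(headers(resp)$link)
# }
# length(items)  # all pages combined (the Search API caps results at 1000)
```

Note that `per_page=100` cuts the number of requests, and the Search API still enforces its overall 1000-result cap, so for a full commit inventory the repository Commits API (see the comments below the answer) is usually the better route.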

Reference: https://developer.github.com/v3/guides/traversing-with-pagination/

Aurèle
  • I see. But I have a git request with 100.000 Total count. I have to like do this 300 times? – Alex Rika Mar 22 '19 at 09:44
  • Basically, yes. There's also a `per_page=` parameter that can be increased to 100. – Aurèle Mar 22 '19 at 10:08
  • You might want to have a look at the [`gh` package](https://github.com/r-lib/gh) whose `gh()` function has a `.limit` parameter that can be set to `Inf` and does this for you – Aurèle Mar 22 '19 at 10:11
  • Per page works but for 100k I need to do x1000 :D. Do you know how the gh() function works? – Alex Rika Mar 22 '19 at 10:37
  • You can see how it's done at https://github.com/r-lib/gh/blob/master/R/pagination.R . What is the problem with making 1000 calls though? The Search API has a custom rate limit of 30/hour https://developer.github.com/v3/rate_limit/#understanding-your-rate-limit-status (might be different on your instance of GH Enterprise, I'm not familiar with it). That's a 30 minutes job. – Aurèle Mar 22 '19 at 10:46
  • I dont understand how it works.. The limit parameter unfortunately... Well I have much data and I have to work with them so how can I work with them if they are thousands at the same time? – Alex Rika Mar 22 '19 at 10:56
  • If you can figure out how the limit parameter works I would really appreciate it! – Alex Rika Mar 22 '19 at 11:05
  • There's another problem anyway that I hadn't realised at first, it's the _overall_ limit (including pagination) of 1000 results. Though, again, it seems configurable in GitHub Enterprise (if that's what you're using), you might need to reach out to your administrator about that. (There seems to be a workaround with specifying date ranges to make multiple searches) – Aurèle Mar 22 '19 at 11:17
  • 1
    Yeah I am using Github Enterprise. I am an administrator of the Organization. I am doing a project of analysing the commits of students in the University. But the commits are from 330 repositories and from many students and the commits are like 100 Thousands. So is search/org:orgname/commits the right way I am looking for commits? Or is there other way to get all the commits of all the repos within an organization. And ofcourse to get them all as one big list. Which finally I have to convert to data.table to CSV. To analyse it. – Alex Rika Mar 22 '19 at 11:28
  • Thoughts: (1) have a look at https://help.github.com/en/enterprise/2.16/admin/migrations/exporting-the-githubcom-organizations-repositories, which might be very big, and mine it directly. Thoughts: (2) have a look at the Github graphql v4 API and write queries that do some of the aggregation directly – Aurèle Mar 22 '19 at 11:33
  • You could avoid the Search API with sth like `set_config(verbose()) ; org <- "rstudio" ; repos <- gh("https://api.github.com/orgs/:org/repos", org = org, .limit = Inf) ; res <- repos %>% map(~ gh("https://api.github.com/repos/:owner/:repo/commits", owner = org, repo = .$name, .limit = Inf))` – Aurèle Mar 22 '19 at 13:21
  • Seems doable but where do I pass the authantication token? – Alex Rika Mar 22 '19 at 13:38
  • You may add `.send_headers = c(Authorization = "token ")` as a `gh()` parameter – Aurèle Mar 22 '19 at 14:06
  • I tried it but for some reason R doesnt respond. It just show the text I try to run and thats it no response. – Alex Rika Mar 24 '19 at 18:55
  • when I use the gh() function – Alex Rika Mar 24 '19 at 18:59
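The repos-then-commits approach suggested in the comments above can be sketched as follows. This is an assumption-laden outline, not a tested implementation: the organisation name is a placeholder, and the API-calling function is injected as a parameter so the structure can be shown (and checked) without network access; in real use you would pass `gh::gh`, which handles pagination via `.limit = Inf` and authentication via a token:

```r
# Sketch: list every repo in an organisation, then fetch each repo's
# commits. `fetch` stands in for gh::gh so the plumbing is testable
# offline; with the real gh() each call paginates fully (.limit = Inf).
org_commits <- function(org, fetch) {
  repos <- fetch("/orgs/:org/repos", org = org, .limit = Inf)
  lapply(repos, function(r) {
    fetch("/repos/:owner/:repo/commits",
          owner = org, repo = r$name, .limit = Inf)
  })
}

# Real usage (requires the gh package, network access, and a token,
# e.g. via .send_headers = c(Authorization = "token <your-Token>")):
# library(gh)
# commits <- org_commits("<ORG-NAME>", gh)
```

This avoids the Search API's 1000-result cap and its stricter rate limit, at the cost of one paginated request per repository.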