I'm the lead student researcher on a team analyzing and mining GitHub repositories. We're trying to get the repo owner and repo name for every project hosted on GitHub that meets the criteria in the following query:
query MyQuery {
  search(query: "language:Python", type: REPOSITORY, first: 100) {
    pageInfo {
      endCursor
      hasNextPage
    }
    edges {
      node {
        ... on Repository {
          nameWithOwner
          issues {
            totalCount
          }
          defaultBranchRef {
            target {
              ... on Commit {
                history(first: 0) {
                  totalCount
                }
              }
            }
          }
        }
      }
    }
  }
}
We are able to iterate through the cursors 10 times, but the next request, which uses the cursor "Y3Vyc29yOjEwMDA=", comes back empty:
query MyQuery {
  search(query: "language:Python", type: REPOSITORY, first: 100, after: "Y3Vyc29yOjEwMDA=") {
    pageInfo {
      endCursor
      hasNextPage
    }
    edges {
      node {
        ... on Repository {
          nameWithOwner
          issues {
            totalCount
          }
          defaultBranchRef {
            target {
              ... on Commit {
                history(first: 0) {
                  totalCount
                }
              }
            }
          }
        }
      }
    }
  }
}
We get the following response:
{
  "data": {
    "search": {
      "pageInfo": {
        "endCursor": null,
        "hasNextPage": false
      },
      "edges": []
    }
  }
}
I know from a quick advanced search on GitHub that there are currently ~4,000,000 public Python repositories hosted on the site, but we can only retrieve 1,000 before we encounter this null cursor.
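As a side note, the cursor itself appears to be nothing more than a Base64-encoded offset. Here is a minimal Python sketch that decodes it; the helper name `decode_cursor` is ours and purely illustrative, and treating the cursor as plain Base64 text is an assumption based on this one value, not on anything documented:

```python
import base64


def decode_cursor(cursor: str) -> str:
    """Decode a search cursor, assuming it is plain Base64-encoded UTF-8 text."""
    return base64.b64decode(cursor).decode("utf-8")


if __name__ == "__main__":
    # The cursor at which our pagination stops returning results.
    print(decode_cursor("Y3Vyc29yOjEwMDA="))  # prints: cursor:1000
```

If that reading is right, the cursor simply marks position 1000, which matches the point where the results run out.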
Please let us know if there is a workaround for this problem. We'd like to keep using the v4 API because of its minimal output (i.e., it gives us only what we want: the repo owner and repo name, along with the issue count and commit count).
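One possible direction, offered only as a sketch: if the cap applies per search query rather than per account, the search could be split into many narrower queries using the `created:` date-range qualifier, with each one paginated separately. The Python below only builds the query strings. The function names, the window size, and the assumption that every window stays under 1,000 results are ours and untested against the live API:

```python
from datetime import date, timedelta


def created_windows(start: date, end: date, days: int):
    """Yield (first_day, last_day) pairs that cover start..end inclusive."""
    cur = start
    while cur <= end:
        # Each window spans `days` days, clipped to the overall end date.
        last = min(cur + timedelta(days=days - 1), end)
        yield cur, last
        cur = last + timedelta(days=1)


def search_strings(start: date, end: date, days: int = 1):
    """Build one search string per creation-date window (hypothetical helper)."""
    return [
        f"language:Python created:{a.isoformat()}..{b.isoformat()}"
        for a, b in created_windows(start, end, days)
    ]


if __name__ == "__main__":
    for q in search_strings(date(2020, 1, 1), date(2020, 1, 3), days=2):
        print(q)
```

Each string would replace "language:Python" in the `query` argument above. A window that still returns 1,000 results would presumably need to be split further.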
Thank you for your help!