78

Is there a way to get access to the data in the “Repositories contributed to” module on GitHub profile pages via the GitHub API? Ideally the entire list, not just the top five, which is apparently all you can get on the web.

outoftime
  • 7
    No easy way to do it, I believe. Digging through the data available in the (Unofficial) GitHub Archive project will help (but only for public projects): http://www.githubarchive.org/ – Ivan Zuzak Dec 21 '13 at 10:57
  • I'd like to know how to do this in JavaScript specifically. The list should include not only repos the user has commits in, but also repos where the user has opened issues, commented, and so on. I don't have a clear approach in mind. – Xiaodong Qi Mar 08 '16 at 06:47
  • You need to make a lot of queries to figure out the result. The rules GitHub uses to determine whether something counts as a contribution are here: https://help.github.com/articles/why-are-my-contributions-not-showing-up-on-my-profile/#contributions-that-are-counted – Xiaodong Qi Mar 08 '16 at 16:49

13 Answers

39

With GraphQL API v4, you can now get these contributed repositories using:

{
  viewer {
    repositoriesContributedTo(first: 100, contributionTypes: [COMMIT, ISSUE, PULL_REQUEST, REPOSITORY]) {
      totalCount
      nodes {
        nameWithOwner
      }
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}

Try it in the explorer

If you have contributed to more than 100 repositories (including your own), you will have to paginate by passing after: "END_CURSOR_VALUE" to repositoriesContributedTo in the next request.
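The pagination loop can be sketched in Python as follows. This is only a minimal illustration using the standard library: the `run_query` and `contributed_repos` helper names are made up for this example, and it assumes a personal access token is available to send as a bearer token.

```python
import json
import urllib.request

# Same query as above, with the cursor passed in as a GraphQL variable.
QUERY = """
query($cursor: String) {
  viewer {
    repositoriesContributedTo(first: 100, after: $cursor,
        contributionTypes: [COMMIT, ISSUE, PULL_REQUEST, REPOSITORY]) {
      nodes { nameWithOwner }
      pageInfo { endCursor hasNextPage }
    }
  }
}
"""


def run_query(token, cursor=None):
    """POST the query to the GraphQL endpoint and return the parsed JSON."""
    body = json.dumps({"query": QUERY, "variables": {"cursor": cursor}}).encode()
    request = urllib.request.Request(
        "https://api.github.com/graphql",
        data=body,
        headers={"Authorization": f"bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def contributed_repos(fetch_page):
    """Collect nameWithOwner from every page.

    fetch_page(cursor) must return one parsed response; the first call
    receives cursor=None, later calls receive the previous endCursor.
    """
    names, cursor = [], None
    while True:
        data = fetch_page(cursor)["data"]["viewer"]["repositoriesContributedTo"]
        names.extend(node["nameWithOwner"] for node in data["nodes"])
        if not data["pageInfo"]["hasNextPage"]:
            return names
        cursor = data["pageInfo"]["endCursor"]
```

With a token stored in an environment variable (the name `GITHUB_TOKEN` is an assumption), a call would look like `contributed_repos(lambda cursor: run_query(os.environ["GITHUB_TOKEN"], cursor))`. Passing the fetch function in as a parameter keeps the loop itself easy to try out without network access.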

Bertrand Martel
  • 1
    Now that we're here in the future (2017) the best solution to this question is to use GitHub's new GraphQL API instead of the 2014 era solutions that depend on the githubarchive Google BigQuery. – gene_wood Jan 19 '18 at 23:33
  • 2
    Strangely enough, doesn't show my own projects… but cool solution! – lapo May 09 '18 at 13:41
  • Is it possible to get data as public? – eQ19 May 29 '18 at 09:42
  • 9
    This looks nice, but the documentation says “A list of repositories that the user *recently* contributed to.” (emphasis mine). Also, missing own projects. – Joachim Breitner Sep 07 '18 at 08:25
  • 3
    Own projects can be included using ` includeUserRepositories:true` – Joachim Breitner Sep 07 '18 at 08:26
  • This won't work well if you are trying to implement it as a script or from a program, however, as you need to login to GitHub. Any other suggestions? – Destaq Jun 16 '20 at 16:15
  • 1
    @Mythaar you can use a [personal access token](https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line), for a python script example see [this](https://stackoverflow.com/a/49291041/2614364) – Bertrand Martel Jun 16 '20 at 16:35
  • 1
    To get (public) contributions of another user, substitute `viewer` with `user(login: "username")` – jrieke Dec 23 '20 at 18:29
  • 1
    @BertrandMartel the source is now rotten (https://github.community/t5/GitHub-API-Development-and/bd-p/api), do you have any other link or should we just remove it? (I've searched the Wayback Machine, no article linked there either.) – Ulysse BN May 21 '21 at 10:49
  • I've written a tiny jq script to list URLs of my contributions from the result of this request. May it be useful to others: `jq -r '.data.viewer.repositoriesContributedTo.nodes[] | @html "https://github.com/\(.nameWithOwner)/pulls?q=is:pr+author:BuonOmo"' contributed.json` – Ulysse BN May 21 '21 at 11:11
  • @UlysseBN thanks, I thought that GitHub would migrate the old platform.github.community website (presently the github.community forum), but they never did, so I'm removing this source link from the answer: https://platform.github.community/t/deprecation-and-replacement-of-user-contributedrepositories/4211 – Bertrand Martel May 21 '21 at 19:35
  • can it list how many contributions made? – qwr Oct 10 '21 at 06:50
  • If I want to list only repositories with my commits, it will not include forked repositories. `COMMIT` + `PULL_REQUEST` excludes forked repositories with my commits. `COMMIT` + `PULL_REQUEST` + `REPOSITORY` includes repositories with my issues. `Type` -> `Sources` from the `Your repositories` tab still does not show it either. – Andry Oct 28 '21 at 00:15
  • 2
    Missing older contributions - Docs say “A list of repositories that the user _recently_ contributed to.” If you discover a way to find all contributions (old and recent) please post. – Mike Vosseller Nov 15 '21 at 15:03
33

Using Google BigQuery with the GitHub Archive, I pulled all the repositories I made a pull request to using:

SELECT repository_url 
FROM [githubarchive:github.timeline]
WHERE payload_pull_request_user_login ='rgbkrk'
GROUP BY repository_url;

You can use similar semantics to pull out just the quantities of repositories you contributed to as well as the languages they were in:

SELECT COUNT(DISTINCT repository_url) AS count_repositories_contributed_to,
       COUNT(DISTINCT repository_language) AS count_languages_in
FROM [githubarchive:github.timeline]
WHERE payload_pull_request_user_login ='rgbkrk';

If you're looking for overall contributions, which include reported issues, use:

SELECT COUNT(DISTINCT repository_url) AS count_repositories_contributed_to,
       COUNT(DISTINCT repository_language) AS count_languages_in
FROM [githubarchive:github.timeline]
WHERE actor_attributes_login = 'rgbkrk';

The difference there is actor_attributes_login which comes from the Issue Events API.

You may also want to capture your own repos, which may not have issues or PRs filed by yourself.

Kyle Kelley
  • 3
    Since January 2015, `githubarchive:github.timeline` table has been deprecated. – sulaiman sudirman Jun 28 '16 at 14:04
  • 2
    In addition to what @sulaiman points out about the table deprecation, the table structure of the replacement tables has changed completely (e.g. table `githubarchive:year.2017`) such that a current query would look like : `SELECT repo.name FROM [githubarchive:year.2017] WHERE actor.login ='rgbkrk' GROUP BY repo.name;` – gene_wood Jan 19 '18 at 23:30
  • 1
    In addition to @sulaimansudirman and @gene_wood comments: The syntax changed a little, so a current query would be something like this: `SELECT repo.name FROM \`githubarchive.year.2019\` WHERE actor.login ='rgbkrk' GROUP BY repo.name;`. As a side note: one could use an `*` instead of a year. – PF4Public Mar 07 '20 at 12:59
15

I tried implementing something like this a while ago for a GitHub summarizer... My steps to get the repositories the user contributed to but didn't own were as follows (using my own user as an example):

  • Search for the last 100 closed pull requests the user submitted. If the first page is full, you can request the second page to get even older PRs.

https://api.github.com/search/issues?q=type:pr+state:closed+author:megawac&per_page=100&page=1

  • Next, I would request the contributors of each of these repos. If the user in question is in the contributors list, we add the repo to the list. E.g.:

https://api.github.com/repos/jashkenas/underscore/contributors

  • We might also check all the repos the user is watching. Again, we would check repos/:owner/:repo/contributors for each repo.

https://api.github.com/users/megawac/subscriptions

  • In addition, I would iterate over all the repos of the organizations the user is in.

https://api.github.com/users/megawac/orgs
https://api.github.com/orgs/jsdelivr/repos

  • If the user is listed as a contributor to any of the repos there, we add the repo to the list (same step as above).

This misses repos where the user has submitted no pull requests but has been added as a contributor. We can increase our odds of finding these repos by searching for:

1) any issue opened (not just closed pull requests)
2) repos the user has starred

Clearly, this requires many more requests than we would like to make, but what can you do when they make you fudge features \o/
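The first two steps (search closed pull requests, then confirm the user in each repo's contributors list) could look roughly like this in Python. It is a sketch rather than the original implementation: the function names are invented for illustration, requests are unauthenticated, and only the first page of each response is read.

```python
import json
import urllib.request

API = "https://api.github.com"


def get_json(url):
    """Fetch a GitHub REST API URL and parse the JSON body (unauthenticated)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def repos_from_search(items):
    """Map search/issues results to 'owner/repo' names, dropping duplicates."""
    prefix = API + "/repos/"
    names = []
    for item in items:
        # repository_url looks like https://api.github.com/repos/owner/repo
        name = item["repository_url"][len(prefix):]
        if name not in names:
            names.append(name)
    return names


def is_contributor(contributors, login):
    """True if login appears in a repos/:owner/:repo/contributors response."""
    return any(entry.get("login") == login for entry in contributors)


def contributed_repos(login, fetch=get_json):
    """Closed PRs by login -> candidate repos -> keep repos listing login."""
    search = fetch(
        f"{API}/search/issues?q=type:pr+state:closed+author:{login}"
        "&per_page=100&page=1"
    )
    candidates = repos_from_search(search["items"])
    return [
        repo for repo in candidates
        if is_contributor(fetch(f"{API}/repos/{repo}/contributors"), login)
    ]
```

The watched, starred, and organization repos from the other steps would feed extra candidates into the same `is_contributor` check. Unauthenticated requests are rate limited, so a real script would likely need to send a token.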

megawac
  • If you can make your Javascript search for repos that have issues opened and commented by the user, that would be ideal. The rule that GitHub uses to generate their list of repos contributed to is here, but we don't need to follow it too close: https://help.github.com/articles/why-are-my-contributions-not-showing-up-on-my-profile/#contributions-that-are-counted – Xiaodong Qi Mar 08 '16 at 17:05
4

You'll probably get the last year or so via GitHub's GraphQL API, as shown in Bertrand Martel's answer.

Everything that has happened since 2011 can be found in GitHub Archive, as stated in Kyle Kelley's answer. However, BigQuery's syntax and GitHub's API seem to have changed, and the examples shown there no longer work as of 08/2020.

So here's how I found all the repos I contributed to:

SELECT distinct repo.name
FROM (
  SELECT * FROM `githubarchive.year.2011` UNION ALL
  SELECT * FROM `githubarchive.year.2012` UNION ALL
  SELECT * FROM `githubarchive.year.2013` UNION ALL
  SELECT * FROM `githubarchive.year.2014` UNION ALL
  SELECT * FROM `githubarchive.year.2015` UNION ALL
  SELECT * FROM `githubarchive.year.2016` UNION ALL
  SELECT * FROM `githubarchive.year.2017` UNION ALL
  SELECT * FROM `githubarchive.year.2018`
)
WHERE (type = 'PushEvent' 
  OR type = 'PullRequestEvent')
  AND actor.login = 'YOUR_USER'

Some of the repos returned only have a name, with no user or org. But I had to process the result manually afterwards anyway.
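If you don't want to type out one line per year, the query text can be generated. The helper below is a hypothetical convenience (the name `build_query` is not from any library) that only builds the SQL string shown above; running it against BigQuery is left out.

```python
import re


def build_query(login, first_year=2011, last_year=2018):
    """Return the UNION ALL query from this answer for a range of yearly tables."""
    # The login is pasted into the SQL text, so only accept the characters
    # regular GitHub usernames are made of (alphanumerics and hyphens).
    if not re.fullmatch(r"[A-Za-z0-9-]+", login):
        raise ValueError(f"unexpected login: {login!r}")
    tables = " UNION ALL\n".join(
        f"  SELECT * FROM `githubarchive.year.{year}`"
        for year in range(first_year, last_year + 1)
    )
    return (
        "SELECT DISTINCT repo.name\n"
        f"FROM (\n{tables}\n)\n"
        "WHERE (type = 'PushEvent' OR type = 'PullRequestEvent')\n"
        f"  AND actor.login = '{login}'"
    )


# Example: print the query for two yearly tables.
print(build_query("rgbkrk", 2011, 2012))
```

Extending `last_year` assumes the later yearly tables exist and share the same schema, which is worth checking in the BigQuery console first.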

schnatterer
3

You can use Search provided by GitHub API. Your query should look something like this:

https://api.github.com/search/repositories?q=%20+fork:true+user:username

Setting the fork parameter to true ensures that you query all of the user's repos, forks included.

However, if you want to make sure the user not only forked the repository but also contributed to it, you should iterate through every repo returned by the search request and check whether the user is among its contributors. This is painful, because GitHub returns only 100 contributors and there is no way around that...

koscielna
  • 6
    This only yields the current list of the user's repos, not the list of repos ever contributed to. – barfuin Oct 15 '15 at 20:01
2

I ran into the same problem (GithubAPI: Get repositories a user has ever committed in).

One hack I've found is the GitHub Archive project at http://www.githubarchive.org/, which logs all public events starting from 2011. It's not ideal, but it can be helpful.

So, for example, in your case:

SELECT  payload_pull_request_head_repo_clone_url 
FROM [githubarchive:github.timeline]
WHERE payload_pull_request_base_user_login='outoftime'
GROUP BY payload_pull_request_head_repo_clone_url;

This gives, if I'm not mistaken, the list of repos you've sent pull requests to:

https://github.com/jreidthompson/noaa.git
https://github.com/kkrol89/sunspot.git
https://github.com/rterbush/sunspot.git
https://github.com/ottbot/cassandra-cql.git
https://github.com/insoul/cequel.git
https://github.com/mcordell/noaa.git
https://github.com/hackhands/sunspot_rails.git
https://github.com/lgierth/eager_record.git
https://github.com/jnicklas/sunspot.git
https://github.com/klclee/sunspot.git
https://github.com/outoftime/cequel.git

You can play with BigQuery here: bigquery.cloud.google.com. The data schema can be found here: https://github.com/igrigorik/githubarchive.org/blob/master/bigquery/schema.js

Community
nix
2

I wrote a Selenium Python script to do this:

"""
Get all your repos contributed to for the past year.

This uses Selenium and Chrome to login to github as your user, go through 
your contributions page, and grab the repo from each day's contribution page.

Requires python3, selenium, and Chrome with chromedriver installed.

Change the username variable, and run like this:

GITHUB_PASS="mypassword" python3 github_contributions.py
"""

import os
import time
from pprint import pprint as pp
from urllib.parse import urlsplit

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

username = 'jessejoe'
password = os.environ['GITHUB_PASS']

repos = []
driver = webdriver.Chrome()
driver.get('https://github.com/login')

# Selenium 4 removed the find_element_by_* helpers; use find_element(By...) instead
driver.find_element(By.ID, 'login_field').send_keys(username)
password_elem = driver.find_element(By.ID, 'password')
password_elem.send_keys(password)
password_elem.submit()

# Wait indefinitely for 2-factor code
if 'two-factor' in driver.current_url:
    print('2-factor code required, go enter it')
while 'two-factor' in driver.current_url:
    time.sleep(1)

driver.get('https://github.com/{}'.format(username))

# Get all days that aren't colored gray (no contributions)
contrib_days = driver.find_elements(
    By.XPATH, "//*[@class='day' and @fill!='#eeeeee']")

for day in contrib_days:
    day.click()
    # Wait until done loading
    WebDriverWait(driver, 10).until(
        lambda d: 'loading' not in d.find_element(
            By.CSS_SELECTOR, '.contribution-activity').get_attribute('class'))

    # Get all contribution URLs
    contribs = driver.find_elements(By.CSS_SELECTOR, '.contribution-activity a')
    for contrib in contribs:
        url = contrib.get_attribute('href')
        # Only care about repo owner and name from URL
        repo_path = urlsplit(url).path
        repo = '/'.join(repo_path.split('/')[0:3])
        if repo not in repos:
            repos.append(repo)
    # Have to click something else to remove pop-up on current day
    driver.find_element(By.CSS_SELECTOR, '.vcard-fullname').click()

driver.quit()
pp(repos)

It uses Python and Selenium to automate a Chrome browser to log in to GitHub, go to your contributions page, click each day, and grab the repo name from any contributions. Since this page only shows one year's worth of activity, that's all you can get with this script.

Tunaki
jjj
2

There is a new project that claims to list all contributions:

https://github.com/AurelienLourot/github-contribs

It also backs a service to produce more detailed user profiles:

https://ghuser.io/

Joachim Breitner
2

You can take a look at https://github.com/casperdcl/ghstat which automates counting lines of code written in all visible repositories. Extracting the relevant code and tidying it up:

  • requires gh from https://github.com/cli/cli
  • requires jq
  • requires bash
  • needs ${GH_USER} env var set
  • defines "contributor" to mean "committer"
#!/bin/bash
ghjq() { # <endpoint> <filter>
  # filter all pages of authenticated requests to https://api.github.com
  gh api --paginate "$1" | jq -r "$2"
}
repos="$(
  ghjq users/$GH_USER/repos .[].full_name
  ghjq "search/issues?q=is:pr+author:$GH_USER+is:merged" \
    '.items[].repository_url | sub(".*github.com/repos/"; "")'
  ghjq users/$GH_USER/subscriptions .[].full_name
  for org in $(ghjq users/$GH_USER/orgs .[].login); do  # unquoted so each org is iterated separately
    ghjq orgs/$org/repos .[].full_name
  done
)"
repos="$(echo "$repos" | sort -u)"
# print repo if user is a contributor
for repo in $repos; do
  if [[ $(ghjq repos/$repo/contributors "[.[].login | test(\"$GH_USER\")] | any") == "true" ]]; then
    echo $repo
  fi
done
casper.dcl
1

I didn't see any way of doing it in the API. The closest I could find was to get the latest 300 events from a public user (300 is the limit, unfortunately), and then filter those for contributions to others' repositories.

https://developer.github.com/v3/activity/events/#list-public-events-performed-by-a-user

We need to ask GitHub to implement this in their API.
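As a rough illustration of the events approach, the Python sketch below pages through the public events endpoint and keeps the repos not owned by the user. The function names are made up for this example, and requests are unauthenticated.

```python
import json
import urllib.request


def public_events(login, max_pages=3):
    """Yield the user's public events, 100 per page (3 pages covers the 300 limit)."""
    for page in range(1, max_pages + 1):
        url = (f"https://api.github.com/users/{login}/events/public"
               f"?per_page=100&page={page}")
        with urllib.request.urlopen(url) as response:
            events = json.load(response)
        if not events:
            return
        yield from events


def foreign_repos(events, login):
    """Names of repos touched by the events that are not owned by login."""
    names = []
    for event in events:
        name = event["repo"]["name"]  # formatted as "owner/repo"
        owner = name.split("/")[0]
        if owner.lower() != login.lower() and name not in names:
            names.append(name)
    return names
```

A call such as `foreign_repos(public_events("RichLitt"), "RichLitt")` would then list the repos seen in the most recent events. Note that events cover all activity types, so further filtering on each event's `type` field may be wanted.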

RichLitt
  • 1
    The problem is that "Repositories contributed to" on GitHub doesn't just include repositories that you've made commits to, it includes opening issues as well. –  Jun 01 '14 at 01:58
  • @Cupcake opening an issue is considered as a contribution on the github user page – jazzytomato Dec 08 '15 at 00:08
0

I am using python:

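from github import Github  # provided by the PyGithub package

token = '..........................'
# The GitHub API caps page size at 100 results
g = Github(token, per_page=100)
# "q:example" is a placeholder search query
repos = g.search_repositories(query="q:example")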
  • 1
    A good answer will always include an explanation why this would solve the issue, so that the OP and any future readers can learn from it. – Tyler2P Jan 30 '22 at 20:58
0

Best Effort w/ GitHub v3 + v4 APIs

I created a little shell script that will grab from both of the GitHub v3 (RESTful) and v4 (GraphQL) APIs and page through the results to get as much contribution information as possible:

All of the limitations mentioned by others apply but, if nothing else, it serves as an example of how to page through the results and use a shell function recursively.

Output looks like this:

caddyserver/caddy:
http://www.github.com/caddyserver/caddy/commits?author=coolaj86

watchexec/watchexec:
http://www.github.com/watchexec/watchexec/commits?author=coolaj86

webinstall/webi-installers:
http://www.github.com/webinstall/webi-installers/commits?author=coolaj86

You could take it a step further and look at those commits.

You could also try iterating over ALL authored commits...

For me that's 20k+ commits with my authorship info (including forks and forks of forks of forks) that have to be filtered through to figure out which were the repos I actually contributed to:

coolaj86
-2

As of now, GitHub API v3 doesn't provide a way to get the user's current streak.

You can use this endpoint to calculate the current streak:

https://github.com/users/<username>/contributions.json
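The streak calculation itself is simple once the per-day counts are available. The sketch below assumes the response has already been converted into a dict mapping dates to contribution counts; that `counts` shape and the `current_streak` name are assumptions made for illustration, since the format of contributions.json is not shown here.

```python
from datetime import date, timedelta


def current_streak(counts, today):
    """Number of consecutive days with at least one contribution, ending at today."""
    streak, day = 0, today
    # Walk backwards one day at a time until a day without contributions.
    while counts.get(day, 0) > 0:
        streak += 1
        day -= timedelta(days=1)
    return streak
```

For example, with contributions on two consecutive days ending today, `current_streak(counts, date.today())` would return 2.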
Alex Pliutau