My goal is to track over time the popularity of my BigQuery repo.
I want to use publicly available BigQuery datasets, like GitHub Archive or the GitHub dataset
The GitHub dataset sample_repos
does not contain a snapshot of the star counts:
SELECT
watch_count
FROM
[bigquery-public-data:github_repos.sample_repos]
WHERE
repo_name == 'angular/angular'
returns 5318.
GitHub Archive is a timeline of event. I can try to sum them all, but the numbers do not match with the numbers in the GitHub UI. I guess because it does not count unstar actions. Here is the query I used:
SELECT
COUNT(*)
FROM
[githubarchive:year.2011],
[githubarchive:year.2012],
[githubarchive:year.2013],
[githubarchive:year.2014],
[githubarchive:year.2015],
[githubarchive:year.2016],
TABLE_DATE_RANGE([githubarchive:day.], TIMESTAMP('2017-01-01'), TIMESTAMP('2017-03-30') )
WHERE
repo.name == 'angular/angular'
AND type = "WatchEvent"
returns 24144
The real value is 21,921