3

My goal is to track the total number of stars of my repo. However, its repo.name has changed over time. How can I achieve this with the githubarchive dataset?

Steren

1 Answer

2

(related to https://stackoverflow.com/a/42930963/132438)

GitHub project names change over time, so it's safer to query by id than by name. You could look up the project id in a separate query, or do it all in a single query like this:

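-- Count stars (WatchEvent rows) by repository id, so that renames do not split the total.
SELECT
  COUNT(*) AS naive_count,  -- every WatchEvent row
  COUNT(DISTINCT actor.id) AS unique_by_actor_id,
  COUNT(DISTINCT actor.login) AS unique_by_actor_login
FROM `githubarchive.month.*`
WHERE repo.id = (
  -- Resolve the id from a month in which the repository had this name.
  SELECT repo.id
  FROM `githubarchive.month.201702`
  WHERE repo.name = 'bazelbuild/bazel'
  LIMIT 1)
  AND type = 'WatchEvent'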
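
The "separate query" route mentioned above could look roughly like the sketch below. It assumes the same BigQuery Standard SQL tables as the query above; the id 12345678 is a made-up placeholder, not the real id of any repository, and should be replaced with whatever the first statement returns. Grouping by repo.name is an optional addition that shows how the star events are spread across the names the repository has had.

```sql
-- Step 1: look up the numeric id under a name the repository had in a known month.
SELECT repo.id, repo.name
FROM `githubarchive.month.201702`
WHERE repo.name = 'bazelbuild/bazel'
LIMIT 1;

-- Step 2: count star events by id.
-- 12345678 is a hypothetical placeholder for the id returned by step 1.
SELECT
  repo.name AS name_at_event_time,
  COUNT(*) AS naive_count,
  COUNT(DISTINCT actor.id) AS unique_by_actor_id
FROM `githubarchive.month.*`
WHERE repo.id = 12345678
  AND type = 'WatchEvent'
GROUP BY name_at_event_time
ORDER BY naive_count DESC;
```

Note that the per-name distinct counts will not necessarily add up to the overall distinct count, since the same actor can appear under more than one name; for a single total, drop the GROUP BY.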
Felipe Hoffa