
I'm trying to look up the GitHub username for a few hundred users based on their email (which I pulled from the git log). Unfortunately I can't figure out how to do this without making a single call per email.

How do I look up many GitHub usernames by email in as few queries as possible?

Previous answers that didn't work for me:

Marcin Kłopotek
Daniel Porteous

3 Answers


The GitHub API doesn't support looking up multiple users by email at once. However, you can minimize the number of requests by using GitHub's GraphQL API instead of the REST API, because a single GraphQL request can contain several search queries, each under its own alias.

Here's an example script that uses the GraphQL API to perform multiple email lookups in a single request. Run it from inside an existing Git repository. It first reads the unique list of commit author emails with the git log command, then builds one GraphQL search query per email. The queries are written to a query.json file and passed to curl, which executes all of them in a single HTTP call. Finally, jq parses the response. To run the script, the GITHUB_TOKEN environment variable must be set, because GitHub's GraphQL API only accepts authenticated requests.

#!/usr/bin/env bash

# more reliable error handling
set -euo pipefail

# read unique emails from git log and store them in an array
read -ra emails <<< "$(git log --format='%ae' | sort -u | xargs)"

# Build the GraphQL query string with one search query per email address
# See https://docs.github.com/en/graphql/reference/queries
query="query {"
for idx in "${!emails[@]}"; do
  query+=" query${idx}: search(query: \\\"in:email ${emails[$idx]}\\\", type: USER, first: 1) { nodes { ... on User { login email } } }"
done
query+=" }"

# Write the GraphQL query to a query.json file
# See https://docs.github.com/en/graphql/overview/resource-limitations
echo "{\"query\": \"$query\"}" > query.json

# Execute the GraphQL query
curl --fail-with-body -sH "Authorization: token $GITHUB_TOKEN" --data @query.json https://api.github.com/graphql |
  # Parse the JSON response and build the email => login mapping
  jq -r '.data | to_entries[] | .value.nodes[] | "\(.email) => \(.login)"'

Keep in mind that there is a limit to how many queries you can send in a single request. If you need to look up more emails, you may have to divide them into smaller chunks and make multiple requests. The exact limit depends on the rate limits GitHub sets for your account; you can check them in the API response headers.
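The chunking described above could be sketched roughly as follows. This is a minimal JavaScript sketch, not part of the script above: `chunk` and `buildQuery` are hypothetical helper names, the addresses are placeholders, and the chunk size is an arbitrary choice rather than a documented GitHub limit.

```javascript
// Hypothetical helper: split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical helper: build one GraphQL document with one aliased search per email.
function buildQuery(emails) {
  const searches = emails.map((email, idx) => {
    // JSON.stringify quotes and escapes the search string, which is assumed
    // to be sufficient for typical email addresses.
    const search = JSON.stringify(`in:email ${email}`);
    return `query${idx}: search(query: ${search}, type: USER, first: 1) { nodes { ... on User { login email } } }`;
  });
  return `query { ${searches.join(" ")} }`;
}

// Placeholder addresses; each resulting query would be sent as a separate request.
const sampleEmails = ["a@example.com", "b@example.com", "c@example.com"];
const queries = chunk(sampleEmails, 2).map(buildQuery);
console.log(`${queries.length} requests needed`);
```

Each element of `queries` is one request body, so the number of HTTP calls is the number of emails divided by the chunk size, rounded up.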

Please keep in mind that the generated GraphQL query will not return a mapping if no matching login is found for a given email (e.g. the user no longer exists, or the email is not public on their profile).
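Because each search is aliased as `query<idx>`, the alias index can be used to tell which emails came back empty. A rough JavaScript sketch, assuming the response shape produced by the query above; `mapLogins` is a hypothetical name and the sample response data is made up for illustration.

```javascript
// Hypothetical helper: pair each requested email with the login found under its alias.
// emails[idx] is assumed to correspond to the alias `query${idx}` in the response data.
function mapLogins(emails, data) {
  const found = new Map();
  const missing = [];
  emails.forEach((email, idx) => {
    const nodes = (data[`query${idx}`] || {}).nodes || [];
    // A node can be an empty object if the match is not a User,
    // because the query only selects fields on the User type.
    const login = nodes.length > 0 ? nodes[0].login : undefined;
    if (login) {
      found.set(email.toLowerCase(), login);
    } else {
      missing.push(email);
    }
  });
  return { found, missing };
}

// Made-up response data, for illustration only.
const sampleData = {
  query0: { nodes: [{ login: "octocat", email: "octocat@example.com" }] },
  query1: { nodes: [] },
};
const result = mapLogins(["octocat@example.com", "gone@example.com"], sampleData);
console.log([...result.found.entries()], result.missing);
```

Keying the result by the requested email rather than by the `email` field in the response keeps the mapping stable even if the two differ in letter case.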

You can also use the GitHub GraphQL API Explorer to test your queries.

Marcin Kłopotek

You can send a GET request with query parameters to search for users matching a username:

https://api.github.com/search/users?q=${username}&per_page=${rowsPerPage}&page=${page}
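As a small sketch of building that URL safely in JavaScript, the query parameters can be encoded with `URLSearchParams`. `buildSearchUrl` is a hypothetical helper name and the default values are arbitrary; the endpoint and parameter names are the ones from the URL above. Note that this approach still costs one request per search term.

```javascript
// Hypothetical helper: build the REST user-search URL with properly encoded parameters.
function buildSearchUrl(term, perPage = 30, page = 1) {
  const params = new URLSearchParams({
    q: term,
    per_page: String(perPage),
    page: String(page),
  });
  return `https://api.github.com/search/users?${params.toString()}`;
}

// Placeholder search term; the in:email qualifier restricts the match to the email field.
console.log(buildSearchUrl("octocat@example.com in:email", 1, 1));
```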

Thanks to Marcin for the original answer. Here is a version of that code in JavaScript, with pagination support.

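// Assumes the shelljs package, which provides shell.exec(), and Node 18+ for the global fetch.
const shell = require("shelljs");

const PER_PAGE = 100;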

async function fetchEmailToUsername() {
  // Read contributor emails from the git log and store them in an array.
  const out = shell.exec('git log --format="%ae" | sort -u', { silent: true });
  const emailsUnfiltered = out.stdout.split("\n").filter(Boolean);

  // Filter out emails ending with @users.noreply.github.com since the first part of
  // that email is the username.
  const emails = emailsUnfiltered.filter((email) => !email.endsWith("@users.noreply.github.com"));

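  // To use the GraphQL endpoint we need to provide an auth token.
  // getGitHubToken() is not defined in this snippet; supply your own,
  // e.g. a function that returns process.env.GITHUB_TOKEN.
  const githubToken = getGitHubToken();

  const emailUsernameMap = new Map();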

    // Break up the emails into page-sized chunks, since fetching them all at once
    // causes the query to fail.
  for (let page = 0; page < emails.length; page += PER_PAGE) {
    const emailChunk = emails.slice(page, page + PER_PAGE);

    // Build the GraphQL query string with one search query per email address in this
    // chunk. See https://docs.github.com/en/graphql/reference/queries
    let query = "query {";
    for (const [idx, email] of emailChunk.entries()) {
      query += ` query${idx}: search(query: "in:email ${email}", type: USER, first: 1) { nodes { ... on User { login email } } }`;
    }
    query += " }";

    const fetchOptions = {
      method: "POST",
      headers: {
        Authorization: `token ${githubToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ query }),
    };

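    const response = await fetch("https://api.github.com/graphql", fetchOptions);
    const responseBody = await response.json();

    // GraphQL reports failures in an `errors` array, possibly without `data`,
    // so check before parsing.
    if (!response.ok || !responseBody.data) {
      throw new Error(`GraphQL request failed: ${JSON.stringify(responseBody.errors ?? responseBody)}`);
    }

    // Parse the JSON response and append to the email => username map.
    const nodes = Object.values(responseBody.data).flatMap((value) => value?.nodes ?? []);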

    for (let i = 0; i < nodes.length; i++) {
      const { email, login } = nodes[i];
      if (!email) {
        continue;
      }
      emailUsernameMap.set(email.toLowerCase(), login);
    }

    console.log(`Fetched ${page + emailChunk.length} usernames out of ${emails.length} emails`);
  }

  return emailUsernameMap;
}
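
The noreply addresses filtered out above can be resolved locally, with no API call. A minimal sketch, assuming the two address shapes GitHub uses (`ID+USERNAME@users.noreply.github.com` and the older `USERNAME@users.noreply.github.com`); `usernameFromNoreply` is a hypothetical helper name, and the result is the username at the time of the commit, which may have changed since.

```javascript
const NOREPLY_SUFFIX = "@users.noreply.github.com";

// Hypothetical helper: extract the username from a GitHub noreply address,
// or return null if the address is not a noreply address.
function usernameFromNoreply(email) {
  if (!email.toLowerCase().endsWith(NOREPLY_SUFFIX)) {
    return null;
  }
  const local = email.slice(0, email.length - NOREPLY_SUFFIX.length);
  // Newer addresses prefix the username with a numeric ID and a plus sign.
  const plus = local.indexOf("+");
  const username = plus === -1 ? local : local.slice(plus + 1);
  return username || null;
}

console.log(usernameFromNoreply("12345+octocat@users.noreply.github.com"));
```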
Daniel Porteous