
In my application I am making authenticated requests to the GitHub search API with a token. I make one request every 2s to stay within the primary rate limit of 30 requests per minute (so the requests are not concurrent), and I also validate every request against the GitHub rate-limit API before making the actual search API call.

Even in the rare case of accidental concurrent requests, they are unlikely to be made with the same token.

I seem to be following all the rules mentioned in the primary and secondary rate limit best-practices documentation. Despite this, my application keeps getting secondary rate limited, and I have no idea why. Could anyone help me figure out why this may be happening?

EDIT:

Sample code:

const search = async function(query, token) {
    let limitResponse;
    try {
        limitResponse = JSON.parse(await rp({
            uri: "https://api.github.com/rate_limit",
            headers: {
                'User-Agent': 'Request-Promise',
                'Authorization': 'token ' + token
            },
            timeout: 20000
        }));
    } catch (e) {
        logger.error("error while fetching rate limit from github", token);
        throw new Error(Codes.INTERNAL_SERVER_ERROR);
    }
    if (limitResponse.resources.search.remaining === 0) {
        logger.error("github rate limit reached to zero");
        throw new Error(Codes.INTERNAL_SERVER_ERROR);
    }
    try {
        let result = JSON.parse(await rp({
            uri: "https://api.github.com/search/code",
            qs: {
                q: query,
                page: 1,
                per_page: 50
            },
            headers: {
                'User-Agent': 'Request-Promise',
                'Authorization': 'token ' + token
            },
            timeout: 20000
        }));
        logger.info("successfully fetched data from github", token);
        /// process response
    } catch (e) {
        logger.error("error while fetching data from github", token);
        throw new Error(Codes.INTERNAL_SERVER_ERROR);
    }
};
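As an aside, the separate call to /rate_limit may be unnecessary: every GitHub API response already reports the caller's remaining quota in its X-RateLimit-* response headers, so the search response itself can be inspected instead of spending an extra round trip. A minimal sketch of reading those headers (this assumes the HTTP client exposes lowercased header names, as request-promise does with resolveWithFullResponse: true; the helper name is hypothetical):

```javascript
// Extract the rate-limit state that GitHub attaches to every response.
// `headers` is assumed to be a plain object with lowercased keys.
function rateLimitFromHeaders(headers) {
    const remaining = Number(headers['x-ratelimit-remaining']);
    const resetAt = Number(headers['x-ratelimit-reset']); // Unix epoch seconds
    return { remaining, resetAt };
}
```

This halves the number of requests per search while giving the same information the /rate_limit endpoint returns.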

Sample Architecture:

A query string (from a list of query strings) and the appropriate token to make the API call with are inserted into a RabbitMQ x-delayed queue, with a delay of index * 2000 ms per message (hence they are spaced 2s apart), and the function above is the consumer for that queue. When the consumer throws an error, the message is nack'd and sent to a dead-letter queue.

const { delayBetweenMessages } = require('../rmq/queue_registry').GITHUB_SEARCH;
await __.asyncForEach(queries, async (query, index) => {
    await rmqManager.publish(require('../rmq/queue_registry').GITHUB_SEARCH, query, {
        headers: { 'x-delay': index * delayBetweenMessages }
    });
});
Soumik Sur
    Could be that you're actually not within the limit, but without seeing any code your guess is as good as mine. – Phix Nov 19 '21 at 05:46
  • Where's the code that meters out all the requests and keeps you below the rate limit? That's the code you need to show us. – jfriend00 Nov 19 '21 at 06:08
  • @Phix I've added some sample code and architecture for context – Soumik Sur Nov 19 '21 at 06:18
  • @jfriend00 Added the code. I have verified this works and the logs (of both success and error cases) are spaced 2s apart. I've tried w various intervals and the delay is being respected. – Soumik Sur Nov 19 '21 at 06:20
  • 2
    A calculated delay like this can easily cause problems. If you happened to call whatever function this is in twice, then you'd be starting a new delay calculation that doesn't take into account the items that have already been scheduled causing items from the two calls to overlap and thus exceed your limit. Calculated delays are not the best way to implement rate limit. You want to have a single queue and control when the next item it taken out of the queue centrally, not with precalculated delays. Also, what is `__.asyncForEach()` and what is its purpose here? – jfriend00 Nov 19 '21 at 14:30
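To illustrate the centralized approach from the comment above, all callers could push work onto one shared queue whose single loop spaces executions by a fixed interval, so overlapping publish calls can never exceed the budget. This is a hypothetical sketch (RateLimitedQueue is not part of any library):

```javascript
// One shared instance serializes all requests and enforces a minimum
// gap between them, regardless of how many callers push work.
class RateLimitedQueue {
    constructor(intervalMs) {
        this.intervalMs = intervalMs;
        this.lastRun = 0;
        this.chain = Promise.resolve(); // tail of the execution chain
    }

    // Schedule fn; the returned promise resolves with fn's result
    // once its slot in the queue arrives.
    push(fn) {
        const run = this.chain.then(async () => {
            const wait = this.lastRun + this.intervalMs - Date.now();
            if (wait > 0) await new Promise(r => setTimeout(r, wait));
            this.lastRun = Date.now();
            return fn();
        });
        // Keep the chain alive even if fn rejects.
        this.chain = run.catch(() => {});
        return run;
    }
}
```

Usage would be a single `const githubQueue = new RateLimitedQueue(2000)` shared across the app, with every GitHub call wrapped in `githubQueue.push(() => search(query, token))`, replacing the precalculated x-delay values.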

2 Answers


It doesn't look like there is an issue in your code. I was just browsing GitHub in my browser and using the search bar, and I hit the secondary rate limit simply by searching. So it looks like the search API internally uses concurrency, and this may be a bug on GitHub's side.

Munir Khakhi

You hardcoded a sleep time of 2s, but, according to the documentation, when you trigger the secondary rate limit you have to wait for the time indicated in the Retry-After header of the response before retrying.
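A minimal sketch of honoring that header might look like this (hypothetical helper; it assumes the HTTP client resolves with a response object exposing `statusCode` and lowercased `headers`, and that Retry-After is given in seconds, as GitHub documents):

```javascript
const sleep = ms => new Promise(r => setTimeout(r, ms));

// Run doRequest(), and on a 403/429 sleep for the server-supplied
// Retry-After duration before trying again, up to maxRetries times.
async function requestWithRetryAfter(doRequest, maxRetries = 3) {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        const res = await doRequest();
        if (res.statusCode !== 403 && res.statusCode !== 429) return res;
        const header = res.headers['retry-after'];
        const retryAfterSec = header !== undefined ? Number(header) : 60;
        await sleep(retryAfterSec * 1000);
    }
    throw new Error('secondary rate limit: retries exhausted');
}
```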

mrosa