I have a Node.js script that copies data from a folder into different shared drives. The source folder is huge, so the script runs for hours. The logic is: list the folders, check that they meet certain criteria (created date between 2018 and 2021), and then move their contents to a different drive accordingly.

However, sometimes I get the error mentioned in the subject line.

This page discusses the issue, and I did set process.env.UV_THREADPOOL_SIZE = 128, but this does not seem to have fixed the issue.

I could not find any other discussions specific to the Google Drive API. Other such errors on different APIs seem irrelevant.

The relevant source code is as follows:

async function listFiles() {
  let files = [];
  const drive = _getDrive();

  const doListFiles = async (options, pageToken) => {
    return await new Promise(async (resolve, reject) => {
      // ...

      const params = {
        pageSize: 1000,
        fields: `nextPageToken, files(${listFileData?.join(',')})`,
        supportsAllDrives: true,
        q,
        pageToken,
        trashed: false,
      };

      drive.files.list(params, async (err, res) => {
        if (err) {
          reject(err);
          return console.error('Could not list files:', err);
        }
        const _files = res.data.files;
        if (!_files.length) {
          resolve(_files);
        }
        files = [...files, ..._files];
        if (res.data.nextPageToken) {
          return resolve(await retry(doListFiles, options, res.data.nextPageToken));
        }
        return resolve(files);
      });
    });
  };

  try {
    return await retry(doListFiles, options);
  } catch (err) {
    console.error(err);
    console.log({ isError: true, errMessage: err, ...file });
    return [];
  }
}

Detailed error message:

Could not list files: FetchError: request to https://www.googleapis.com/drive/v3/files?pageSize=1000&fields=nextPageToken%2C%20files%28id%2Cname%2CmimeType%2CcreatedTime%2Cparents%29&supportsAllDrives=true&q=%271EnHp6QRQK9A7XSzvDdU2H4GOpp1qgF8G%27%20in%20parents%20and%20mimeType%20%3D%20%27application%2Fvnd.google-apps.folder%27&trashed=false failed, reason: getaddrinfo EAI_AGAIN www.googleapis.com
    at ClientRequest.<anonymous> (/home/wildhog/Documents/clients/xxxxxxxxx/node_modules/node-fetch/lib/index.js:1491:11)
    at ClientRequest.emit (node:events:537:28)
    at TLSSocket.socketErrorListener (node:_http_client:465:9)
    at TLSSocket.emit (node:events:537:28)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  type: 'system',
  errno: 'EAI_AGAIN',
  code: 'EAI_AGAIN',
  config: {
    url: 'https://www.googleapis.com/drive/v3/files?pageSize=1000&fields=nextPageToken%2C%20files%28id%2Cname%2CmimeType%2CcreatedTime%2Cparents%29&supportsAllDrives=true&q=%271EnHp6QRQK9A7XSzvDdU2H4GOpp1qgF8G%27%20in%20parents%20and%20mimeType%20%3D%20%27application%2Fvnd.google-apps.folder%27&trashed=false',
    method: 'GET',
    userAgentDirectives: [ [Object] ],
    paramsSerializer: [Function (anonymous)],
    headers: {
      'x-goog-api-client': 'gdcl/6.0.0 gl-node/18.4.0 auth/8.1.0',
      'Accept-Encoding': 'gzip',
      'User-Agent': 'google-api-nodejs-client/6.0.0 (gzip)',
      Authorization: 'Bearer ya29.c.b0AXv0zTPyF3jQwz-LOYiyLGoeTSWDazYmqOO0OKq4IR9PZ7Yu89qddhwWZ5nXndyTrjoKFNkuBZHYa9f_txe-ZHByNq39wm60s2AgwwxMTH-fsfXlIXnIR4F0DYPSff8PEs7mcquyedybSi1EVnhTYATFfa_ZrXv-rkTQ-j9ayHnqCJCO46AhvFDO20i0zIXP5pqxSYKOU-Vovl-mQG0nhrnGGhzLc1Ca.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................',
      Accept: 'application/json'
    },
    params: {
      pageSize: 1000,
      fields: 'nextPageToken, files(id,name,mimeType,createdTime,parents)',
      supportsAllDrives: true,
      q: "'1EnHp6QRQK9A7XSzvDxxxxxxxxxxxx' in parents and mimeType = 'application/vnd.google-apps.folder'",
      trashed: false
    },
    validateStatus: [Function (anonymous)],
    retry: true,
    responseType: 'json',
    retryConfig: {
      currentRetryAttempt: 2,
      retry: 3,
      httpMethodsToRetry: [Array],
      noResponseRetries: 2,
      statusCodesToRetry: [Array]
    }
  }
}

All help is greatly appreciated.

Update

@Tanaike asked for the source code of the retry() function. It implements exponential back-off, works fine, and is not the root of the problem, but here is the code in case it makes things clearer.

import logger from "./logger.js";
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

const retry = async (fun, ...args) => {
  const maxTries = 5;
  let numTries = 0;
  const maxDuration = 64000;

  while (numTries < maxTries) {
    try {
      logger.info('Trying', numTries + 1, Date.now(), fun);
      const funResult = await fun(...args);
      logger.info(
        ' ~ file: retry.js ~ line 14 ~ retry ~ fun, funResult',
        fun,
        funResult
      );
      return funResult;
    } catch (err) {
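      // Only rate-limit errors are retried; anything else is returned as an error object.
      if (!/rate limit exceeded/i.test(err.toString())) {
        return { isError: true, errMessage: err, ...args[0] };
      }

      // Exponential back-off with jitter, capped at maxDuration.
      let sleepDuration = 2 ** numTries * 1000 + Math.random() * 1000;
      if (sleepDuration > maxDuration) sleepDuration = maxDuration;
      // let sleepDuration = 0;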

      logger.info(
        `Error: ${err}`,
        'Will retry after',
        sleepDuration,
        ...args
      );

      await sleep(sleepDuration);
      numTries++;
    }
  }

  return { isError: true, errMessage: 'Too many tries', ...args[0] };
};
Dmitry Kostyuk
  • Does this answer your question? [can't resolve getaddrinfo EAI\_AGAIN error](https://stackoverflow.com/questions/70786685/cant-resolve-getaddrinfo-eai-again-error) – Linda Lawton - DaImTo Sep 06 '22 at 17:44
  • @DaImTo No, the reason cited there is irrelevant to Google Drive (*the problem is caused because the Binance's web socket server requires to respond to a ping frame within 10 minutes or the connection*) – Dmitry Kostyuk Sep 06 '22 at 17:48
  • Possible duplicate https://stackoverflow.com/questions/40182121/whats-the-cause-of-the-error-getaddrinfo-eai-again – esqew Sep 11 '22 at 23:59
  • In your showing script, what is `retry`? And also, I think that for example, in your showing script, `new Promise(` is not enclosed by `)`. So, can you provide the script for correctly replicating your current issue? – Tanaike Sep 12 '22 at 00:05
  • @esqew No, please read the question carefully as well my reply to the first comment – Dmitry Kostyuk Sep 12 '22 at 11:20
  • Hi @Tanaike, happy to have you onboard this question! `new Promise` seems to have all the parentheses needed. As for the `retry()` it implements exponential back off. It works fine and it's not at the root of the problem, but I will put it in the question if it helps clarity. – Dmitry Kostyuk Sep 12 '22 at 11:23
  • Thank you for replying. About `new Promise seems to have all the parentheses needed. As for the retry() it implements exponential back off. It works fine and it's not at the root of the problem, but I will put it in the question if it helps clarity.`, when I checked your showing script of `listFiles()`, it seems that `)` is missed. So, I'm worried that you might have miscopied your script. How about this? – Tanaike Sep 12 '22 at 12:55
  • @Tanaike Thanks for noticing, I fixed it. However, this is because I was trying to reduce the source code to the bare minimum. Sometimes it runs OK, sometimes it runs for a while and then errors out. If there was a syntax error in production, it would not run at all. – Dmitry Kostyuk Sep 12 '22 at 17:57
  • Thank you for replying. When I tested your script, unfortunately, I couldn't replicate your situation. For example, `options` is not declared. And, `file` of `console.log({ isError: true, errMessage: err, ...file });` is not declared. But from `Sometimes it runs OK, sometimes it runs for a while and then errors out.`, I'm worried that you might have miscopied your script. Can you provide the script for correctly replicating your current issue? I think that the reason that I cannot replicate your issue from your showing script is due to my poor skill. I have to apologize for this. – Tanaike Sep 13 '22 at 01:45
  • @Tanaike The reason I didn't do it is that there would be a lot of code; I am going to see if I can make it manageable. But I take it you've never had this error? – Dmitry Kostyuk Sep 15 '22 at 07:59
  • Thank you for replying. In my experience, I have never had the same issue with you. So, I have tried to correctly replicate your situation using your provided script. By this, I thought that the reason for this issue might be able to be thought. I think that this is due to my poor experience. I apologize for this. – Tanaike Sep 15 '22 at 12:09
  • @Tanaike I know your experience is excellent and I am an admirer of your work, it's just that my example was difficult to replicate. I do think P.T.'s explanation is rather good though. – Dmitry Kostyuk Sep 19 '22 at 15:34

1 Answer

I don't think the problem is specific to the Google Drive API. The failure is in the client's local DNS resolution of the www.googleapis.com domain name in the URL, which happens before the HTTP request is sent off to Google. (That is what "reason: getaddrinfo EAI_AGAIN www.googleapis.com" in the error message indicates.) Because DNS is a network protocol, it is susceptible to timeouts and transient errors too. These transient errors should be retried.
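
To check whether the failure really is DNS and not Drive, the lookup can be exercised on its own. The following is only a sketch: checkDns and checkGoogleApisDns are hypothetical helper names, and the lookup function is passed in so the logic can be tried without touching the network. Node's dns.promises.lookup goes through getaddrinfo, the same call named in the error message.

```javascript
// Hypothetical helper: runs a lookup and reports whether the failure
// looks like the transient EAI_AGAIN seen in the error report.
// `lookup` is injected; in real use it would be dns.promises.lookup.
const checkDns = async (hostname, lookup) => {
  try {
    const { address } = await lookup(hostname);
    return { ok: true, address };
  } catch (err) {
    return { ok: false, code: err.code, transient: err.code === 'EAI_AGAIN' };
  }
};

// Usage sketch (not called here): checks the host from the error message.
const checkGoogleApisDns = async () => {
  const { promises: dnsPromises } = await import('node:dns');
  return checkDns('www.googleapis.com', (host) => dnsPromises.lookup(host));
};
```

If this reports transient failures while the script is running and no Drive request is in flight, the cause is on the resolver or network side rather than in the API calls.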

It looks like your retry logic only applies to "rate limit exceeded" errors. Perhaps expand it to also cover errors with err.errno === 'EAI_AGAIN'? Bumping your retry counts and back-offs might help too.
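
As a rough sketch of that suggestion (not a drop-in replacement for the retry() in the question): isRetryable and retryTransient are hypothetical names, the option names maxTries, baseMs and maxMs are assumptions, and unlike the question's retry() this version rethrows instead of returning an error object.

```javascript
// Hypothetical predicate: treats the DNS error from the report and the
// rate-limit message from the question's retry() as retryable.
const isRetryable = (err) =>
  err?.code === 'EAI_AGAIN' ||
  err?.errno === 'EAI_AGAIN' ||
  /rate limit exceeded/i.test(String(err));

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls fun(...args); on a retryable error waits with exponential
// back-off plus jitter and tries again, up to maxTries attempts.
// Non-retryable errors and the final failure are rethrown.
const retryTransient = async (
  fun,
  args = [],
  { maxTries = 5, baseMs = 1000, maxMs = 64000 } = {}
) => {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fun(...args);
    } catch (err) {
      if (!isRetryable(err) || attempt + 1 >= maxTries) throw err;
      const delay = Math.min(2 ** attempt * baseMs + Math.random() * baseMs, maxMs);
      await sleep(delay);
    }
  }
};
```

With something like this, a call such as retryTransient(doListFiles, [options]) would retry on a transient DNS failure as well as on rate limiting, while still failing fast on other errors.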

Also, double-check that your retry logic isn't causing the problem. From the point of view of a server that is already busy, client-side retries can simply exacerbate problems if they issue follow-on requests too quickly, or create additional network stress when network stress is the source of the transient errors.

The Google client-side libraries have their own built-in retry logic. (I believe it boils down to this https://github.com/googleapis/gaxios/blob/main/src/gaxios.ts#L168, and it is what the retryConfig shown in the error report is for.) I don't think the Drive APIs expose these retry configuration knobs to clients. But be aware that there are some retries happening inside the library for you already. (They can only do so much, because at some level they can't know what is an application/logic error and what is a transient error.)

P.T.
  • Thanks @P.T. this is useful! I think my retry logic is fine, but I will definitely include retries for the DNS error. Could this be caused by a faulty internet connection? I've had this error on my local machine, not on Cloud Compute. – Dmitry Kostyuk Sep 19 '22 at 15:32
  • This is probably not the sort of problem that you can prevent 100%, especially not on a busy system. Transient network issues happen (especially for UDP-based DNS requests) so the occasional loss is expected, and if you're busy enough very occasionally you'll lose multiple requests, or they're delayed too long to be useful. If you've got NodeJS code that copies Drive files around, you might use Apps Script to host it? – P.T. Sep 21 '22 at 15:35