
We need to export a zip file containing a couple of GB of data. The archive needs to hold about 50-100 InDesign files (each around 100 MB) plus some smaller files. We are trying to use Google Cloud Functions for this (lower cost, etc.). The function is triggered by a config file that is uploaded to a bucket; the config file lists all the files that need to go into the zip. Unfortunately the 2 GB memory limit is always reached, so the function never succeeds.

We tried different things: the first approach was to loop over the files, create a download promise for each one, and await all of them together after the loop (the files are streamed directly into local files). The second attempt was to await every download inside the for loop, but again the memory limit was reached.

So my question is: why does Node.js not release the streams? It seems like Node keeps every streamed file in memory and eventually crashes. I already tried setting the readStream and writeStream to null, as suggested here:

How to prevent memory leaks in node.js?

But no change.

Note: we never even reached the point where all files were downloaded and the zip could be created. It always failed after the first few files.

See the code snippets below:

// first try: collect all download promises and await them together
const promises = []
for (const file of files) {
    promises.push(downloadIndesignToExternal(file, 'xxx', dir));
}

await Promise.all(promises)


// second try: await every download inside the loop (not performant in terms of
// execution time, but we wanted to know whether the memory limit is reached anyway)
for (const file of files) {
    await downloadIndesignToExternal(file, 'xxx', dir);
}


// code to download an InDesign file from the bucket into the local directory
function downloadIndesignToExternal(activeId, externalId, dir) {
  return new Promise((resolve, reject) => {
    const readStream = storage.bucket(INDESIGN_BUCKET).file(`${activeId}.indd`).createReadStream();
    const writeStream = fs.createWriteStream(`${dir}/${externalId}.indd`);
    readStream.pipe(writeStream);
    // reject on read errors too, so a failed download does not leave the promise hanging
    readStream.on('error', (err) => {
      reject(new Error('Could not read file'));
    });
    writeStream.on('finish', () => {
      resolve();
    });
    writeStream.on('error', (err) => {
      reject(new Error('Could not write file'));
    });
  });
}
Marco

2 Answers


It's important to know that /tmp (os.tmpdir()) is a memory-backed filesystem in Cloud Functions. When you download a file to /tmp, it takes up memory just as if you had kept it in an in-memory buffer.

If your function needs more memory than can be configured for a function, then Cloud Functions might not be the best solution to this problem.

If you still want to use Cloud Functions, you will have to find a way to stream the input files directly to the output file, but without saving any intermediate state in the function. I'm sure this is possible, but you will probably need to write a fair amount of extra code for this.
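
For example, the core of such a pipeline could look roughly like this (just a sketch, assuming the archiver package from npm; the bucket and file names are placeholders):

const { Storage } = require('@google-cloud/storage');
const archiver = require('archiver');

const storage = new Storage();

// Entries are read from the source bucket and the resulting zip is
// uploaded to the destination bucket as it is produced; nothing is
// staged in /tmp.
const archive = archiver('zip');
archive.pipe(storage.bucket('output-bucket').file('export.zip').createWriteStream());
archive.append(
  storage.bucket('indesign-bucket').file('some-id.indd').createReadStream(),
  { name: 'some-id.indd' }
);
archive.finalize();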

Doug Stevenson
  • That makes sense! I don't think it will work for us in this case, because we want to upload the zip file to a bucket. Thanks! – Marco Dec 11 '19 at 23:01
  • You certainly can stream in and out of a zip in memory. I saw that there is a module on npm that can do it. – Doug Stevenson Dec 11 '19 at 23:02

For anyone interested: we got it working by streaming the files into the zip and streaming the zip directly into Google Cloud Storage. Memory usage is now around 150-300 MB, so this works perfectly for us.
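
In rough outline it looks like the sketch below (simplified; it assumes the archiver package, and the function and bucket names are placeholders rather than our exact code):

const { Storage } = require('@google-cloud/storage');
const archiver = require('archiver');

const storage = new Storage();
const INDESIGN_BUCKET = 'indesign-bucket'; // placeholder source bucket

function zipIndesignFilesToBucket(files, destBucket, destPath) {
  return new Promise((resolve, reject) => {
    // Upload stream for the resulting zip; the archive is uploaded
    // while it is being built, so it never has to fit in memory or /tmp.
    const upload = storage.bucket(destBucket).file(destPath).createWriteStream();
    const archive = archiver('zip');

    archive.on('error', reject);
    upload.on('error', reject);
    upload.on('finish', resolve); // the upload finishes once the archive stream ends

    archive.pipe(upload);

    // Append every source file as a read stream; archiver processes the
    // entries one after another, so only small chunks are buffered.
    for (const file of files) {
      archive.append(
        storage.bucket(INDESIGN_BUCKET).file(`${file}.indd`).createReadStream(),
        { name: `${file}.indd` }
      );
    }

    // Signal that no more entries will be added.
    archive.finalize();
  });
}

Only the chunks currently flowing through the pipeline are held in memory, which is why the usage stays low.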

Marco