
I have a use case where I need to generate a pre-calculated document and save it to a Firestore collection.

The approach is to create a tasks collection; each document in tasks will trigger an onCreate event in Cloud Functions.

Each onCreate event will take ~40s to finish then write to a samples collection.

In total, 244 * 26 = 6,344 documents need to be written to the samples collection.

Here are the steps that the Cloud Function performs.

Step 1: Create 244 documents in tasks (runs every hour)

Step 2: onCreate listens to the event and generates the documents, which takes ~40s each. That means we have 244 functions running concurrently, each writing 26 documents to the samples collection.

Here is the function that I use to write the data:

export const generateData = async () => {
  const promises = []
  for (const sample of samples) {
    // some logic that builds sampleRef and sampleData for this sample
    promises.push(sampleRef.set(sampleData))
  }
  // all writes are dispatched at once and awaited together
  await Promise.all(promises)
}

This is the error that I got:

Error: 4 DEADLINE_EXCEEDED: Deadline exceeded
    at Object.callErrorFromStatus (/workspace/node_modules/@grpc/grpc-js/build/src/call.js:31:26)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client.js:179:52)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:336:141)
    at Object.onReceiveStatus (/workspace/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:299:181)
    at /workspace/node_modules/@grpc/grpc-js/build/src/call-stream.js:145:78
    at processTicksAndRejections (internal/process/task_queues.js:77:11) 

Any thoughts on what happened, or are there other ways to do it? Thanks,

Dale Nguyen

1 Answer


There can be quite a few reasons for this error:

  1. It can be related to @grpc/grpc-js, as the error suggests. A lot of @google-cloud libraries with a @grpc/grpc-js dependency have an open issue related to concurrency. The only fix I've read that actually helps people is:

       Upgrading your Node version to v13 (assuming you are using Node as the runtime).
    

    See this for more details.

  2. This may be an issue with dispatching multiple tasks concurrently. A temporary solution is to submit the tasks sequentially, one by one. Possible workarounds:

  • Setting the flag fallback: true when creating the CloudTasksClient has resolved the issue for some.

     const {CloudTasksClient} = require('@google-cloud/tasks');
     // fallback: true switches from gRPC to the HTTP/1 fallback transport
     const client = new CloudTasksClient({ fallback: true });

    Setting fallback to true enables a different transport (the one originally intended for browsers): instead of using gRPC, it serializes your requests and sends them over a regular HTTP/1 connection with node-fetch to a different endpoint. With fallback enabled you don't make any gRPC requests at all; it is a totally different stack. If you schedule many tasks concurrently and perform Promise.all over a large number of them, you can run into contention on resources, which can lead to issues like DEADLINE_EXCEEDED.

  • If you have many tasks to enqueue in one job, an approach I take is as follows:

      const tasksToCreate = [...]; // an array with a large number of tasks to enqueue.
      const WORK_SIZE = 32; // how many tasks to dispatch concurrently.
      while (tasksToCreate.length > 0) {
        // take the next chunk off the front of the queue
        const work = tasksToCreate.splice(0, WORK_SIZE).map(() => {
          return createTaskPromise();
        });
        await Promise.all(work);
      }
    
  3. Keeping in mind that Firestore has limits, “Deadline exceeded” may also be caused by hitting them. The maximum sustained write rate to a single document is 1 per second. See this. Performing too many writes too quickly can trigger this error. When you hit it, retrying the failed write and slowing down the rate at which writes are performed may help (see the sketch after this list).

    Have a look here at all the other possible causes of this error
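For illustration, here is one way a retry with exponential backoff could look. This is a minimal sketch, not from the linked docs: writeWithRetry, sleep, and maxAttempts are hypothetical names of my own, and the code 4 check matches the gRPC DEADLINE_EXCEEDED status shown in the error above.

    // Hypothetical helper: retry a single Firestore write with exponential
    // backoff so bursty writes back off instead of failing outright.
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

    async function writeWithRetry(docRef, data, maxAttempts = 5) {
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          return await docRef.set(data);
        } catch (err) {
          // gRPC status code 4 is DEADLINE_EXCEEDED; rethrow anything else
          if (err.code !== 4 || attempt === maxAttempts) throw err;
          await sleep(1000 * 2 ** (attempt - 1)); // wait 1s, 2s, 4s, ...
        }
      }
    }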

Solution:

There are three common ways to perform a large number of write operations on Firestore.

  • Perform each individual write operation in sequence.
  • Using batched write operations.
  • Performing individual write operations in parallel.

The fastest and most efficient way to perform bulk data writes on Firestore is by performing parallel individual write operations. For bulk data entry, use a server client library with parallelized individual writes. You should use a server client library for bulk data operations and not a mobile/web SDK.
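As a rough sketch of that parallel approach with the Admin SDK (the samples collection name comes from the question; CHUNK_SIZE and writeSamples are illustrative names, and the chunking is my addition to keep a single Promise.all from holding thousands of in-flight requests):

    import * as admin from 'firebase-admin';

    admin.initializeApp();
    const db = admin.firestore();

    const CHUNK_SIZE = 50; // illustrative bound on concurrent writes

    async function writeSamples(samples) {
      for (let i = 0; i < samples.length; i += CHUNK_SIZE) {
        // dispatch one bounded chunk of parallel writes, then wait for it
        const chunk = samples.slice(i, i + CHUNK_SIZE);
        await Promise.all(
          chunk.map((sample) => db.collection('samples').doc().set(sample))
        );
      }
    }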

Batched writes perform better than serialized writes but not better than parallel writes. A batched write creates a WriteBatch object by calling batch(), fills it with up to 500 operations, and then commits it to Firestore. The approach counts every operation added to the batch, and once the 500-operation limit is reached it creates a new batch and pushes it to a batchArray. After all updates are staged, the code loops through the batchArray and commits every batch inside it. It is important to count every set(), update(), and delete() made on the batch, because they all count toward the 500-operation limit.
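Sketched out, that batchArray approach could look roughly like this (assuming firebase-admin is initialized and samples is an array of plain data objects; batchedWrite is an illustrative name):

    import * as admin from 'firebase-admin';

    async function batchedWrite(samples) {
      const db = admin.firestore();
      const batchArray = [db.batch()];
      let operationCount = 0;

      for (const sample of samples) {
        const ref = db.collection('samples').doc();
        // every set()/update()/delete() counts toward the 500-operation limit
        batchArray[batchArray.length - 1].set(ref, sample);
        operationCount++;
        if (operationCount === 500) {
          batchArray.push(db.batch()); // current batch is full; start a new one
          operationCount = 0;
        }
      }

      // after all updates are staged, commit every accumulated batch
      for (const batch of batchArray) {
        await batch.commit();
      }
    }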

Have a look at this Stack Overflow thread for a detailed analysis of the three write approaches.

Priyashree Bhadra
    Thanks @Priyashree for your detailed answer. I ended up running each individual write operation in sequence. Some more information: the current Node version is 14.x, and running a batch still has the same issue. – Dale Nguyen Oct 24 '21 at 01:28