33

I'm trying to update a timestamp field with the Firestore Admin SDK's server timestamp in a collection with more than 500 docs.

const batch = db.batch();
const serverTimestamp = admin.firestore.FieldValue.serverTimestamp();

db
  .collection('My Collection')
  .get()
  .then((docs) => {
    docs.forEach((doc) => {
      batch.set(doc.ref, {
        timestamp: serverTimestamp,
      }, {
        merge: true,
      });
    });
    return batch.commit();
  })
  .then(() => res.send('All docs updated'))
  .catch(console.error);

This throws the following error:

{ Error: 3 INVALID_ARGUMENT: cannot write more than 500 entities in a single call
    at Object.exports.createStatusError (C:\Users\Growthfile\Desktop\cf-test\functions\node_modules\grpc\src\common.js:87:15)
    at Object.onReceiveStatus (C:\Users\Growthfile\Desktop\cf-test\functions\node_modules\grpc\src\client_interceptors.js:1188:28)
    at InterceptingListener._callNext (C:\Users\Growthfile\Desktop\cf-test\functions\node_modules\grpc\src\client_interceptors.js:564:42)
    at InterceptingListener.onReceiveStatus (C:\Users\Growthfile\Desktop\cf-test\functions\node_modules\grpc\src\client_interceptors.js:614:8)
    at callback (C:\Users\Growthfile\Desktop\cf-test\functions\node_modules\grpc\src\client_interceptors.js:841:24)
  code: 3,
  metadata: Metadata { _internal_repr: {} },
  details: 'cannot write more than 500 entities in a single call' }

Is there a way I can write a recursive method which creates a batch object, updates 500 docs at a time, and repeats until all the docs are updated?

From the docs I know that deletion is possible with a recursive approach, as mentioned here:

https://firebase.google.com/docs/firestore/manage-data/delete-data#collections

But for updating, I'm not sure how to end the recursion, since the docs are not being deleted and so keep showing up in the query.

benomatis
Utkarsh Bhatt
  • 1
    Why don't you iterate through all the 500 docs, update them, and use the last doc key to construct startAt to create a new query? (A rough sketch of this idea follows these comments.) – Borko Kovacev Sep 04 '18 at 11:40
  • You can limit and then batch recursively, faced same issue and this was my solution: https://stackoverflow.com/a/61639536/2195000 – Stathis Ntonas May 07 '20 at 07:14
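A rough sketch of the pagination idea from the first comment, assuming the Admin SDK (`admin`) and the collection and field names from the question; this is not code from the thread, just an illustration of paging with orderBy/startAfter and one batch per page:

async function updateAllInPages(db, serverTimestamp) {
  const pageSize = 500; // 500 is the maximum number of writes per batch
  let last = null;

  // Keep paging until a query returns no documents.
  while (true) {
    let query = db.collection('My Collection')
      .orderBy(admin.firestore.FieldPath.documentId())
      .limit(pageSize);
    if (last) query = query.startAfter(last);

    const snap = await query.get();
    if (snap.empty) break;

    const batch = db.batch();
    snap.docs.forEach((doc) => {
      batch.set(doc.ref, { timestamp: serverTimestamp }, { merge: true });
    });
    await batch.commit();

    // Remember the last document as the cursor for the next page.
    last = snap.docs[snap.docs.length - 1];
  }
}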

9 Answers

69

I also ran into the problem of updating more than 500 documents inside a Firestore collection, and I would like to share how I solved it.

I use Cloud Functions to update my collection inside Firestore, but this should also work in client-side code.

The solution counts every operation made against the batch; once the limit is reached, a new batch is created and pushed to the batchArray.

After all updates are queued, the code loops through the batchArray and commits every batch inside it.

It is important to count every set(), update(), and delete() operation made against the batch, because they all count toward the 500-operation limit.

const documentSnapshotArray = await firestore.collection('my-collection').get();

const batchArray = [];
batchArray.push(firestore.batch());
let operationCounter = 0;
let batchIndex = 0;

documentSnapshotArray.forEach(documentSnapshot => {
    const documentData = documentSnapshot.data();

    // update document data here...

    batchArray[batchIndex].update(documentSnapshot.ref, documentData);
    operationCounter++;

    // 499 keeps each batch safely below Firestore's 500-operation limit
    if (operationCounter === 499) {
      batchArray.push(firestore.batch());
      batchIndex++;
      operationCounter = 0;
    }
});

batchArray.forEach(async batch => await batch.commit());

return;
Sebastian Vischer
  • How do you ensure that all the batches are executed successfully, as only the operations within a single batch are atomic? It would lead to data inconsistency if some batches executed and some didn't. – Adarsh Jul 29 '19 at 10:17
  • @Adarsh Yes, you are right. I have left out the error handling part. I will add that part to the answer soon. I updated my database to a new data model, which was an idempotent operation in my case, so I could repeat the code until every batch succeeded. – Sebastian Vischer Jul 29 '19 at 12:56
  • So there are a couple of things you can do. You can check the retry option when creating the cloud function. This will make sure your cloud function re-executes on any exception. But you will have to decide which failures you consider `transient`, else it will turn into an endless loop. Also, some kind of state has to be maintained between cloud function executions so that the batches executed earlier aren't executed again. Maybe you can write to the Realtime Database/Firestore on every successful batch operation and carry on from there in the next retry when some batch didn't succeed. – Adarsh Jul 29 '19 at 13:03
  • Or you could write the job details (update details) to, let's say, `/queue/pendingUpdates/` and write a cloud function which runs on a schedule (say every 5 mins) and performs the updates. Once the operation is successful, you can delete/mark the job as completed. Else it retries automatically in the next interval. This is a lot easier than the first one. Your thoughts? – Adarsh Jul 29 '19 at 13:13
  • I do not know your use case. Do you often write more than 500 documents? – Sebastian Vischer Jul 29 '19 at 13:58
  • Consider this scenario: user details are denormalized into an audit trail collection. When a user makes any changes, an entry is made to the audit trail. When the user updates their profile photo, username, phone number or email, it has to be updated in all documents having the denormalized user data, which eventually can exceed 500 documents. – Adarsh Jul 29 '19 at 14:06
  • I do not know your use case. Do you often write more than 500 documents? Maybe you could structure your data differently? Your solution with the state written to the database is OK, but these writes could also fail and mess up your data. I would consider a solution with a query for not-yet-updated documents; then, as soon as a document is updated, it is no longer in the query. You could repeat this until the query is empty. But this depends on your use case. If you know how the updated data should look, you could also use transactions. – Sebastian Vischer Jul 29 '19 at 14:10
  • I prefer not to denormalize data in a noSQL database. I only have one or a few documents per user and all other users get the data from these few documents. This way you can scale your app properly if you have a lot of users. With denormalized data your app will be very inefficient. – Sebastian Vischer Jul 29 '19 at 14:17
  • The reason data is denormalized is because the number of times the reads happen > number of times writes happen. In your case, you will have to fetch the user details again (2 reads instead of 1 per user). NoSQL encourages denormalization of data as well. Is there any reason you haven't denormalized the data? what happens when your user base grows or users start sharing the same document etc? – Adarsh Jul 29 '19 at 15:08
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/197188/discussion-between-sebe-and-adarsh). – Sebastian Vischer Jul 30 '19 at 00:29
  • @Sebe have you tested this in a real-life scenario? Does this create a new batch object whenever the batch write reaches 500? Thanks – Mihae Kheel Nov 17 '20 at 09:50
  • 1
    @Mihae Kheel Yes, the loop creates a new batch after it reaches 500 operations, but it is important to count every operation. Also you need some form of error handling. – Sebastian Vischer Dec 16 '20 at 22:00
  • @SebastianVischer it seems the code and logic works fine when I used it. Thank you very much – Mihae Kheel Dec 18 '20 at 05:07
  • The answer is great, but I had 29,000 documents in a collection to be updated and this failed. Unfortunately, as there is no exception handling, I was getting the errors after 2-3 minutes and the cause was difficult to find. Error with code 16 or something. So I tweaked the logic a bit to leave some gap between the batch commits (I guess that worked). Will try to add that as another answer. Thanks. – saurabh Mar 19 '21 at 14:46
  • @saurabh I have never tried it with this many documents. Maybe there is some kind of limit for commits. I like your solution of committing the batch after you reach 500 operations. In my opinion the simpler solution. – Sebastian Vischer Mar 20 '21 at 00:41
  • @Adarsh what about `Promise.all(batchArray.map(batch => batch.commit())).then().catch();`? – optimista Apr 03 '21 at 21:58
  • The code can return before the commits complete. The batchArray.forEach() line could be: `await Promise.all(batchArray.map(batch => batch.commit()));` – jscuba Apr 10 '22 at 03:25
  • @jscuba wouldn't the inner callback method also need "async/await" keywords, such as await Promise.all(batchArray.map(async (batch) => { await batch.commit(); })); – Cedric Jun 11 '22 at 20:44
  • this answer explains it in greater detail https://stackoverflow.com/questions/37576685/using-async-await-with-a-foreach-loop#answer-37576787 – Cedric Jun 11 '22 at 20:44
  • 1
    @Cedric No, batch.commit() returns a promise and Promise.all() waits on the array of promises. Adding async/await to the inner callback like you did would have Promise.all() wait on an array of undefined values. Either way though, all the batch.commit() calls resolve. So, in this case, either one might work, but the original is correct. – jscuba Jun 13 '22 at 04:22
  • @jscuba oh thanks, that makes sense. But in the url that I provided, the example does use await inside the callback function, for a value which will be used in the next line. This callback function just automatically returns Promise, right? – Cedric Jun 13 '22 at 09:52
  • 1
    @Cedric Ah, you're right. Adding async/await to the inner callback would just add another layer of promises. Thanks for bringing this up. It had me understand it better. – jscuba Jun 13 '22 at 12:26
  • Utility method for TypeScript: https://gist.github.com/wcoder/9bb44ffe709397f657864f6a404cf7cc – Yauheni Pakala Jul 31 '22 at 14:19
28

I liked this simple solution:

const users = await db.collection('users').get()

const batches = _.chunk(users.docs, 500).map(userDocs => {
    const batch = db.batch()
    userDocs.forEach(doc => {
        batch.set(doc.ref, { field: 'myNewValue' }, { merge: true })
    })
    return batch.commit()
})

await Promise.all(batches)

Just remember to add import * as _ from "lodash" at the top. Based on this answer.
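If you prefer to avoid the lodash dependency, a tiny chunk helper does the same job; this is a sketch, not part of the original answer:

// Split an array into consecutive chunks of at most `size` elements.
const chunk = (arr, size) =>
  Array.from({ length: Math.ceil(arr.length / size) }, (_, i) =>
    arr.slice(i * size, (i + 1) * size));

// Usage: chunk(users.docs, 500) in place of _.chunk(users.docs, 500) above.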

ernewston
8

You can use Firestore's built-in BulkWriter. It handles batching for you and ramps up write traffic using the 500/50/5 rule.

Example:

let bulkWriter = firestore.bulkWriter();

bulkWriter.create(documentRef, {foo: 'bar'});
bulkWriter.update(documentRef2, {foo: 'bar'});
bulkWriter.delete(documentRef3);
await bulkWriter.close().then(() => {
  console.log('Executed all writes');
});
mixalbl4
4

As mentioned above, @Sebastian's answer is good and I upvoted it too, but I faced an issue while updating 25,000+ documents in one go. The tweak to the logic is below.

console.log(`Updating documents...`);
let collectionRef = db.collection('cities');
try {
  let batch = db.batch();
  const documentSnapshotArray = await collectionRef.get();
  const records = documentSnapshotArray.docs;
  const index = documentSnapshotArray.size;
  console.log(`TOTAL SIZE=====${index}`);
  for (let i=0; i < index; i++) {
    const docRef = records[i].ref;
    // YOUR UPDATES
    batch.update(docRef, {isDeleted: false});
    if ((i + 1) % 499 === 0) {
      await batch.commit();
      batch = db.batch();
    }
  }
  // For committing final batch
  if (index % 499 !== 0) {
    await batch.commit();
  }
  console.log('write completed');
} catch (error) {
  console.error(`updateWorkers() errored out : ${error.stack}`);
  reject(error); // reject is assumed to come from an enclosing Promise executor
}
Dharman
saurabh
1

The answers and comments above already explain the issue.

I'm sharing the final code that I built and that worked for me, since I needed something that operates in a more decoupled manner than most of the solutions presented above.

import { FireDb } from "@services/firebase"; // = firebase.firestore();

type TDocRef = FirebaseFirestore.DocumentReference;
type TDocData = FirebaseFirestore.DocumentData;

let fireBatches = [FireDb.batch()];
let batchSizes = [0];
let batchIdxToUse = 0;

export default class FirebaseUtil {
  static addBatchOperation(
    operation: "create",
    ref: TDocRef,
    data: TDocData
  ): void;
  static addBatchOperation(
    operation: "update",
    ref: TDocRef,
    data: TDocData,
    precondition?: FirebaseFirestore.Precondition
  ): void;
  static addBatchOperation(
    operation: "set",
    ref: TDocRef,
    data: TDocData,
    setOpts?: FirebaseFirestore.SetOptions
  ): void;
  static addBatchOperation(
    operation: "create" | "update" | "set",
    ref: TDocRef,
    data: TDocData,
    opts?: FirebaseFirestore.Precondition | FirebaseFirestore.SetOptions
  ): void {
    // Lines below make sure we stay below the limit of 500 writes per
    // batch
    if (batchSizes[batchIdxToUse] === 500) {
      fireBatches.push(FireDb.batch());
      batchSizes.push(0);
      batchIdxToUse++;
    }
    batchSizes[batchIdxToUse]++;

    const batchArgs: [TDocRef, TDocData] = [ref, data];
    if (opts) batchArgs.push(opts);

    switch (operation) {
      // Specific case for "set" is required because of some weird TS
      // glitch that doesn't allow me to use the arg "operation" to
      // call the function
      case "set":
        fireBatches[batchIdxToUse].set(...batchArgs);
        break;
      default:
        fireBatches[batchIdxToUse][operation](...batchArgs);
        break;
    }
  }

  public static async runBatchOperations() {
    // The lines below clear the globally available batches so we
    // don't run them twice if we call this function more than once
    const currentBatches = [...fireBatches];
    fireBatches = [FireDb.batch()];
    batchSizes = [0];
    batchIdxToUse = 0;

    await Promise.all(currentBatches.map((batch) => batch.commit()));
  }
}

Jean Costa
1

Based on all the above answers, I put together the following pieces of code that one can drop into a module, for both a JavaScript back end and front end, to use Firestore batch writes easily without worrying about the 500-write limit.

Back-end (Node.js)

// The Firebase Admin SDK to access Firestore.
const admin = require("firebase-admin");
admin.initializeApp();

// Firestore does not accept more than 500 writes in a transaction or batch write.
const MAX_TRANSACTION_WRITES = 499;

const isFirestoreDeadlineError = (err) => {
  console.log({ err });
  const errString = err.toString();
  return (
    errString.includes("Error: 13 INTERNAL: Received RST_STREAM") ||
    errString.includes("Error: 4 DEADLINE_EXCEEDED: Deadline exceeded")
  );
};

const db = admin.firestore();

// How many transactions/batchWrites out of 500 so far.
// I wrote the following functions to easily use batchWrites without worrying about the 500 limit.
let writeCounts = 0;
let batchIndex = 0;
let batchArray = [db.batch()];

// Commit all batches created so far.
const makeCommitBatch = async () => {
  console.log("makeCommitBatch");
  await Promise.all(batchArray.map((bch) => bch.commit()));
};

// Commit the batchWrite; if you got a Firestore Deadline Error try again every 4 seconds until it gets resolved.
const commitBatch = async () => {
  try {
    await makeCommitBatch();
  } catch (err) {
    console.log({ err });
    if (isFirestoreDeadlineError(err)) {
      const theInterval = setInterval(async () => {
        try {
          await makeCommitBatch();
          clearInterval(theInterval);
        } catch (err) {
          console.log({ err });
          if (!isFirestoreDeadlineError(err)) {
            clearInterval(theInterval);
            throw err;
          }
        }
      }, 4000);
    }
  }
};

// If the batchWrite reaches 499 writes, start a new batch object and reset the counter.
const checkRestartBatchWriteCounts = () => {
  writeCounts += 1;
  if (writeCounts >= MAX_TRANSACTION_WRITES) {
    batchIndex++;
    batchArray.push(db.batch());
    writeCounts = 0;
  }
};

const batchSet = (docRef, docData) => {
  batchArray[batchIndex].set(docRef, docData);
  checkRestartBatchWriteCounts();
};

const batchUpdate = (docRef, docData) => {
  batchArray[batchIndex].update(docRef, docData);
  checkRestartBatchWriteCounts();
};

const batchDelete = (docRef) => {
  batchArray[batchIndex].delete(docRef);
  checkRestartBatchWriteCounts();
};

module.exports = {
  admin,
  db,
  MAX_TRANSACTION_WRITES,
  checkRestartBatchWriteCounts,
  commitBatch,
  isFirestoreDeadlineError,
  batchSet,
  batchUpdate,
  batchDelete,
};

Front-end

// Firestore does not accept more than 500 writes in a transaction or batch write.
const MAX_TRANSACTION_WRITES = 499;

const isFirestoreDeadlineError = (err) => {
  return (
    err.message.includes("DEADLINE_EXCEEDED") ||
    err.message.includes("Received RST_STREAM")
  );
};

class Firebase {
  constructor(fireConfig, instanceName) {
    let app = fbApp; // fbApp: the firebase namespace, assumed to be imported elsewhere (e.g. from "firebase/app")
    if (instanceName) {
      app = app.initializeApp(fireConfig, instanceName);
    } else {
      app.initializeApp(fireConfig);
    }
    this.name = app.name;
    this.db = app.firestore();
    this.firestore = app.firestore;
    // How many transactions/batchWrites out of 500 so far.
    // I wrote the following functions to easily use batchWrites without worrying about the 500 limit.
    this.writeCounts = 0;
    this.batch = this.db.batch();
    this.isCommitting = false;
  }

  async makeCommitBatch() {
    console.log("makeCommitBatch");
    if (!this.isCommitting) {
      this.isCommitting = true;
      await this.batch.commit();
      this.writeCounts = 0;
      this.batch = this.db.batch();
      this.isCommitting = false;
    } else {
      const batchWaitInterval = setInterval(async () => {
        if (!this.isCommitting) {
          this.isCommitting = true;
          await this.batch.commit();
          this.writeCounts = 0;
          this.batch = this.db.batch();
          this.isCommitting = false;
          clearInterval(batchWaitInterval);
        }
      }, 400);
    }
  }

  async commitBatch() {
    try {
      await this.makeCommitBatch();
    } catch (err) {
      console.log({ err });
      if (isFirestoreDeadlineError(err)) {
        const theInterval = setInterval(async () => {
          try {
            await this.makeCommitBatch();
            clearInterval(theInterval);
          } catch (err) {
            console.log({ err });
            if (!isFirestoreDeadlineError(err)) {
              clearInterval(theInterval);
              throw err;
            }
          }
        }, 4000);
      }
    }
  }

  async checkRestartBatchWriteCounts() {
    this.writeCounts += 1;
    if (this.writeCounts >= MAX_TRANSACTION_WRITES) {
      await this.commitBatch();
    }
  }

  async batchSet(docRef, docData) {
    if (!this.isCommitting) {
      this.batch.set(docRef, docData);
      await this.checkRestartBatchWriteCounts();
    } else {
      const batchWaitInterval = setInterval(async () => {
        if (!this.isCommitting) {
          this.batch.set(docRef, docData);
          await this.checkRestartBatchWriteCounts();
          clearInterval(batchWaitInterval);
        }
      }, 400);
    }
  }

  async batchUpdate(docRef, docData) {
    if (!this.isCommitting) {
      this.batch.update(docRef, docData);
      await this.checkRestartBatchWriteCounts();
    } else {
      const batchWaitInterval = setInterval(async () => {
        if (!this.isCommitting) {
          this.batch.update(docRef, docData);
          await this.checkRestartBatchWriteCounts();
          clearInterval(batchWaitInterval);
        }
      }, 400);
    }
  }

  async batchDelete(docRef) {
    if (!this.isCommitting) {
      this.batch.delete(docRef);
      await this.checkRestartBatchWriteCounts();
    } else {
      const batchWaitInterval = setInterval(async () => {
        if (!this.isCommitting) {
          this.batch.delete(docRef);
          await this.checkRestartBatchWriteCounts();
          clearInterval(batchWaitInterval);
        }
      }, 400);
    }
  }
}
1man
1

No citations or documentation; I came up with this code myself, it worked for me, and it looks clean and simple to read and use. If someone likes it, they can use it too.

It's better to add an automated test, because the code relies on the private field _ops, which can change after a package upgrade. For example, in old versions it was _mutations.

async function commitBatch(batch) {
  const MAX_OPERATIONS_PER_COMMIT = 500;

  while (batch._ops.length > MAX_OPERATIONS_PER_COMMIT) {
    const batchPart = admin.firestore().batch();

    batchPart._ops = batch._ops.splice(0, MAX_OPERATIONS_PER_COMMIT - 1);

    await batchPart.commit();
  }

  await batch.commit();
}

Usage:

const batch = admin.firestore().batch();

batch.delete(someRef);
batch.update(anotherRef, { someField: 'newValue' }); // update() needs the fields to change; anotherRef is just another document reference

...

await commitBatch(batch);
  • Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](https://stackoverflow.com/help/how-to-answer). – Vimal Patel Nov 13 '22 at 17:00
0

Simple solution: just fire the batch twice. My array is "resultsFinal". I fire the batch once with a limit of 490, and a second time up to the length of the array (resultsFinal.length). It works fine for me. How do you check it? Go to Firebase and delete your collection; Firebase tells you how many docs were deleted. If that matches the length of your array, you are good to go.

async function quickstart(results) {
    // results is passed in as a parameter so the data is available inside quickstart
    const resultsFinal = results;
    // console.log(resultsFinal.length);
    let batch = firestore.batch();
    // limit of Firestore is 500 requests per transaction/batch/send
    for (let i = 0; i < 490; i++) {
        const doc = firestore.collection('testMore490').doc();
        const object = resultsFinal[i];
        batch.set(doc, object);
    }
    await batch.commit();

    // second batch covers the remaining documents, from index 490 onward
    batch = firestore.batch();
    for (let i = 490; i < resultsFinal.length; i++) {
        const objectPartTwo = resultsFinal[i];
        const doc = firestore.collection('testMore490').doc();
        batch.set(doc, objectPartTwo);
    }
    await batch.commit();

}
sylvain s
0

I like this implementation: https://github.com/qualdesk/firestore-big-batch

Here's a blog post about it (not mine): https://www.qualdesk.com/blog/2021/the-solution-to-firestore-batched-write-limit/

It's a drop-in replacement for Firestore's batch. Instead of this:

const batch = db.batch();

...do this:

const batch = new BigBatch({ db });

Here's my variation of it, updated to be type-compatible with the latest firebase-admin and TypeScript. I also added a setGroup option, which ensures that a group of operations ends up in the same batch. (A short usage sketch follows the class below.)

// Inspired by: https://github.com/qualdesk/firestore-big-batch

import type {
  DocumentReference,
  Firestore,
  SetOptions,
  WriteBatch,
} from 'firebase-admin/firestore';

const MAX_OPERATIONS_PER_FIRESTORE_BATCH = 499;

export class BigBatch {
  private db: Firestore;
  private currentBatch: WriteBatch;
  private batchArray: Array<WriteBatch>;
  private operationCounter: number;

  constructor({ db }: { db: Firestore }) {
    this.db = db;
    this.currentBatch = db.batch();
    this.batchArray = [this.currentBatch];
    this.operationCounter = 0;
  }

  private startNewBatch() {
    this.currentBatch = this.db.batch();
    this.batchArray.push(this.currentBatch);
    this.operationCounter = 0;
  }

  private checkLimit() {
    if (this.operationCounter < MAX_OPERATIONS_PER_FIRESTORE_BATCH)
      return;

    this.startNewBatch();
  }

  private ensureGroupOperation(operations: unknown[]) {
    if (operations.length > MAX_OPERATIONS_PER_FIRESTORE_BATCH)
      throw new Error(
        `Group can only accept ${MAX_OPERATIONS_PER_FIRESTORE_BATCH} operations.`,
      );

    if (
      this.operationCounter + operations.length >
      MAX_OPERATIONS_PER_FIRESTORE_BATCH
    )
      this.startNewBatch();
  }

  /**
   * Add a single set operation to the batch.
   */
  set(
    ref: DocumentReference,
    data: object,
    options: SetOptions = {},
  ) {
    this.currentBatch.set(ref, data, options);
    this.operationCounter++;
    this.checkLimit();
  }

  /**
   * Add a group of set operations to the batch. This method ensures that everything in a group will be included in the same batch.
   * @param group Array of objects with ref, data, and options
   */
  setGroup(
    operations: {
      ref: DocumentReference;
      data: object;
      options?: SetOptions;
    }[],
  ) {
    this.ensureGroupOperation(operations);
    operations.forEach(o =>
      this.currentBatch.set(o.ref, o.data, o.options ?? {}),
    );
    this.operationCounter += operations.length;
    this.checkLimit();
  }

  update(ref: DocumentReference, data: object) {
    this.currentBatch.update(ref, data);
    this.operationCounter++;
    this.checkLimit();
  }

  delete(ref: DocumentReference) {
    this.currentBatch.delete(ref);
    this.operationCounter++;
    this.checkLimit();
  }

  commit() {
    const promises = this.batchArray.map(batch => batch.commit());
    return Promise.all(promises);
  }
}
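
A minimal usage sketch of the class above; the collection and field names here are placeholders, not from the gist or blog post:

import { getFirestore } from 'firebase-admin/firestore';

const db = getFirestore();
const batch = new BigBatch({ db });

// Individual operations may be spread across several underlying batches.
batch.set(db.collection('users').doc('alice'), { active: true }, { merge: true });
batch.update(db.collection('users').doc('bob'), { active: false });

// setGroup keeps these writes together in the same underlying batch.
batch.setGroup([
  { ref: db.collection('orders').doc('o1'), data: { userId: 'alice' } },
  { ref: db.collection('orders').doc('o2'), data: { userId: 'bob' } },
]);

// commit() commits all underlying batches in parallel.
await batch.commit();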
Johnny Oshika