
We're using Google's Firestore for embedded machine configuration data. Because this data controls a configurable pageflow and many other things, it is segmented into lots of subcollections, and each machine has its own top-level document. However, adding machines to the fleet takes forever because we have to manually copy all this data across multiple documents. Does anyone know how to recursively copy a Firestore document in Python, along with all its subcollections, their documents, their subcollections, and so on? You'd have a document ref to the top level as well as a name for the new top-level doc.

  • Hi there, can you elaborate on how you are updating all the documents of your Firestore, i.e. how you have structured your data? For example, is your data structured to use lookups? [1] Are you updating all the documents within a single procedure? If so, have you tried decoupling it with the help of Cloud Firestore function triggers? [2] With these you can define asynchronous functions that listen for document changes and divide the computing workload. [1] https://www.youtube.com/watch?v=i1n9Kw3AORw&t=438s [2] https://firebase.google.com/docs/functions/firestore-events#function_triggers – Antonio Ramirez May 05 '21 at 21:30

2 Answers


You can use something like this to recursively read from one collection and write to another:

import logging

from google.cloud import firestore

log = logging.getLogger(__name__)
db_client = firestore.Client()

batch_nr = 0


def read_recursive(
    source: firestore.CollectionReference,
    target: firestore.CollectionReference,
    batch: firestore.WriteBatch,
) -> None:
    global batch_nr

    for source_doc_ref in source.list_documents():
        document_data = source_doc_ref.get().to_dict()
        target_doc_ref = target.document(source_doc_ref.id)
        # A Firestore batch holds at most 500 operations, so flush before adding more.
        if batch_nr == 500:
            log.info("committing %s batched operations..." % batch_nr)
            batch.commit()
            batch_nr = 0
        batch.set(
            reference=target_doc_ref,
            document_data=document_data,
            merge=False,
        )
        batch_nr += 1
        # Recurse into every subcollection of the copied document.
        for source_coll_ref in source_doc_ref.collections():
            target_coll_ref = target_doc_ref.collection(source_coll_ref.id)
            read_recursive(
                source=source_coll_ref,
                target=target_coll_ref,
                batch=batch,
            )


batch = db_client.batch()
read_recursive(
    source=db_client.collection("src_collection_name"),
    target=db_client.collection("target_collection_name"),
    batch=batch,
)
batch.commit()

Writes are done in batches, which saves a lot of time (in my case it finished in half the time compared with calling set on each document individually).
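The commit-every-500 cadence is independent of Firestore itself; a minimal sketch with a hypothetical stub class (a stand-in for `firestore.WriteBatch`, which caps a batch at 500 operations) shows how the counter drives the commits:

```python
class StubBatch:
    """Stand-in for firestore.WriteBatch; only counts operations."""

    def __init__(self):
        self.ops = 0
        self.commits = []

    def set(self, doc_id, data):
        self.ops += 1

    def commit(self):
        self.commits.append(self.ops)
        self.ops = 0


def copy_docs(doc_ids, batch, batch_limit=500):
    """Write every doc through the batch, committing whenever the limit is hit."""
    for doc_id in doc_ids:
        if batch.ops == batch_limit:
            batch.commit()
        batch.set(doc_id, {"copied": True})
    batch.commit()  # flush the remainder


batch = StubBatch()
copy_docs(range(1200), batch)
print(batch.commits)  # → [500, 500, 200]
```

1200 documents thus go out in two full batches plus a 200-operation remainder, matching the structure of the answer above where the final `batch.commit()` outside the recursion flushes whatever is left.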

– cristi

The question asks for Python, but in my case I needed to do a recursive deep copy of Firestore docs / collections in NodeJS (TypeScript), using a Document as the starting point of the recursion.

(This is a solution based on the Python script by @cristi)

Function definition

import {
  CollectionReference,
  DocumentReference,
  DocumentSnapshot,
  QueryDocumentSnapshot,
  WriteBatch,
} from 'firebase-admin/firestore';

// `firebaseFirestore` used below is your initialized Firestore instance, e.g.
// import { getFirestore } from 'firebase-admin/firestore';
// const firebaseFirestore = getFirestore();

interface FirestoreCopyRecursiveContext {
  batchSize: number;
  /**
   * Wrapped Firestore WriteBatch. In firebase-admin@11.0.1, you can't continue
   * using the WriteBatch object after you call WriteBatch.commit().
   * 
   * Hence, we need to replace "used up" WriteBatches with new ones.
   * We also need to reset the count after committing, and because we
   * want all recursive invocations to share the same count + WriteBatch instance,
   * we pass this data via object reference.
   */
  writeBatch: {
    writeBatch: WriteBatch,
    /** Num of items in current batch. Reset to 0 when `commitBatch` commits.  */
    count: number;
  };
  /**
   * Function that commits the batch if it reached the limit or is forced to.
   * The WriteBatch instance is automatically replaced with fresh one
   * if commit did happen.
   */
  commitBatch: (force?: boolean) => Promise<void>;
  /** Callback to insert custom logic / write operations when we encounter a document */
  onDocument?: (
    sourceDoc: QueryDocumentSnapshot | DocumentSnapshot,
    targetDocRef: DocumentReference,
    context: FirestoreCopyRecursiveContext
  ) => unknown;
  /** Callback to insert custom logic / write operations when we encounter a collection */
  onCollection?: (
    sourceDoc: CollectionReference,
    targetDocRef: CollectionReference,
    context: FirestoreCopyRecursiveContext
  ) => unknown;
  logger?: Console['info'];
}

type FirestoreCopyRecursiveOptions = Partial<Omit<FirestoreCopyRecursiveContext, 'commitBatch'>>;

/**
 * Copy all data from one document to another, including
 * all subcollections and documents within them, etc.
 */
export const firestoreCopyDocRecursive = async (
  /** Source Firestore Document Snapshot, descendants of which we want to copy */
  sourceDoc: QueryDocumentSnapshot | DocumentSnapshot,
  /** Destination Firestore Document Ref */
  targetDocRef: DocumentReference,
  options?: FirestoreCopyRecursiveOptions,
) => {
  const batchSize = options?.batchSize ?? 500;
  const writeBatchRef = options?.writeBatch || { writeBatch: firebaseFirestore.batch(), count: 0 };
  const onDocument = options?.onDocument;
  const onCollection = options?.onCollection;
  const logger = options?.logger || console.info;

  const commitBatch = async (force?: boolean) => {
    // Commit batch only if size limit hit or forced
    if (writeBatchRef.count < batchSize && !force) return;

    logger(`Committing ${writeBatchRef.count} batched operations...`);
    await writeBatchRef.writeBatch.commit();
    // Once we commit the batched data, we have to create another WriteBatch,
    // otherwise we get error:
    // "Cannot modify a WriteBatch that has been committed."
    // See https://dev.to/wceolin/cannot-modify-a-writebatch-that-has-been-committed-265f
    writeBatchRef.writeBatch = firebaseFirestore.batch();
    writeBatchRef.count = 0;
  };

  const context = {
    batchSize,
    writeBatch: writeBatchRef,
    onDocument,
    onCollection,
    commitBatch,
    // Pass the logger down so recursive calls keep using it
    logger,
  };

  // Copy the contents of the current docs
  const sourceDocData = sourceDoc.data();
  await writeBatchRef.writeBatch.set(targetDocRef, sourceDocData, { merge: false });
  writeBatchRef.count += 1;
  await commitBatch();

  // Allow additional changes to the target document from outside
  // the function after the copy command is enqueued / committed.
  await onDocument?.(sourceDoc, targetDocRef, context);
  // And try to commit in case the callback updated the count but did not commit
  await commitBatch();

  // Check for subcollections and docs within them
  for (const sourceSubcoll of await sourceDoc.ref.listCollections()) {
    const targetSubcoll = targetDocRef.collection(sourceSubcoll.id);

    // Allow additional changes to the target collection from outside
    // the function after the copy command is enqueued / committed.
    await onCollection?.(sourceSubcoll, targetSubcoll, context);
    // And try to commit in case the callback updated the count but did not commit
    await commitBatch();

    for (const sourceSubcollDoc of (await sourceSubcoll.get()).docs) {
      const targetSubcollDocRef = targetSubcoll.doc(sourceSubcollDoc.id);
      await firestoreCopyDocRecursive(sourceSubcollDoc, targetSubcollDocRef, context);
    }
  }

  // Commit all remaining operations
  return commitBatch(true);
};

How to use it

const sourceDocRef = getYourFaveFirestoreDocRef(x);
const sourceDoc = await sourceDocRef.get();
const targetDocRef = getYourFaveFirestoreDocRef(y);

// Copy firestore resources
await firestoreCopyDocRecursive(sourceDoc, targetDocRef, {
  logger,
  // Note: In my case some docs had their doc ID also copied as a field.
  //       Because the copied documents get a new doc ID, we need to update
  //       those fields too.
  onDocument: async (sourceDoc, targetDocRef, context) => {
    const someDocPattern = /^nameOfCollection\/[^/]+?$/;
    const subcollDocPattern = /^nameOfCollection\/[^/]+?\/nameOfSubcoll\/[^/]+?$/;

    // Update the field that holds the document ID
    if (targetDocRef.path.match(someDocPattern)) {
      const docId = targetDocRef.id;
      context.writeBatch.writeBatch.set(targetDocRef, { docId }, { merge: true });
      context.writeBatch.count += 1;
      await context.commitBatch();
      return;
    }

    // In a subcollection, I had to update multiple ID fields
    if (targetDocRef.path.match(subcollDocPattern)) {
      const docId = targetDocRef.parent.parent?.id;
      const subcolDocId = targetDocRef.id;
      context.writeBatch.writeBatch.set(targetDocRef, { docId, subcolDocId }, { merge: true });
      context.writeBatch.count += 1;
      await context.commitBatch();
      return;
    }
  },
});
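The wrapped-batch trick this answer relies on (sharing a `{ writeBatch, count }` holder across recursive calls and swapping in a fresh batch after each commit) is language-agnostic. A hypothetical Python sketch with stub objects illustrates why the holder must be a single mutable object passed by reference, so every recursive frame sees the replacement batch:

```python
class StubBatch:
    """Stand-in for a write batch that, like firebase-admin's, is unusable after commit."""

    def __init__(self):
        self.committed = False
        self.ops = 0

    def set(self, data):
        if self.committed:
            raise RuntimeError("Cannot modify a WriteBatch that has been committed.")
        self.ops += 1

    def commit(self):
        self.committed = True


def make_holder():
    # Shared, mutable holder: every recursive call sees the same dict,
    # so replacing the batch after a commit is visible everywhere.
    return {"batch": StubBatch(), "count": 0}


def commit_batch(holder, batch_size=500, force=False):
    if holder["count"] < batch_size and not force:
        return
    holder["batch"].commit()
    holder["batch"] = StubBatch()  # a committed batch is unusable; replace it
    holder["count"] = 0


def copy_tree(node, holder, batch_size=2):
    holder["batch"].set(node["data"])
    holder["count"] += 1
    commit_batch(holder, batch_size)
    for child in node.get("children", []):
        copy_tree(child, holder, batch_size)


tree = {"data": 1, "children": [{"data": 2}, {"data": 3}, {"data": 4}]}
holder = make_holder()
copy_tree(tree, holder)
commit_batch(holder, force=True)
# Four writes flow through successive batches without ever touching a committed one.
```

If each recursive call instead received its own copy of the batch and count, the replacement after a commit would be invisible to the caller, and the next `set` would hit the already-committed batch.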
– JuroOravec