
We are planning to implement a virtual filesystem using Google Firestore.

The idea of subcollections is nice because it allows us to model our data in terms of a folder hierarchy, like so: /folders/folderA/entities/folderB/entities/fileX
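To make the layout concrete, here is a minimal Python sketch of how such a path could be assembled from a chain of names. The helper `entity_path` is hypothetical and only formats a string; it does not call Firestore.

```python
def entity_path(root_folder: str, *descendants: str) -> str:
    """Build a nested document path such as
    /folders/folderA/entities/folderB/entities/fileX.

    Hypothetical helper for illustration; it only formats a string.
    """
    segments = ["folders", root_folder]
    for name in descendants:
        # Each nested level lives in its parent's "entities" subcollection.
        segments += ["entities", name]
    return "/" + "/".join(segments)


print(entity_path("folderA", "folderB", "fileX"))
```

The point of the sketch is that a document's location is encoded in its full path, so changing a folder's parent changes the path of everything beneath it.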

Much like an actual filesystem, we'd like to handle cross-folder moves, such as moving nested subfolder folderB from parent folderA to parent folderC. The folders we want to move will often contain their own subcollections of files and folders, nested an arbitrary K levels deep.

This comment suggests that moving a document will not automatically move its associated subcollections. Similarly, deleting a document does not delete its subcollections, leaving them as orphans. It seems the only way to move a folder (and its entities) from one parent to another is a recursive clone + delete strategy, which may be difficult to do reliably and transactionally if the folder contains a large number of sub-entities.
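A rough sketch of the clone + delete strategy we have in mind is below. It is only an in-memory model: the `store` dict (full document path mapped to document data) and the `move_subtree` function are assumptions made for illustration, not Firestore API. The prefix scan stands in for recursively listing each document's subcollections.

```python
from typing import Dict

# In-memory stand-in for the database: full document path -> document data.
# This is an assumption for illustration, not the Firestore API.
Store = Dict[str, dict]


def move_subtree(store: Store, src: str, dst: str) -> int:
    """Move the document at `src` and everything nested under it to `dst`.

    Phase 1 copies every affected document to its new path; phase 2 deletes
    the originals. Returns the number of documents moved.
    """
    if dst == src or dst.startswith(src + "/"):
        raise ValueError("cannot move a folder into itself")
    prefix = src + "/"
    # Snapshot the affected paths first so the dict is not mutated mid-scan.
    affected = [p for p in store if p == src or p.startswith(prefix)]
    if not affected:
        raise KeyError(src)
    for old in affected:
        # Swap the leading `src` portion of the path for `dst`.
        store[dst + old[len(src):]] = dict(store[old])
    # Delete only after every copy exists, so a failure part-way through
    # leaves duplicates rather than lost data.
    for old in affected:
        del store[old]
    return len(affected)


store = {
    "/folders/folderA": {"name": "folderA"},
    "/folders/folderA/entities/folderB": {"name": "folderB"},
    "/folders/folderA/entities/folderB/entities/fileX": {"name": "fileX"},
    "/folders/folderC": {"name": "folderC"},
}
moved = move_subtree(
    store,
    "/folders/folderA/entities/folderB",
    "/folders/folderC/entities/folderB",
)
print(moved, sorted(store))
```

Even in this toy form, the work grows with the number of documents under the folder, and the two phases are not atomic, which is exactly the reliability concern for large subtrees.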

The alternative is to abandon subcollections and store all folders at the root instead, using a document field like parent_id to point to other docs within the flat collection. In theory this shouldn't impact query speed, thanks to Firestore's aggressive indexing, but we've been unable to reproduce this locally: querying via subcollections is vastly more performant than querying the flat top-level collection as the total number of documents in the DB increases. A reproducible repo is available here. Note that the repo uses a local emulator instance rather than an actual Firestore server.
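For comparison, here is a minimal in-memory sketch of the flat layout. The `col` dict (document id mapped to fields) and the helpers `children_of` and `move` are hypothetical names for illustration; against the real service, `children_of` would correspond to an equality filter on parent_id.

```python
from typing import Dict, List, Optional

# In-memory stand-in for one flat root collection: doc id -> fields.
# This is an assumption for illustration, not the Firestore API.
Collection = Dict[str, dict]


def children_of(col: Collection, parent_id: Optional[str]) -> List[str]:
    """List the ids of the direct children of `parent_id` (None = root)."""
    return sorted(i for i, d in col.items() if d["parent_id"] == parent_id)


def move(col: Collection, doc_id: str, new_parent_id: Optional[str]) -> None:
    """Re-parent `doc_id` by rewriting a single field."""
    if doc_id not in col:
        raise KeyError(doc_id)
    # Walk up from the new parent; meeting doc_id would create a cycle.
    ancestor = new_parent_id
    while ancestor is not None:
        if ancestor == doc_id:
            raise ValueError("cannot move a folder into its own descendant")
        ancestor = col[ancestor]["parent_id"]
    col[doc_id]["parent_id"] = new_parent_id


col = {
    "folderA": {"parent_id": None},
    "folderB": {"parent_id": "folderA"},
    "fileX": {"parent_id": "folderB"},
    "folderC": {"parent_id": None},
}
move(col, "folderB", "folderC")
print(children_of(col, "folderC"), children_of(col, "folderB"))
```

In this model a move touches exactly one document regardless of how deep the folder's contents go, which is the main appeal of the flat design; the cost is that listing a folder becomes a filtered query over one large collection.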

Any advice would be very helpful!

robinnnnn
  • I wouldn't interpret the performance of the local emulator to be anything like the cloud hosted service. The local emulator runs on a single machine, and the actual service is massively scalable on many machines in Google cloud infrastructure. – Doug Stevenson Apr 22 '19 at 21:48
  • Thanks @DougStevenson! We will use the real cloud hosted service to continue our experiment / POC and see if there is a real difference in query times – robinnnnn Apr 22 '19 at 22:17
  • @robinnnnn can you share how your server-side performance tests worked out? – nhe Aug 23 '20 at 18:27
