I have a collection named "users" containing one document per user. Within each of those user documents is a subcollection named "events", which holds documents created by parsing and processing several JSON files that the user uploads.

Schema: Users (collection) -> user(x) (document) -> Events (collection) -> event object documents.

An example json file (realistically, the files are much longer than the below example):

{
    "events": [
        {
            "name": "name1",
            "start": 1584165600,
            "end": 1584180000
        },
        {
            "name": "name2",
            "start": 1583956800,
            "end": 1583978400
        },
        {
            "name": "name3",
            "start": 1583550000,
            "end": 1583978400
        },
        {
            "name": "name4",
            "start": 1578549600,
            "end": 1583978400
        }
    ]
}

So, in this case, there will be four documents within the "events" collection (one for each object in the array above).

I'm currently creating the above documents using Firestore's add() method.

The problem I'm facing is that I'm not sure how to implement this so that every time the user uploads a new version of the file, the collection is overwritten with the new documents, leaving no duplicates (which would otherwise happen with add()).

A few notes:

  1. Because I'm using add(), the document IDs are random.
  2. I can't use the name of each object as an identifier because some of the uploaded JSON files have different structures.
  3. set() allows me to merge, yes, but I don't know how to merge documents without knowing their IDs.
  4. There is the option of deleting the subcollection every time a file is uploaded, but that sounds like a lot of unnecessary work and processing.

I'm open to suggestions of changing my implementation if there are any alternatives to using add() as well in order to avoid the duplication issue :)

Edit: The above implementation will run from within a Cloud Function.

firearian
  • The collection is overwritten with the new documents - so you want to delete the old documents which are not there in the new file? I think you can use the hash of the document as the id to know if it exists already. – Akshay Jain Apr 24 '20 at 06:47
  • @AkshayJain yes. Basically I'm worried that when the user uploads a new file, the current implementation will just duplicate all of the entries within that file in the "events" collection. I want to avoid doing this; so I thought overwriting it would work? I'm not sure how to get the hash of a firestore document; is there a link you could provide to help me look into it? :) – firearian Apr 24 '20 at 06:55
  • See [this answer](https://stackoverflow.com/questions/5878682/node-js-hash-string) on how to calculate hash in nodejs. See the answer for simple string. Pass the json content as the string and get the hash. If you are using the hash, please ensure you always use the exact same string to calculate the hash, otherwise, the value will mismatch. – Akshay Jain Apr 24 '20 at 07:18
  • And for deletion, you can use something similar. Calculate and store the hash of all the documents from the file into an array. Now list the ids of all the documents in your subcollection, and foreach id check if it is present in the list of hash values, if not delete it. – Akshay Jain Apr 24 '20 at 07:23
  • Why don't you just delete the events collection before the user uploads a new file? – Jay Apr 24 '20 at 17:08
  • Thanks @AkshayJain for your solution! I've followed it and so far there doesn't seem to be any issues coming up in terms of duplication and that sort of thing :) I'm still getting an error, but that seems to be unrelated to this question. – firearian May 04 '20 at 13:55
  • @Jay Unfortunately, if I were to delete the events collection every time that the user uploads a new file, I would have to iterate through literally every single document within that collection and delete all of them. This would essentially double the time it would take to complete that write. – firearian May 04 '20 at 13:56
  • No, you don't have to do that at all. Just craft a callable cloud function and let it handle the delete for you. There's a pretty simple example in the docs [Delete Collections](https://firebase.google.com/docs/firestore/solutions/delete-collections#cloud_function) – Jay May 04 '20 at 18:12
  • @Jay isn't that just a recursive delete as well? It says "Run a recursive delete on the given document or collection path" – firearian May 07 '20 at 08:29
  • Yes, but it's performed on and by the *server* not your app - your concern was *This would essentially double the time it would take to complete that write.* and if it's done on the server level it's in the blink of an eye because you don't have to iterate at the app level. – Jay May 07 '20 at 13:01
  • @Jay I'm so very sorry, I forgot to mention that the code I was referencing above will actually be in a Cloud Function as well. Unfortunately, even without deleting the files, writing all the data currently takes a long time, so I assumed that deleting as well would add a huge overhead to the task? – firearian May 07 '20 at 13:06
  • No. It really won't add overhead to the task as it's being running server side, not on a local device. – Jay May 07 '20 at 15:17
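Putting the comment thread together, the cleanup step Akshay Jain describes could look like the sketch below (the names `staleIds` and `pruneStaleEvents` are made up for illustration, and `db` is assumed to be an initialized firebase-admin Firestore instance): collect the content-hash IDs from the newly uploaded file, list the IDs currently in the subcollection, and delete any document whose ID is no longer present.

```javascript
// Pure helper: which existing document IDs are absent from the new upload?
function staleIds(existingIds, newIds) {
  const keep = new Set(newIds);
  return existingIds.filter((id) => !keep.has(id));
}

// Delete documents that were in the old file but not the new one.
// Note: Firestore batches are capped at 500 writes, so a very large
// cleanup would need to be chunked into multiple batches.
async function pruneStaleEvents(db, userId, newIds) {
  const col = db.collection('users').doc(userId).collection('events');
  const snapshot = await col.get();
  const toDelete = staleIds(snapshot.docs.map((d) => d.id), newIds);
  const batch = db.batch();
  toDelete.forEach((id) => batch.delete(col.doc(id)));
  await batch.commit();
}
```

Because only the stale documents are touched, this avoids the full delete-and-rewrite of the subcollection that the asker was worried about.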

0 Answers