Using "array-contains" Query for Cloud Firestore Social Media Structure

Question

I have a data structure that consists of a collection called "Polls." "Polls" has several documents with randomly generated IDs. Within each of those documents there is a subcollection called "answers." Users vote on these polls, and the votes are all written to the "answers" subcollection. I use the .runTransaction() method on the "answers" node, since this subcollection (for any given poll) is constantly being updated and written to by users.

I have been reading about social media structure for Firestore. However, I recently came across a new feature for Firestore, the "array_contains" query option.

While the post referenced above discusses a "following" feed for social media structure, I had a different idea in mind. I envision users writing (voting) to my main poll node. Creating another "following" node, and also having users write to it to update poll vote counts (using a Cloud Function), seems horribly inefficient, since I would constantly have to copy from the main node, where the votes are being counted.

Would the "array_contains" query be another practical option for social media structure scalability? My thought is:

If user A follows user B, write to an array called "followers" that is a direct child in my "Users" node.
Before user B creates any poll, user B's device reads the "followers" array from Firestore to get the list of all following users and holds it on the client side in an Array object.
Then, when user B writes a new poll, add that "followers" array to the poll, so each new poll from user B has an array attached that contains the IDs of all users following.
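
As a rough illustration of the three steps above, the sketch below models the idea in plain Python with no Firestore calls. The names `create_poll` and `feed_for`, the sample IDs, and the in-memory structures are all hypothetical; the membership test only mimics what an "array_contains" query would do on the server.

```python
from typing import Dict, List

# In-memory stand-ins for the "Users" and "Polls" collections (hypothetical data).
users: Dict[str, dict] = {
    "userB": {"followers": ["userA", "userC"]},
}
polls: List[dict] = []


def create_poll(author_id: str, title: str) -> dict:
    """Copy the author's current followers onto the new poll (steps 2 and 3)."""
    followers = list(users[author_id]["followers"])  # snapshot of the array
    poll = {"author": author_id, "title": title, "followers": followers}
    polls.append(poll)
    return poll


def feed_for(user_id: str) -> List[dict]:
    """Mimic an array-contains filter: polls whose followers array holds user_id."""
    return [p for p in polls if user_id in p["followers"]]


create_poll("userB", "Best pizza topping?")
print([p["title"] for p in feed_for("userA")])  # userA follows userB
print([p["title"] for p in feed_for("userZ")])  # userZ follows nobody
```

One consequence of copying the array at creation time: a user who starts following user B later would not match polls that were created earlier.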

What are the limitations on the "array_contains" query? Is it practical to have an array stored in Firebase that contains thousands of users / followers?

Please tag with relevant technology. While you may be using Android/Java, the code you shared and your question seem purely about JavaScript/Node.js. Also: what is your question? — Frank van Puffelen, Nov 17 '18 at 03:44
Your description is of Cloud Firestore, but your code mixes both Cloud Firestore and the Realtime Database (which are entirely separate services). In general, a one-sided follower model is expensive to model in a DB. You most likely want/need to denormalize the data such that new polls (or their IDs at least) are written to a collection for each follower. — Michael Bleigh, Nov 17 '18 at 05:57
So instead of "Followers," you are thinking of having a user node, and then below that a "Following" collection? — tccpg288, Nov 17 '18 at 14:05

Alex Mamo · Accepted Answer · 2019-05-29T07:30:21.750

Would the "array_contains" query be another practical option for social media structure scalability?

Yes, of course. This is the reason why Firebase creators added this feature.

Seeing your structure, I think you can give it a try. But to respond to your questions:

What are the limitations on the "array_contains" query?

There are no limitations regarding what type of data you store.

Is it practical to have an array stored in Firebase that contains thousands of users / followers?

It is not about being practical or not; it is about other types of limitations. The problem is that documents have limits on how much data you can put into them. According to the official documentation regarding usage and limits:

Maximum size for a document: 1 MiB (1,048,576 bytes)

As you can see, you are limited to 1 MiB of data in total in a single document. When we are talking about storing text, you can store quite a lot. So in your case, if you store only IDs, I think there will be no problem. But IMHO, as your array gets bigger, be careful about this limitation.
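
To put a rough number on that, here is a back-of-the-envelope sketch. It assumes each stored string costs its UTF-8 length plus one byte (my reading of Firestore's storage size rules) and that IDs are 28 ASCII characters long, which is typical for Firebase Auth UIDs. Both are assumptions, and `estimate_max_ids` is a hypothetical helper, not part of any SDK.

```python
MAX_DOCUMENT_BYTES = 1_048_576  # 1 MiB, the limit quoted above


def estimate_max_ids(id_length: int, reserved_bytes: int = 0) -> int:
    """Estimate how many ASCII IDs of a given length fit in one document.

    Assumes each string costs its byte length + 1. reserved_bytes is
    whatever the rest of the document (name, other fields) already uses.
    """
    bytes_per_id = id_length + 1
    return (MAX_DOCUMENT_BYTES - reserved_bytes) // bytes_per_id


print(estimate_max_ids(28))  # 36157
```

Under these assumptions, size alone leaves room for tens of thousands of IDs, so the array would have to grow quite large before the 1 MiB limit is reached.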

If you are storing a large amount of data in arrays and those arrays are updated by lots of users, there is another limitation you need to take care of: you are limited to a sustained rate of 1 write per second on each document. So if a lot of users are all trying to write/update data in the same document at once, you might start to see some of these writes fail. So be careful about this limitation too.

Really appreciate the prompt response. Do you know if Firestore has commented on these limitations - is it possible they would increase? Also, with 1 MB of space, is it safe to say that consists of 1,000,000 text characters? — tccpg288, Nov 23 '18 at 16:03
I really don't have any idea if they will increase those limitations in the future. You should ask them. So, about 1000 characters equal 1 kilobyte. You can do the maths :) — Alex Mamo, Nov 23 '18 at 16:08
Also, a Firestore document has a limit of 40,000 index entries. This means that if you have a document with 40,010 fields (every element in an array or map counts as a field for indexing), the last 10 fields will not be indexed, so you will never be able to query that data. Reference: https://www.youtube.com/watch?v=lW7DWV2jST0 — Soorya, May 06 '20 at 04:25
One more limitation: a document can have at most 20k lines inside it. That means an array of 19,999 elements can be added, with 1 line taken by the array name, for a total of 20k. Beyond this limit you cannot add any other fields to the document. Solution: in such cases, store your data in a subcollection. — Sandeep, May 28 '20 at 07:58

score 1 · Answer 2 · answered Feb 28 '19 at 15:01

I built a real-time polls system; here is my implementation:

I made a polls collection where each document has a unique identifier, a title and an array of answers.

Also, each document has a subcollection called answers, where each answer has a title and keeps its total in distributed counters stored in its own shards subcollection.

Example:

polls/
  [pollID]
    - title: 'Some poll'
    - answers: ['yolo' ...]
    answers/
      [answerID]
        - title: 'yolo'
        - num_shards: 2
        shards/
          [1]
            - count: 2
          [2]
            - count: 16

I made another collection called votes, where each document ID is a composite key of userId_pollId, so I can keep track of whether the user has already voted on a poll. Each document holds the pollId, the userId, the answerId...
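
A minimal sketch of the composite-key idea, in plain Python with a dict standing in for the votes collection. The helper names `vote_key` and `cast_vote` and the sample IDs are hypothetical, and the underscore separator assumes the IDs themselves never contain an underscore.

```python
from typing import Dict

# In-memory stand-in for the "votes" collection, keyed by document ID.
votes: Dict[str, dict] = {}


def vote_key(user_id: str, poll_id: str) -> str:
    """Build the composite document ID userId_pollId."""
    return f"{user_id}_{poll_id}"


def cast_vote(user_id: str, poll_id: str, answer_id: str) -> bool:
    """Record a vote once; return False if this user already voted on the poll."""
    key = vote_key(user_id, poll_id)
    if key in votes:
        return False
    votes[key] = {"userId": user_id, "pollId": poll_id, "answerId": answer_id}
    return True


print(cast_vote("u1", "p1", "a1"))  # True
print(cast_vote("u1", "p1", "a2"))  # False, the second vote is rejected
```

Because the key is derived from the user and the poll, checking for an existing vote is a single lookup by ID rather than a query.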

When a document is created, I trigger a Cloud Function that grabs the pollId and the answerId and increments a random shard counter in that answerId's shards subcollection, using a transaction.

Finally, on the client side, I reduce the count values of all the shards of each answer of a poll to calculate the total.
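
The write and read sides can be sketched in plain Python, without Firestore or a transaction. A list of integers stands in for one answer's shards subcollection, and both function names are hypothetical.

```python
import random
from typing import List


def increment_random_shard(shards: List[int]) -> int:
    """Add 1 to a randomly chosen shard and return its index."""
    index = random.randrange(len(shards))
    shards[index] += 1
    return index


def total_votes(shards: List[int]) -> int:
    """Sum all shard counts, as the client does when reading."""
    return sum(shards)


shards = [2, 16]  # the two shard counts from the example tree
print(total_votes(shards))  # 18
increment_random_shard(shards)
print(total_votes(shards))  # 19
```

Spreading increments over several shards means no single document takes every write, at the cost of reading all shards to get the total.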

For the following feature, you can do the same thing using a middle-man collection called "following", where each document ID is a composite key of userAid_userBid, so you can easily track which user is following which other user without hitting Firestore's limits.
