5

This is the current sample structure:

Posts(Collection)
    - post1Id : {
          viewCount : 100,
          likes     : 45,
          points    : 190,
          title     : "Title",
          postType  : image/video,
          url       : FileUrl,
          createdOn : Timestamp,
          createdBy : user20Id,
          userName  : name,
          profilePic: url
      }
Users(Collection)
    - user1Id(Document):{
          postsCount : 10,
          userName  : name,
          profilePic : url
      }
        viewed(Collection)
            - post1Id(Document):{
                  viewedTime : ""
              }
                 
    - user2Id(Document)

The end goal is:

  • I need to get the posts that the current user has not viewed, ordered by the points field in descending order, with paging.

What are the possible optimal solutions (for example changing the structure, Cloud Functions, or multiple queries from the client side)?

Alex Mamo
Pavan Varma

2 Answers

7

I'm working on a solution to show trending posts and eliminate posts that users have already seen or that are poor content. It's really painful to deal with two queries, especially when the user base is growing. It's difficult to maintain the "viewed" collection and filter the new posts. Imagine having 1 million viewed posts and then filtering for the unseen ones.

So I figured out a solution, which is not that great, but still cool.

Here is our data structure:

posts(Collection) -> postId(Document)

  1. Title.
  2. Description.
  3. Image.
  4. timestamp.
  5. priority

This is a simple post structure with basic details. Note the added priority field; this field does the magic.

How to use the priority:

  1. Query the posts in descending order of priority, from the highest to the lowest.
  2. When a user creates a new post, assign the current timestamp as its default priority.
  3. When a user upvotes (likes) a post, increase its priority by 1 minute (60,000 milliseconds).
  4. When a user downvotes (dislikes) a post, decrease its priority by 1 minute (60,000 milliseconds).
  5. You can reset the priority every 24 hours. If you start browsing the feed this morning, you will see posts from the past 24 hours. Once the 24-hour duration is reached, you can reset the priority to the present time. The 24-hour limit can be changed according to your needs; you may want to reset every 15 minutes, because hundreds of new posts might be added in that time. This limit controls the repetition of content in the feed.
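The first four steps can be sketched in plain Java as follows. This is only an in-memory illustration under assumed names (PriorityFeed, Post, feedOrder and VOTE_STEP_MS are hypothetical and not part of any Firestore API); in a real app the priority would live in the post document and the ordering would come from the query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model of the priority scheme; not Firestore code.
final class PriorityFeed {
    // One vote shifts a post by one minute, as in steps 3 and 4.
    static final long VOTE_STEP_MS = 60_000L;

    static final class Post {
        final String id;
        long priority; // epoch milliseconds, adjusted by votes

        Post(String id, long createdAtMs) {
            this.id = id;
            this.priority = createdAtMs; // step 2: default priority is the creation time
        }

        void upvote()   { priority += VOTE_STEP_MS; } // step 3
        void downvote() { priority -= VOTE_STEP_MS; } // step 4
    }

    // Step 1: post ids ordered from the highest to the lowest priority.
    static List<String> feedOrder(List<Post> posts) {
        List<Post> sorted = new ArrayList<>(posts);
        sorted.sort(Comparator.comparingLong((Post p) -> p.priority).reversed());
        List<String> ids = new ArrayList<>();
        for (Post p : sorted) {
            ids.add(p.id);
        }
        return ids;
    }
}
```

With this model, a post created 30 seconds before another one overtakes it after a single upvote, which is the behaviour the list above describes.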

So when you start scrolling the feed, you get the trending posts first and the lower-priority posts later. If you publish a post today and people start upvoting it, it gets an increased lifetime and thus overpowers the poor content. When people downvote it, the post is pushed down so far that users will not reach it.

The timestamp is used as the priority because old posts should lose priority over time. Even today's trending posts should lose priority tomorrow.

Things to consider:

The lifetime can vary according to your needs. The bigger the user base, the lower the lifetime value should be, because a post published today and upvoted by 10,000 users trends 6.9 days into the future (10,000 × 60,000 ms = 600,000 seconds). If more than 100 posts have each been upvoted by more than 10,000 users, you will never get to see a new post in those 6.9 days. So a trending post should hardly last a day or two.

In this case you can use a lifetime of 10 seconds per upvote, which gives about 1.16 days for 10,000 upvotes.
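The arithmetic behind these figures can be checked with a small helper. The class and method names here (TrendLifetime, extraDays) are made up for illustration.

```java
// Hypothetical helper for the lifetime arithmetic; names are illustrative only.
final class TrendLifetime {
    static final double MS_PER_DAY = 24.0 * 60 * 60 * 1000; // 86,400,000 ms

    // Days of extra lifetime when every upvote adds stepMs milliseconds.
    static double extraDays(long upvotes, long stepMs) {
        return (upvotes * stepMs) / MS_PER_DAY;
    }
}
```

For 10,000 upvotes, extraDays(10_000, 60_000) comes to roughly 6.94 days and extraDays(10_000, 10_000) to roughly 1.16 days, so the step size is the knob to turn as the user base grows.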

This is not a perfect solution but it may help you get started.

halfer
Subhas kadam
  • It's a very cool idea to use the timestamp as a start to give new posts higher relevance. But this doesn't answer the question: how do you make sure users only see posts they haven't seen yet, e.g. if they reload the feed or restart the app? – mathematics-and-caffeine May 15 '22 at 22:52
  • If the goal is to strictly prevent users from seeing the same content, then it should be filtered. But if your case is like Instagram, where you can still see old posts if there are no new posts in your feed, then you can consider this solution. If they reload or restart, see point 5. You should know when to reset your index value. – Subhas kadam Jun 30 '22 at 11:10
1

Edit: 11th June 2021

Nowadays, there are two more options that can help you solve such a problem. The first one would be the whereNotEqualTo method and the second one would be whereNotIn. You might choose one, or the other according to your needs.


Seeing your database structure, I can say you're almost there. According to your comment, you are storing, under the following reference:

Users(Collection) -> userId(Document) -> viewed(Collection)

all the posts a user has seen as documents, and you want to get all the posts that the user hasn't seen. Because there is no != (not equal to) operator in Firestore, nor an arrayNotContains() function, the only option that you have is to make an extra database call for each post that you want to display and check whether that particular post has already been seen.

To achieve this, first you need to add another property to your post object named postId, which holds the actual post id as a String. Now every time you want to display new posts, you should check whether the post id already exists in the viewed collection. If it doesn't exist, display that post in your desired view; otherwise don't. That's it.
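As a rough illustration of this check combined with paging, here is an in-memory sketch. It does not call Firestore: rankedPostIds stands in for the Posts collection ordered by points, viewedIds stands in for the user's viewed subcollection, and UnseenFeed and nextUnseen are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-in for the two-step check; not Firestore code.
final class UnseenFeed {
    /**
     * Walks rankedPostIds (assumed to be sorted by points, descending) one page
     * at a time and keeps the ids that are not in viewedIds, until `wanted`
     * unseen posts are collected or the list is exhausted.
     * pageSize must be greater than zero.
     */
    static List<String> nextUnseen(List<String> rankedPostIds, Set<String> viewedIds,
                                   int pageSize, int wanted) {
        List<String> unseen = new ArrayList<>();
        for (int start = 0;
             start < rankedPostIds.size() && unseen.size() < wanted;
             start += pageSize) {
            int end = Math.min(start + pageSize, rankedPostIds.size());
            // One page corresponds to one limited query against Posts.
            for (String postId : rankedPostIds.subList(start, end)) {
                // Each membership test corresponds to one lookup in the viewed collection.
                if (unseen.size() < wanted && !viewedIds.contains(postId)) {
                    unseen.add(postId);
                }
            }
        }
        return unseen;
    }
}
```

The loop keeps fetching pages while too few unseen posts have been found, so the number of lookups grows with how many top-ranked posts the user has already viewed.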


Edit: According to your comments:

So, for the first post to appear, it needs two Server calls.

Yes, for the first post to appear, two database calls are needed: one to get the post and a second to check whether it has been seen.

large number of server calls to get the first post.

No, only two calls, as explained above.

Am I seeing it the wrong way

No, this is how NoSQL databases work.

or there is no other efficient way?

Not that I'm aware of. There is another option that works, but only for apps that have a limited number of users and a limited number of post views. This option is to store the user ids in an array in each post object; every time you want to display a post, you only need to check whether the user id exists in that array.

But if a post can be viewed by millions of users, storing millions of ids in an array is not a good option, because documents have limits on how much data you can put into them. According to the official documentation regarding usage and limits:

Maximum size for a document: 1 MiB (1,048,576 bytes)

As you can see, you are limited to 1 MiB of data in total in a single document. So you cannot store just anything in a document.
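To get a feel for what this limit means for an array of viewer ids, here is a back-of-the-envelope estimate. It assumes each id costs its length in bytes plus one byte of overhead and ignores every other field in the document; both the cost model and the 28-character id length used below are assumptions for illustration, and ViewerArrayEstimate is a hypothetical name.

```java
// Rough estimate only; the per-id cost model is an assumption, not a measured value.
final class ViewerArrayEstimate {
    static final long MAX_DOCUMENT_BYTES = 1_048_576L; // 1 MiB, from the quoted limit

    // Assumed cost per stored id: its length in bytes plus 1 byte of overhead.
    static long maxIds(int idLengthBytes) {
        return MAX_DOCUMENT_BYTES / (idLengthBytes + 1);
    }
}
```

Under these assumptions, maxIds(28) gives 36,157 ids, which is far below the millions of viewers mentioned above.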

Alex Mamo
  • So, for the first post to appear, it needs two Server calls as best case scenario and large number of server calls to get the first post. Am I seeing it the wrong way or there is no other efficient way? – Pavan Varma Nov 06 '18 at 13:26
  • To answer your new questions, please see my updated answer. – Alex Mamo Nov 06 '18 at 13:45
  • in the above comment i mentioned "large number of server calls to get the first post" as worst case scenario – Pavan Varma Nov 06 '18 at 15:02
  • As I said in another comment: The solution in that answer says to make 2 calls to see if it has been seen. Fair enough. But what if the first 30 posts have been seen ? Then you need 60 calls, 60 back and forth ?! The limit seems to be the sky. Especially for lists that switch order so you can't easily predict – Ced Nov 07 '18 at 16:56
  • @Ced As I have already answered to that [comment](https://stackoverflow.com/questions/53194058/firestore-data-model-that-can-filter-by-not-contains-or-similar), you have a different use-case. In this solution, there is only a single extra database call, to check if a particular post has been seen or not. If you think that in your app, a single restaurant cannot be viewed by as many users as a document can hold, then just simply use arrays. – Alex Mamo Nov 07 '18 at 17:03
  • @AlexMamo I wasn't clear enough. For OP here, your solution isn't optimal in the scenario where: User wants to see first trending post => db post fetch + db has_user_seen_post fetch. If the user has seen that post, then the next post is queried, and so on and so on. So if User has seen many of the trending posts it's gonna take a while. Is my question clearer ? – Ced Nov 07 '18 at 17:07
  • @Ced Let's stick to your [question](https://stackoverflow.com/questions/53194058/firestore-data-model-that-can-filter-by-not-contains-or-similar). In this case, you say "For OP here, your solution isn't optimal in the scenario", do you have a better one? I'm afraid that in this scenario, this extra database call is needed to solve the problem. If you have a better solution, feel free to add it as an answer. – Alex Mamo Nov 07 '18 at 17:12
  • @Ced If you are not comfortable with this kind of pricing model, I recommend you try [Firebase realtime database](https://firebase.google.com/docs/database/usage/billing), it's quite different. – Alex Mamo Nov 07 '18 at 17:15
  • @AlexMamo I know realtime, I'm just highlighting the problem with your solution. As I see it, at the moment it's just not practical in some scenarios. – Ced Nov 08 '18 at 09:11
  • @Ced I don't consider it as a problem. The fact that Firestore is charging you for every read or write operation, is because of their pricing mechanism. If you need advanced filtering, extra database calls are required. – Alex Mamo Nov 08 '18 at 11:08
  • @Ced Please also don't say that a solution isn't practical since you didn't provide a better one. If you have a solution which implies only a **single** read, please share it with us. I'm more than willing to see it. From my experience, according to the use-case of the app, you can choose whether to use Cloud Firestore or Firebase realtime database but I assure you that both are working well together. – Alex Mamo Nov 08 '18 at 11:09
  • @AlexMamo We are really not on the same page as either what I'm saying is not correct, or you don't understand what I'm saying. Yes, your solution is the best one available, what I'm saying is that neither realtimeDB nor firestore are suited for this use case because: *If the user has seen the first 60 posts that will result in 122 queries (back and forth between client and server) until the client will see a post*. And that, I hope you will agree, is not acceptable – Ced Nov 08 '18 at 13:50
  • @Ced "If the user has seen the first 60 posts that will result in 122 queries" is not correct. Let's assume that when the user opens the app, the last 60 posts (which are seen) are displayed. Usually we use a limit(60) call. So if a user has seen the first 60 posts, that will result in a single query to get those 60 posts and another 60 queries to check if those posts are seen. A total of **61** queries and **120** documents read. Fair enough. – Alex Mamo Nov 08 '18 at 14:19
  • @Ced But this is happening only once. Once you read those posts, the second time you get them directly from the cache because in Firestore, offline persistence is enabled by default. There is no "back and forth between client and server". Everything is happening locally. So you'll never be charged again for those reads (as long as the content of the posts is the same). – Alex Mamo Nov 08 '18 at 14:19
  • @AlexMamo My concern really isn't about price; I don't know what makes you think that. My concern is with potential latency, because in my personal use case the user shouldn't see a post that he has already seen, and the list is ranked so it's changing order all the time. With my use case I don't think firestore would be a nice fit. Thanks for taking the time to understand my point though. I just wanted to warn others of potential problems. – Ced Nov 08 '18 at 15:11
  • I understand now. You're welcome. Good luck with your project and if you'll find a suitable solution for that, please share it with the community. Cheers! – Alex Mamo Nov 08 '18 at 16:15
  • How do I check for the array contained in the query? Initially, there will be no array since no user has seen the post but once the post is seen by the user then I can add its id to the array but since initially there is no array it is useless to call -> Query baseQuery = firestore.collection(collectionName).whereArrayContains(); – Pokhraj Sah May 20 '20 at 00:52
  • @PokhrajSah I'm not sure I understand what you mean but I recommend you post a new question using its own [MCVE](https://stackoverflow.com/help/mcve), so I and other Firebase developers can help you. – Alex Mamo May 20 '20 at 08:01
  • Just updated my answer with two more approaches. Hope you reconsider my answer and accept it. I'd really appreciate it. – Alex Mamo Jun 11 '21 at 08:33
  • @AlexMamo, can != or not in be used to query subdocuments? Because storing that data in arrays wouldn't be practical because the data would need to be normalised. –  Nov 11 '22 at 13:03
  • @Hady I didn't quite understand your question. But please post a new question, here on StackOverflow, using its own [MCVE](https://stackoverflow.com/help/mcve), so I and other Firebase developers can help you. – Alex Mamo Nov 14 '22 at 06:41
  • I did and I got an answer. thanks for the help https://stackoverflow.com/questions/74402854/how-to-query-a-document-and-sub-document-together-firestore –  Nov 14 '22 at 10:03