
I was using the Firebase Realtime Database for my social network app, where you can follow people and receive the posts of the people you follow.

My database:

Users
--USER_ID_1
----name
----email
--USER_ID_2
----name
----email

Posts
--POST_ID_1
----image
----userid
----date
--POST_ID_2
----image
----userid
----date

Timeline
--User_ID_1
----POST_ID_2
------date
----POST_ID_1
------date

Another node, "Content", contained the IDs of all of each user's posts. If A followed B, then all of B's post IDs were added to A's timeline. And if B posted something, it was also added to the timelines of all of B's followers.

It has scalability issues:

  • If someone has 10,000 followers, a new post is added to all 10,000 followers' timelines.
  • If someone has a large number of posts, every new follower receives all of those posts in their timeline.

I want to switch to Firestore, since it is claimed to be scalable. How should I structure my database so that these Realtime Database problems are eliminated in Firestore?

user4157124
Zicsus
  • Disclaimer: I have only read through the Firestore docs. Because Firestore has much better querying than the Firebase Realtime Database, you don't need to copy data anymore. So what I would do is this: when a user looks at their timeline, create a Firestore query that says `give me all posts which are from the people i follow`. Something like: posts.where(user == john OR mark OR katy OR ...). I expect something like this to work. If I have time to try it, I'll let you know. – Jürgen Brandstetter Nov 16 '17 at 11:21
  • @jurgenBrandstetter Firestore does not support 'OR' right now, and even if it did, your method wouldn't work. Suppose someone has 1,000 followers: then I would have to write 1,000 OR statements. – Zicsus Nov 18 '17 at 04:35
  • I was thinking: maybe you could put in your document the ID of each person who is following you. For example, if UserA follows UserB, then in UserB's post document you put UserAID = true. The query would then be something like postDocRef.where(UserAID == true). But I don't know whether a document in Firestore can support up to n million followers. – Giovanny Piñeros May 18 '18 at 19:14
  • @Zicsus Let's assume this example. I made 10 000 posts. Now you are following me. You can order the posts by timestamp and then limit to a certain amount, for example 15 posts and use .childAdded method. In order to load more data in the same chronological order you could create a method with an observer of type: ObserveSingleEvent(ofType:Value) with a limit of 10 posts. Then implement a pull to refresh function in your table view or use the scroll view offset and when you reach the bottom of the table, just call your `ObserveSingleEvent` method and get more items and so on. – bibscy Aug 29 '18 at 18:07
  • @bibscy The question is not about preparing a feed when you are following only one person. It is about preparing a chronological feed like Twitter's, where you see the activity of all the users you follow. – Zicsus Aug 29 '18 at 18:20

8 Answers


I've seen your question a little late, but I will still try to provide the best database structure I can think of. I hope you'll find this answer useful.

I'm thinking of a schema with three top-level collections: one for users, one for the users each user is following, and one for posts:

Firestore-root
   |
   --- users (collection)
   |     |
   |     --- uid (documents)
   |          |
   |          --- name: "User Name"
   |          |
   |          --- email: "email@email.com"
   |
   --- following (collection)
   |      |
   |      --- uid (document)
   |           |
   |           --- userFollowing (collection)
   |                 |
   |                 --- uid (documents)
   |                 |
   |                 --- uid (documents)
   |
   --- posts (collection)
         |
         --- uid (documents)
              |
              --- userPosts (collection)
                    |
                    --- postId (documents)
                    |     |
                    |     --- title: "Post Title"
                    |     |
                    |     --- date: September 03, 2018 at 6:16:58 PM UTC+3
                    |
                    --- postId (documents)
                          |
                          --- title: "Post Title"
                          |
                          --- date: September 03, 2018 at 6:16:58 PM UTC+3

If someone has 10,000 followers, then a new post is added to all 10,000 followers' timelines.

That is not a problem at all, because this is exactly what collections in Firestore are meant for. According to the official documentation on modeling a Cloud Firestore database:

Cloud Firestore is optimized for storing large collections of small documents.

This is why I have added userFollowing as a collection and not as a simple object/map that holds other objects. Remember, according to the official documentation on limits and quotas, the maximum size of a document is 1 MiB (1,048,576 bytes). A collection, by contrast, has no limit on the number of documents it can contain. In fact, this is exactly the kind of structure Firestore is optimized for.

So storing those 10,000 followers this way will work perfectly fine. Furthermore, you can query the database in a way that removes any need to copy data anywhere.

As you can see, the database is fairly denormalized, which lets you query it very simply. Let's look at some examples. First, let's create a connection to the database and get the uid of the user with the following lines of code:

FirebaseFirestore rootRef = FirebaseFirestore.getInstance();
String uid = FirebaseAuth.getInstance().getCurrentUser().getUid();

If you want to query the database to get all the users a user is following, you can use a get() call on the following reference:

CollectionReference userFollowingRef = rootRef.collection("following/" + uid + "/userFollowing");

This way, you can get all the user objects a user is following. Once you have their UIDs, you can simply get all their posts.

Let's say you want your timeline to show the latest three posts of every user. When working with very large data sets, the key to solving this problem is to load the data in smaller chunks. In my answer from this post, I have explained a recommended way to paginate queries by combining query cursors with the limit() method. I also recommend taking a look at this video for a better understanding. So, to get the latest three posts of every user, you should consider that solution: first get the first 15 user objects you are following, and then, based on their UIDs, get each user's latest three posts. To get the latest three posts of a single user, use the following query, with uid set to that user's UID:

Query query = rootRef.collection("posts/" + uid + "/userPosts").orderBy("date", Query.Direction.DESCENDING).limit(3);

As you scroll down, load the next 15 user objects, get their latest three posts, and so on. Besides the date, you can also add other properties to your post objects, such as the number of likes, comments, and shares.
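The chunked loading described above can be sketched in memory as follows. This is a hypothetical stand-in, not Firestore API code: `followingUids` plays the role of the `userFollowing` documents and `postsByUser` the role of the `userPosts` subcollections. In a real app, each slice would come from a `limit()`-ed query with a cursor. Sorting each page by date is an assumption added here.

```javascript
// Hypothetical in-memory stand-ins for the "following" and "posts" collections.
// postsByUser maps a user ID to an array of { id, date } objects.

// Latest `n` posts of one user, newest first
// (mirrors orderBy("date", DESCENDING).limit(n)).
function latestPosts(postsByUser, uid, n) {
  const posts = postsByUser[uid] ?? [];
  return [...posts].sort((a, b) => b.date - a.date).slice(0, n);
}

// Yields one timeline page per chunk of followed users:
// usersPerPage followed users, postsPerUser posts each.
function* timelinePages(followingUids, postsByUser, usersPerPage = 15, postsPerUser = 3) {
  for (let start = 0; start < followingUids.length; start += usersPerPage) {
    const chunk = followingUids.slice(start, start + usersPerPage);
    const page = chunk.flatMap((uid) => latestPosts(postsByUser, uid, postsPerUser));
    // Assumption: show the posts within a page newest first.
    page.sort((a, b) => b.date - a.date);
    yield page;
  }
}
```

As several commenters point out, this approach shows a few posts per followed user rather than one strictly chronological feed across everyone you follow.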

If someone has a large number of posts, then every new follower receives all of those posts in their timeline.

Not at all. There is no need to do anything like this; I have already explained why above.

Edit May 20, 2019:

Another way to optimize the operation in which a user sees all the recent posts of everyone they follow is to store the posts that the user should see in a document belonging to that user.

Taking Facebook as an example, you would need a document containing the feed for each user. However, if there is more data than a single document can hold (1 MiB), you need to put that data in a collection, as explained above.

Alex Mamo
  • @AlexMamo But doing `rootRef.collection("posts").where("uid", "==", uid)` has the same result, no? (Just trying to understand the why.) – ivancoene Mar 18 '19 at 13:50
  • @ivancoene Yes, that's right. That is another way of structuring that collection. – Alex Mamo Mar 18 '19 at 13:53
  • @AlexMamo Hey, I appreciate the post, and thanks for the answers in the comments so far. We are struggling to build an Instagram-like timeline of all posts from followed users, ordered by date. I think it's best to store all the posts in one collection and then query by whether the uid is in the current user's following list. The problem is: if someone follows 1 million people, would I have to perform 1 million reads to get every single uid? And then the query would take too long, because I would have to call it 1 million times, right? Thanks in advance. – niclas_4 Mar 29 '19 at 17:42
  • @Badgy If a user is following 1 million users, that's fine but in order to display those users, you have to query them using a `limit()` call or load them in smaller chunks. – Alex Mamo Mar 30 '19 at 08:29
  • Directing users to other semi-relevant posts of yours isn't an answer. I also don't like your approach to retrieving a feed. A typical feed would not return just 15 users' posts. Instead, they look at all posts chronologically. So I don't think it's a good answer, even with its length/detail I think it misses the mark. – Thingamajig Jun 07 '19 at 15:21
  • Simply not efficient, because there are people who keep posting things every minute and there are people who post just once a year. Also, for querying you have to query the user's following node, which is going to cost you unnecessary reads when you could just put their UIDs in an array or map, since the 1 MB limit is huge for such data. To be honest, Firestore can be a trap if you don't optimize your structure to use the whole 1 MB limit of a document. Don't go the easy way: paginate your data in the database itself, or else it is going to cost you a lot. – Harkal Nov 28 '19 at 08:22
  • @HarKal As I see from your comment, you have misunderstood my entire answer. Besides that, saying *"simply not efficient"* without providing a better solution doesn't help anyone. Even if *"there are people who keep posting things every minute and there are people who just post once in a year"*, my solution above will work. *"you have to query the users following node which is going to cost you unnecessary reads"* is not true; it will cost you exactly the number of reads needed to get the users you need. – Alex Mamo Nov 28 '19 at 08:54
  • @DanFein you state in your post "So first you need to get the first 15 user objects that you are following and then based on their uid, to get their latest three posts." This would imply 15 calls to the database no? Or more generally, for a user following n people, n calls. – thedeg123 Apr 30 '20 at 15:19
  • @AlexMamo I like your answer. My question is: if the posts are going to be updated frequently (like count, view count, etc.), doesn't it cost more? How should I handle this situation? – Soorya May 06 '20 at 06:29
  • @Soorya It will always cost the exact number of operations you perform. Nothing more. – Alex Mamo May 06 '20 at 08:18
  • According to your solution, posts are duplicated under each user. So, if I want to update like count of the post, I have to update every document (if I have 10k followers, I've 10k write on each like operation). – Soorya May 07 '20 at 03:00
  • @Soorya You can also check **[this](https://stackoverflow.com/questions/54258303/what-is-denormalization-in-firebase-cloud-firestore/54258505#54258505)** out. – Alex Mamo May 07 '20 at 07:22
  • Nice answer. I like the "Edit May 20, 2019" approach, duplicating posts in a user "feed document/s". I think this is a good solution if the posts don't change much. My concern is when the posts change often, like with "views", "likes", "comments"; updating all duplications could be expensive if it's duplicated in a lot of places (what @Soorya mentions I guess). Another option I can think of is to only save the uid of the post, and do a `get `for each post while the user is scrolling. This could cause some latency, but I can't think of a better solution. I guess it's a trade off. – ernewston May 26 '20 at 03:15
  • @ernewston Thanks. Yes, it can be a solution as long as it solves your problem. But remember, duplicating data is always a trade-off. You should do some tests and check whether it's worth it. – Alex Mamo May 26 '20 at 09:13
  • @AlexMamo Thanks for the answer. So far this is the best solution. But I calculated the cost of writing to each user's timeline, and it becomes huge. Consider 2K DAU, where each active user has 1K followers on average, follows 10 new people per day, and makes 10 new posts per day. Each newly followed person has ~500 posts on average. So following new people costs 2K*10*500 writes per day, and posting costs 2K*10*1K writes per day. Multiply all of that by 30 days and subtract the free quota, and we still need to pay $1,618, just for writes. That is 80% of the bill. Any better way? – Zenko Jul 02 '20 at 20:50
  • @zk_regen That's the best solution for now, that I'm aware of. – Alex Mamo Jul 03 '20 at 09:02
  • So it's not possible to do this well with the Realtime Database? – SuperUberDuper Jul 03 '20 at 22:13
  • @SuperUberDuper What you can or cannot do is written in the answer/comments. – Alex Mamo Jul 04 '20 at 05:03
  • I agree with @DanFein, This is not how a social media feed works. You are fetching the 3 most recent posts from all of the followed users, and that does not work. If I follow user A, I want to see all his recent posts, not only 3 of them. Conversely, if I follow an inactive user B, I don't want to see his last 3 posts that date from many years ago. – Antoine Weber Oct 11 '20 at 23:33
  • @AntoineWeber There is no fixed recipe on how a social media feed works. This is only a possible solution. But, what you say can be achieved using this schema. However, if you know a better structure, feel free to post it as an answer. I'm looking forward to seeing it. Otherwise, simply down-voting doesn't add any benefit at all. – Alex Mamo Oct 12 '20 at 08:55
  • `There is no fixed recipe on how a social media feed works` There is a pretty common standard for feeds. When you open your feed, you want to see the most recent posts, on the condition that you follow their authors. If you follow 10 active users and 10 inactive users, you don't want the inactive users to take up half of the feed. Your method would fetch the posts of the inactive users EVERY time you load the feed. What you built is NOT a feed. What you built is a merge (aggregate) of all of the followed users' profiles. – Antoine Weber Oct 12 '20 at 14:03
  • @AntoineWeber That's what I was also talking about. Remember, we are always structuring a Firestore database according to the queries that we want to perform. If you have a specific use-case for your app, add it as a new question so I and other Firebase developers can take a look at it. – Alex Mamo Oct 12 '20 at 14:19
  • The question states `how to structure a feed and follow system`. Your reply does not answer the question because you don't create a feed, you create a merge of truncated posts from all the followed users, which is not a feed. I'm not asking you any questions. A better answer has already been posted on this page. – Antoine Weber Oct 12 '20 at 14:56
  • What you create instead is a preview of all the profiles you follow, by showing their 3 latest posts. preview of all profiles != feed. – Antoine Weber Oct 12 '20 at 15:03
  • @AlexMamo Hi, your answer seems to be for Firestore, but the question is asking for a Firebase Database schema. "I proposed that schema uid->userposts-> postId so you can simply get all the posts that correspond to a specific user": in this part, can we change it to uid -> postId for the Firebase Database? – Rishin Ali May 19 '21 at 13:51
  • @RishinAli **No**, the question is indeed for Cloud Firestore, as also the selected "google-cloud-firestore" tag shows. – Alex Mamo May 19 '21 at 13:55
  • @AlexMamo *Oh yes*, I can see. I misunderstood it. But my question is: if we are using the Realtime DB, can we structure the schema like `uid -> postId`, with all postIds directly under the uid node? Can't we remove the unnecessary `userPost` node that you used for Firestore? – Rishin Ali May 19 '21 at 14:02
  • @RishinAli While both databases are a part of Firebase, both are two different products, with two different mechanisms. So please post a new question using its own [MCVE](https://stackoverflow.com/help/mcve), so I and other Firebase developers can help you. – Alex Mamo May 19 '21 at 14:09

There are two situations:

  1. Users in your app have a small number of followers.

  2. Users in your app have a large number of followers. If we store all of a user's followers in a single array in a single Firestore document, we will hit Firestore's limit of 1 MiB per document.


  1. In the first situation, each user keeps a document that stores their followers list in a single array. Using arrayUnion() and arrayRemove(), you can manage the followers list efficiently. When you post something to your timeline, you add the followers list to the post document.

    Then use the query below to fetch posts (combining whereArrayContains() with orderBy() on another field requires a composite index):

    postCollectionRef.whereArrayContains("followers", userUid).orderBy("date", Query.Direction.DESCENDING);
    
  2. In the second situation, you need to split the user's followers document based on the size or count of the followers array. Once the array reaches a fixed size, the next follower's ID goes into the next document, and the first document keeps a field "hasNext" that stores a boolean value. When adding a new post, you duplicate the post document so that each copy contains one of the follower lists split earlier. You can then use the same query as above to fetch documents.
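The second situation can be sketched in memory as three steps. First, split the follower list into fixed-size chunks, each standing for one document with a `hasNext` flag. Second, duplicate the post once per chunk. Third, emulate the `whereArrayContains("followers", ...)` query with a filter. The chunk size and the `copyIndex` field are hypothetical. A real chunk size must keep every document under 1 MiB.

```javascript
// Split a follower list into "follower documents" of at most chunkSize IDs.
// Each document records whether another document comes after it (hasNext).
function splitFollowers(followerIds, chunkSize) {
  const docs = [];
  for (let i = 0; i < followerIds.length; i += chunkSize) {
    docs.push({
      followers: followerIds.slice(i, i + chunkSize),
      hasNext: i + chunkSize < followerIds.length,
    });
  }
  return docs;
}

// Duplicate a post once per follower document, copying that chunk's followers into it.
// copyIndex is a hypothetical field to tell the copies apart.
function fanOutPost(post, followerDocs) {
  return followerDocs.map((doc, index) => ({ ...post, copyIndex: index, followers: doc.followers }));
}

// In-memory equivalent of
// whereArrayContains("followers", uid).orderBy("date", DESCENDING).
function feedFor(uid, postCopies) {
  return postCopies
    .filter((p) => p.followers.includes(uid))
    .sort((a, b) => b.date - a.date);
}
```

Each follower appears in exactly one chunk, so each post shows up once in that follower's feed. As the comments note, however, a new follower still has to be added to the copies of older posts before those posts appear.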

Niyas
  • @Niyas With this solution if user A follows user B who has B_r reviews, we will perform B_r writes for each follow correct? – thedeg123 Apr 30 '20 at 15:34
  • You said `you must add the list of followers in post document.` but here's the big issue with that solution: When a user gets a new follower and has 10k posts, you need to update each of those 10k posts to add this new follower in the followers array. 10k writes for 1 follow. That doesn't sound like a good architecture. – Antoine Weber Oct 11 '20 at 02:30
  • I must say though that this appears to be the best answer even though it's not ideal – Antoine Weber Oct 11 '20 at 15:52
  • I used your answer as the basis for my application and it has worked wonderfully. I got around the array limit with your suggestion of the 'hasNext' flag and then duplicating the post for every extra document that contains an array of following users. @Antoine I would take the edge case of duplicating 10k posts over duplicating a post for 1m followers – cpboyce Apr 21 '21 at 18:54

The other answers will get very costly if there is any decent amount of activity on your network (e.g. people following 1,000 people, or people making 1,000 posts).

My solution is to add a field called 'recentPosts' to every user document. This field is an array.

Now, whenever a post is made, have a Cloud Function that triggers on the write (onWrite()) and adds info about that post to the recentPosts array on the poster's user document.

So, you might add the following map to the front of the recentPosts array:

{
"postId": xxxxxxxxxxx,
"createdAt": tttttt
}

Limit the recentPosts array to 1,000 objects, deleting the oldest entry when going over limit.
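The array maintenance that the Cloud Function would perform can be sketched as a pure function. The sketch assumes entries are `{ postId, createdAt }` maps kept newest first, as described above. A real function would read and write the user document inside a transaction, which this sketch does not show. Skipping posts that are already listed is an added assumption, because onWrite also fires on updates.

```javascript
const MAX_RECENT_POSTS = 1000;

// Returns a recentPosts array with `entry` added at the front,
// dropping the oldest entries once the cap is exceeded.
function addRecentPost(recentPosts, entry, max = MAX_RECENT_POSTS) {
  // onWrite also fires when a post is edited, so ignore posts already listed.
  if (recentPosts.some((p) => p.postId === entry.postId)) return recentPosts;
  return [entry, ...recentPosts].slice(0, max);
}
```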

Now, suppose you are following 1,000 users and want to populate your feed... Grab all 1,000 user documents. This will count as 1k reads.

Once you have the 1,000 documents, each document will have an array of recentPosts. Merge all of those arrays on the client into one master array and sort it by createdAt.

Now you have potentially up to 1 million post docIDs, all sorted chronologically, for only 1,000 reads. As your user scrolls their feed, simply fetch those documents by their docIDs as needed, presumably 10 at a time or so.

You can now load a feed of X posts from Y followees for Y + X reads.

So 2,000 posts from 100 followees would only be 2,100 reads.
So 1,000 posts from 1,000 followees would only be 2,000 reads.
etc...
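The client-side merge and the read arithmetic above can be sketched as follows. The input shape (`{ uid, recentPosts }` per followed user) is an assumption, and fetching the post documents themselves is left out. The sketch only builds the sorted ID list, slices out one page of IDs, and counts reads as one per user document plus one per post document loaded.

```javascript
// Merge every followed user's recentPosts array into one list, newest first.
function buildFeedIndex(userDocs) {
  return userDocs
    .flatMap((doc) => doc.recentPosts)
    .sort((a, b) => b.createdAt - a.createdAt);
}

// Post IDs to fetch for one (0-based) page of the feed.
function feedPage(index, pageNumber, pageSize = 10) {
  return index
    .slice(pageNumber * pageSize, (pageNumber + 1) * pageSize)
    .map((entry) => entry.postId);
}

// Y followee documents + X post documents.
function estimateReads(followeeCount, postsLoaded) {
  return followeeCount + postsLoaded;
}
```

For example, `estimateReads(100, 2000)` gives the 2,100 reads quoted above.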


Edit 1) Further optimization: when loading the user documents, you can batch them 10 at a time using an `in` query. Normally this makes no difference, because it still counts as 10 reads even though it's batched. But you can also filter on a field like recentPostsLastUpdatedAt and require it to be greater than your cached value for that user doc. Then any user docs that haven't updated their recentPosts array will not be read. In theory, this can cut base reads by up to 10x.

Edit 2) You can also attach listeners to each user document to get new posts as their recentPosts arrays change, without querying every single followee each time you refresh your feed. (Although 1,000+ snapshot listeners could be bad practice; I don't know how they work under the hood.) (Edit 3: Firebase limits a project to only 1k listeners, so Edit 2 wasn't a scalable optimization.)
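The Edit 1 optimization can be sketched in memory as follows. Followees are grouped into batches (10 per batch, the `in` limit the answer assumes; Firestore has since raised that limit, so check the current documentation). For each batch, only user docs whose `recentPostsLastUpdatedAt` is newer than the oldest cached timestamp in that batch are kept. The `userDocs` and `cache` shapes are hypothetical. Firestore also bills a minimum of one read per query, even when the query returns nothing, so a fully filtered batch is not entirely free.

```javascript
// Split an array into groups of at most `size` elements.
function chunk(items, size) {
  const groups = [];
  for (let i = 0; i < items.length; i += size) groups.push(items.slice(i, i + size));
  return groups;
}

// Simulates, per batch, a query like
//   where('userID', 'in', batch), where('recentPostsLastUpdatedAt', '>', oldestCached)
// userDocs: Map of uid -> { recentPostsLastUpdatedAt }
// cache:    Map of uid -> time we last read that user doc (hypothetical client cache)
function docsToRead(followeeIds, userDocs, cache, batchSize = 10) {
  const toRead = [];
  for (const batch of chunk(followeeIds, batchSize)) {
    // Oldest cached time in this batch; uncached docs count as 0 so they are always read.
    const oldestCached = Math.min(...batch.map((id) => cache.get(id) ?? 0));
    for (const id of batch) {
      const doc = userDocs.get(id);
      if (doc && doc.recentPostsLastUpdatedAt > oldestCached) toRead.push(id);
    }
  }
  return toRead;
}
```

Because the filter uses the oldest cached time in each batch, a doc that has not changed since its own cached time can still be read when a batch-mate's cache entry is older.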

Albert Renshaw
  • This is the most efficient way, so far, to create a chronological timeline of different users' posts. Wish I could upvote 10 times so people know this is a good answer. – Tad Dec 05 '21 at 17:18
  • @Tadreik Thank you! We were suffering scaling problems and I came up with this. We've used it in production for over a year now with no issues. – Albert Renshaw Dec 07 '21 at 02:41
  • I like the `recentPostsLastUpdatedAt` addition as well, makes a lot of sense for lowering the user query search space. – LordParsley Jan 07 '22 at 12:22
  • Seems like the best option here in terms of cost optimization. I will try this, but I will experiment with caching the `recentPosts` array in the browser's `localStorage`, then updating the cache with a query that gets only the user docs' `recentPosts` where `recentPostLastUpdated` is newer than the newest post in the cache. Another thing to experiment with is how often to query for new posts while the user is on the page. Also, not updating the feed while the tab/page is inactive could help, I guess... – Spiralis Sep 17 '22 at 00:37
  • Very good solution, I will definitely try that. However, I have a question. You said "When loading the userDocuments you can batch them 10 at a time by using the in query ... normally this would make no difference because it's still 10 reads even though it's batched... ". Could you explain this further? I'm not sure what you mean by that? Thanks :) – George Nov 04 '22 at 10:55
  • @George the `in` query allows you to batch up to 10 document IDs in a single read "request" but firebase still charges the query as 10 "reads" in your usage billing—so normally there is no cost benefit to batching in this manner; however if there is a field on the docs called `recentPostsLastUpdatedAt` we can filter the batched `in` query by that field via `greaterThan:` of your last cached MIN() `lastQueriedAt` of those 10 docs, which will filter out non-updated docs, which won't count as reads. Because of this, you can filter out up to 9 out of 10 of your reads for each batched `in` query. – Albert Renshaw Nov 04 '22 at 16:47
  • Sounds great. Would you mind showing a code snippet of that query? – George Nov 04 '22 at 21:57
  • @George Something like `query(usersRef, where('userID', 'in', [FOLLOWEE_ID_1, FOLLOWEE_ID_2, …]), where("recentPostsLastUpdatedAt", ">", LAST_QUERIED_AT))`, where LAST_QUERIED_AT is the oldest of the cached query dates of those 10 followees' user docs – Albert Renshaw Nov 04 '22 at 23:14
  • @AlbertRenshaw Very nice solution. I have a question: Why not store the createdAt and createdBy fields directly on the Post document and then query the Post collection instead? The query could return Post documents for specific userIds where createdAt is greater than a specific Timestamp (depends on how recent we want the Posts in the feed to be). There wouldn't be a need for recentPosts field in the User document in this case. – wonderingdev Dec 29 '22 at 21:29
  • @IvanGandacov This would be too many reads, you'd be reading each post document itself rather than getting a list that's a reference to the post documents. In my example I can get an ordered list of, say, 1,000,000 post docIDs (sorted) then query each doc as they scroll, with your scenario I'd have to read the post docs first which would be very expensive. For this reason a reference list of docIDs (@ 1 read per followee) is better than directly querying the docs themselves (potentially 1000 reads per followee) – Albert Renshaw Dec 30 '22 at 01:12
  • @AlbertRenshaw Why would my scenario be expensive? Firestore charges per document returned (1 doc return, 1 read is charged). My proposed solution would return only the Post docs that have a createdAt greater than a specific Timestamp, for specific userIds and with a limit of, let's say, 50 Post docs per query. The next query can be run with a start cursor pointing to the last document snapshot returned by the previous query run. So each query run will return maximum 50 Posts, so maximum 50 reads charged. – wonderingdev Dec 30 '22 at 11:34
  • @IvanGandacov I may have misunderstood what you’re asking. If I’m following say 2500 users the most userIDs you can batch at a time for your queries is 10; so that’s 250*50 reads (as opposed to 50) which isn’t awful, but what happens if I follow someone who just posted 50 posts more recently than the other 9 people in that arbitrary batch, but those 9 people’s posts are still more recent than some of the other batches? You’ll end up having many missing posts in your feed based on arbitrary user activity in each batch. – Albert Renshaw Dec 30 '22 at 18:25
  • @AlbertRenshaw Yes, indeed, it makes sense. I forgot that the limit for IN queries is only 10 items. With this, I can now see that there would be missing posts in the feed, as we cannot query the Posts for all the userIds at once. So my scenario won't work. Now I understand why you proposed the recentPosts approach in your solution. Thank you for taking your time to respond to my comments! A big upvote from me on your solution! – wonderingdev Dec 30 '22 at 19:24
  • @IvanGandacov sure thing! And thanks for your input as well, I’m sure there are better ways to structure these things so it’s important that we keep questioning them – Albert Renshaw Dec 30 '22 at 20:59

I've been struggling a bit with the solutions suggested here, mostly due to a technical gap, so I figured out another solution that works for me.

For every user, I have a document with all the accounts they follow, plus a list of all the accounts that follow that user.

When the app starts, I get hold of the list of accounts that follow the current user, and when that user makes a post, the array of all the users who follow them is stored as part of the post object.

When user B wants to get all the posts of the people they follow, I just add a simple whereArrayContains("followers", currentUser.uid) to the query.

I like this approach because it still allows me to order the results by any other parameters I want.

Based on:

  • 1 MiB per document, which is 1,048,576 bytes (about that many ASCII characters).
  • Firebase Authentication UIDs seem to be around 28 characters long.
  • The rest of the info in the object doesn't take up much space.

this approach should work for users with up to approximately 37,000 followers (1,048,576 / 28 ≈ 37,449).
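The capacity estimate can be checked with a little arithmetic. The sketch applies Firestore's documented storage-size rule, under which a string counts as its UTF-8 byte length plus 1. It ignores the document name, field names, and per-document overhead unless you pass them in as the hypothetical `otherBytes` parameter, so real capacity is somewhat lower.

```javascript
const MAX_DOCUMENT_BYTES = 1_048_576; // 1 MiB

// Roughly how many IDs of a given length fit in one array field.
// Firestore counts each string as its UTF-8 byte length + 1.
function estimateMaxFollowers(idLength, otherBytes = 0) {
  const bytesPerId = idLength + 1;
  return Math.floor((MAX_DOCUMENT_BYTES - otherBytes) / bytesPerId);
}
```

With 28-character IDs this gives about 36,000 rather than 37,000. If the array is indexed, Firestore's per-document limit on index entries may be reached before the size limit, so check the current limits page as well.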

Tsabary
  • I wouldn't recommend this approach. A document also has a limit of 20k lines inside it. That means you cannot have an array of more than 19,999 items, since 1 line is for the array name. It also means you cannot add any other fields to the document once the limit is reached. – Sandeep May 28 '20 at 08:06
  • @Sandeep I think there's 20k field limit, not line. Array type data are considered as one field. – Pooja Nov 21 '20 at 06:46
  • @Pooja please double check because I am quite sure it's about lines. Share your finding please – Sandeep Nov 21 '20 at 07:07
I went through some of the Firebase documentation, and I'm confused as to why the implementation suggested at https://firebase.google.com/docs/database/android/structure-data#fanout wouldn't work in your case. Something like this:

users
--userid(somedude)
---name
---etc
---leaders: 
----someotherdude
----someotherotherdude

leaders:
--userid(someotherdude)
---datelastupdated
---followers
----somedude
----thatotherdude
---posts
----postid

posts
--postid
---date
---image
---contentid

postcontent
--contentid
---content

The guide goes on to mention "This is a necessary redundancy for two-way relationships. It allows you to quickly and efficiently fetch Ada's memberships, even when the list of users or groups scales into the millions.", so it doesn't seem that scalability is exclusively a Firestore thing.
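The fan-out approach from that guide writes every copy in one multi-path update. The following sketch builds such an update object for a new post under the schema above. The function name and argument shapes are hypothetical.

```javascript
// Builds a multi-path update for a new post in the "leaders" schema above.
// Keys are database paths; values are what gets written at each path.
function buildNewPostUpdates(authorId, postId, contentId, { date, image }, content) {
  return {
    [`posts/${postId}`]: { date, image, contentid: contentId },
    [`postcontent/${contentId}`]: { content },
    [`leaders/${authorId}/posts/${postId}`]: true,
    [`leaders/${authorId}/datelastupdated`]: date,
  };
}
```

With the modular SDK (`import { getDatabase, ref, update } from "firebase/database"`), this object could be passed as `update(ref(getDatabase()), updates)`. A multi-path update like this succeeds or fails as a whole. Nothing is written to followers' timelines. A reader would build their timeline from the `leaders/{id}/posts` lists of the users they follow.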

Unless I'm missing something the main problem seems to be the existence of the timeline node itself. I get that it makes it easier to generate a view of a particular user's timeline, but that comes at the cost of having to maintain all of those relationships and is significantly delaying your project. Is it too inefficient to use queries to build a timeline on the fly from a structure similar to the above, based on a submitted user?


Update: 3/7/23

This post is out of date. I personally believe you should go with mass duplication, which is possible even within the limits of Cloud Functions. I have put together every possible version in an updated article.

https://code.build/p/GNWh51AdUxUd3B8vEnSMam/building-a-scalable-follower-feed-with-firestore


Original Post


My scalable idea is that users may have 1,000,000+ followers, but a REAL user does not follow more than 1000 people. We could simply aggregate their feed (a collection of posts).

Collections

/users
/users/{userId}/follows
/users/{userId}/feed
/posts

1. Populate the feed

populateFeed() needs to run first, and it should really live in a Cloud Function. To keep costs down, it only fetches posts that are new to your feed, and none older than 10 days (or whatever cutoff you choose).

populateFeed() - something like this...

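const admin = require('firebase-admin'); // assumes admin.initializeApp() was already called
const db = admin.firestore();

async function populateFeed(userId) {
  const userRef = db.doc(`users/${userId}`);
  const { lastUpdate } = (await userRef.get()).data() || {};
  const tenDaysAgo = new Date(Date.now() - 10 * 24 * 60 * 60 * 1000);
  // Only fetch posts newer than BOTH the last update and the 10-day cutoff.
  const since = lastUpdate && lastUpdate.toDate() > tenDaysAgo ? lastUpdate.toDate() : tenDaysAgo;

  // Assumes each follows doc ID is the followed user's ID. Maybe chunk at 20 here...
  const follows = await db.collection(`users/${userId}/follows`).get();
  for (const followed of follows.docs) {
    const posts = await db.collection('posts')
      .where('userId', '==', followed.id)
      .where('createdAt', '>', since)
      .get();
    if (posts.empty) continue;
    const batch = db.batch(); // a batch holds a limited number of writes
    posts.forEach((post) =>
      batch.set(db.doc(`users/${userId}/feed/${post.id}`), { createdAt: post.get('createdAt') }));
    await batch.commit();
  }
  // Update users/${userId}/lastUpdate to the current timestamp.
  await userRef.update({ lastUpdate: admin.firestore.FieldValue.serverTimestamp() });
}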

This way, you don't fetch too many documents (only posts up to 10 days old, for example), and you don't waste reads on docs you already have.

2. Read the feed

A feed will be the aggregated posts.

loadFeed() - call this after populateFeed()

db.collection(`users/${userId}/feed`).orderBy('createdAt', 'desc');

The documents in feed only really need the createdAt date and postId since you can pull the post on the front end, although you could store all data if you don't expect it to change:

postId: {
  createdAt: date
}

Your userDoc will also have:

{
  numFollowing: number,
  lastUpdate: date
}

The app should automatically call loadFeed() on load. There could be a button that runs populateFeed() as a callable cloud function (the best), or locally. If your feed is a firebase observable, it will update automatically as they populate.

There might be other, cleaner ways to solve this problem that scale. It is also possible, in a post's onWrite trigger, to write the post to every follower's feed. The only constraint is time: the function timeout is normally 60 seconds but can be raised to 9 minutes. Just make sure you do the bulk update asynchronously.
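Writing a post to every follower's feed from an onWrite trigger can be planned as groups of writes, so that no single batch grows too large. The sketch below only computes the feed paths (matching the `/users/{userId}/feed` collection above) and splits them into groups. `planFeedFanOut` and its default batch size are hypothetical; set the size according to Firestore's current batched-write limits. Committing the groups, for example with the Admin SDK's batched writes or `BulkWriter`, is not shown.

```javascript
// Plans the writes that copy one post into every follower's feed,
// grouped so that each group can be committed as one batch.
function planFeedFanOut(followerIds, postId, createdAt, batchSize = 500) {
  const writes = followerIds.map((followerId) => ({
    path: `users/${followerId}/feed/${postId}`,
    data: { createdAt },
  }));
  const batches = [];
  for (let i = 0; i < writes.length; i += batchSize) {
    batches.push(writes.slice(i, i + batchSize));
  }
  return batches;
}
```

The total number of writes per post equals the author's follower count, which is the cost the comments on the first answer worry about.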

See my adv-firestore-functions package.

Jonathan
  • Please don't insert "EDIT"s/"UPDATE"s, just make your post the best presentation as of edit time. Please avoid social & meta commentary in posts. – philipxy Mar 08 '23 at 03:59
    @philipxy - Please to do edit my post and change the order. There are literally 3 other articles on this page that have posted updates to their post the same way that I did. I don't care about my presentation, only to offer the best up to date information to people searching for an answer to the problem. I'm not going to repost a 15 minute article here when I can simply add a link. – Jonathan Mar 09 '23 at 02:46

After some thinking about this problem I came up with a theoretical solution (I haven't tested it yet), using Cloud Firestore:

My solution consists of two parts:

1. Database schema design:

Firestore-root
     |
      _ _ users (collection):
               |
                _ _ uid (document):
                       |
                        _ _ name: 'Jack'
                       |
                        _ _ posts (sub-collection):
                                 |
                                  _ _ postId (document)
                       |
                        _ _ feed (sub-collection):
                                |
                                 _ _ postId (document)
                       |
                        _ _ following (sub-collection):
                                     |
                                      _ _ userId (document)
                       |
                        _ _ followers (sub-collection):
                                     |
                                      _ _ userId (document)

1.1 Explanation:

As you can see, there is a collection named users, with one uid document per user in the database. Each uid document has its own fields (name, for example) and its own sub-collections: posts holds the posts the user created, feed holds the posts from the people the user follows, and following and followers hold one userId document per followed user and per follower, respectively.

2. Use Cloud Functions:

const functions = require("firebase-functions");
const admin = require("firebase-admin");

admin.initializeApp();

const firestore = admin.firestore();

// Cloud Functions run on the server, so there is no signed-in "current user"
// here (firebase/auth is a client SDK). Instead, fan the new post out to
// every follower of the author.
exports.addToUserFeed = functions.firestore
  .document("/users/{uid}/posts/{postId}")
  .onCreate(async (snapshot, context) => {
    const authorId = context.params.uid;
    const postId = context.params.postId;

    // Each document id in followers is a follower's userId.
    const followers = await firestore
      .collection("users").doc(authorId)
      .collection("followers").get();

    const data = snapshot.data();

    // Batched writes have a size limit (historically 500 operations),
    // so commit the fan-out in chunks.
    const commits = [];
    let batch = firestore.batch();
    let count = 0;
    for (const follower of followers.docs) {
      // Reuse postId as the feed doc id so a retried function doesn't duplicate.
      const feedRef = firestore
        .collection("users").doc(follower.id)
        .collection("feed").doc(postId);
      batch.set(feedRef, data);
      if (++count === 500) {
        commits.push(batch.commit());
        batch = firestore.batch();
        count = 0;
      }
    }
    if (count > 0) commits.push(batch.commit());
    await Promise.all(commits);
  });

2.1 Explanation:

Here we trigger a Cloud Function whenever a user creates a post in their posts sub-collection. The author's id is available from the wildcard as context.params.uid. Because the function runs on the server, there is no "current user" to check against; instead, the function reads the author's followers sub-collection (each document id is a follower's userId) and writes a copy of the new post into each follower's feed sub-collection. Using the postId as the feed document id means a retried function overwrites the same document instead of creating a duplicate. The writes are grouped into batches, which are committed concurrently and awaited so the function does not finish before they complete.

That's it. I hope this helps. Again, I haven't tested it yet, so something may not work as expected.

Nader Khaled
  • What happens when a user newly follows someone? They can't see that person's old posts in their feed, right? How do you overcome that issue? – Joel Jerushan Jan 19 '23 at 09:46

I think one possibility is to create another top-level collection named "users_following" that holds one document per user (keyed by user_id), with an array field listing all the users that user follows. That user's posts can live in a sub-collection of that document, or a top-level posts collection will also do the job. The important part is to also store the user's most recent post inside their "users_following" document, as an array or map. This denormalized data is then used to populate the feed of everyone who follows that user. The drawback is that you will only see one post per person, even if that person added two posts recently; and if you store two or three recent posts this way, all of them are shown at once (three posts from the same user in a row). Still, it works well if you only need to show one post per user.
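A small sketch of how a reader's feed could be assembled under this scheme. It assumes each users_following/{user_id} document has a `following` array and a hypothetical `latestPost` field ({ id, createdAt }); plain objects stand in for the documents.

```javascript
// Build a reader's feed from one latestPost per followed user.
// `docs` stands in for the users_following collection, keyed by user_id.
function buildFeed(readerId, docs) {
  const following = docs[readerId]?.following ?? [];
  return following
    .map((id) => docs[id]?.latestPost)
    // Followed users who have not posted yet have no latestPost.
    .filter((post) => post !== undefined)
    // Newest first.
    .sort((a, b) => b.createdAt - a.createdAt);
}

// Usage
const docs = {
  alice: { following: ["bob", "carol", "dave"] },
  bob: { following: [], latestPost: { id: "b2", createdAt: 5 } },
  carol: { following: ["alice"], latestPost: { id: "c1", createdAt: 9 } },
  dave: { following: [] },
};
console.log(buildFeed("alice", docs).map((p) => p.id)); // [ 'c1', 'b2' ]
```

Building the feed costs one document read per followed user and, as noted, shows at most one post per person.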

Stefan Zobel