Say I have a website that lets a user fetch their feed quickly. A user can be subscribed to many different pages, and the server must return a feed containing only the N most recent posts across all of the pages the user subscribes to. Originally, when a user queried the server for their feed, the algorithm was as follows:
- look at all of the pages the user subscribes to
- get the N most recent posts from each page
- sort all of those posts by recency
- return the N most recent posts to the user as their feed
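The steps above ("fan-in on read") can be sketched as follows. The page names, post data, and `build_feed` helper are all hypothetical stand-ins for the real database queries:

```python
import heapq
from datetime import datetime

# Hypothetical in-memory stand-in for the database: each page keeps its
# posts as (timestamp, body) tuples, sorted newest-first.
pages = {
    "page_a": [(datetime(2024, 1, 3), "a3"), (datetime(2024, 1, 1), "a1")],
    "page_b": [(datetime(2024, 1, 4), "b4"), (datetime(2024, 1, 2), "b2")],
}

def build_feed(subscriptions, n):
    """Gather the n newest posts from each subscribed page,
    then keep only the n newest overall."""
    candidates = []
    for page in subscriptions:
        candidates.extend(pages[page][:n])  # each list is newest-first
    # Tuples compare by timestamp first, so this sorts by recency.
    return heapq.nlargest(n, candidates)

feed = build_feed(["page_a", "page_b"], 3)
```

The cost here is paid on every read: each refresh touches every subscribed page, which is exactly why it was slow.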
As it turns out, doing this EVERY TIME a user tried to refresh their feed was really slow. So I changed the database to add a feedposts table, which simply has a foreign key to a user and a foreign key to a post. Every time a page makes a new post, it creates a feedpost row for each of its followers. That way, when a user wants their feed, it already exists and does not have to be built at retrieval time.
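That write-time approach ("fan-out on write") might look like this. The schema and the `publish` helper are my own illustrative guesses at the setup described, using an in-memory SQLite database:

```python
import sqlite3

# Hypothetical schema for the fan-out-on-write approach: one feedposts
# row per (follower, post) pair, written at the moment the post is made.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE followers (page_id INTEGER, user_id INTEGER);
    CREATE TABLE posts     (post_id INTEGER PRIMARY KEY, page_id INTEGER, body TEXT);
    CREATE TABLE feedposts (user_id INTEGER, post_id INTEGER);
""")

def publish(page_id, post_id, body):
    """Store the post once, then fan it out to every follower's feed."""
    db.execute("INSERT INTO posts VALUES (?, ?, ?)", (post_id, page_id, body))
    db.execute(
        "INSERT INTO feedposts SELECT user_id, ? FROM followers WHERE page_id = ?",
        (post_id, page_id),
    )

# One post to a page with three followers creates three feedpost rows --
# which is the row-explosion problem described below.
db.executemany("INSERT INTO followers VALUES (?, ?)", [(1, u) for u in (10, 11, 12)])
publish(page_id=1, post_id=100, body="hello")
rows = db.execute("SELECT COUNT(*) FROM feedposts").fetchone()[0]
```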
The way I am doing this is creating far too many rows and simply does not seem scalable. For instance, if a single page makes 1 post & has 1,000,000 followers, we just created 1,000,000 new rows in our feedpost table.
Please help! How do companies such as Facebook handle this problem? Do they generate the feed upon request? Are my database relationships terrible?