16

I'm planning on creating a social network, and I don't think I quite understand how Facebook's status update module is designed. Hoping I can find some help here. At the algorithmic and data-structure level, what is the most efficient way to build a status update mechanism in a social network?

A full table scan for all friends and then sorting their updates is very naive and costly. Do we use some sort of mechanism based on hashing or something else? Please let me know.

P.S: I'm not talking about their EdgeRank algorithm but the basic status update. How do they find and fetch them from the database?

Thanks in advance for the help!

Zsolt Safrany
Ari53nN3o
  • https://stackoverflow.com/questions/1443960/how-to-implement-the-activity-stream-in-a-social-network – OhadR Jan 10 '18 at 15:22

1 Answer

25

Here is a great presentation that answers your question. The specific answer comes up at around minute 55:40, but I suggest that you watch the entire presentation to understand how the solution fits into the entire architecture.

In short:

  1. A particular server ("leaf") stores all feed items for a particular user. So data for each of your friends is stored entirely at a specific destination.
  2. When you want to view your news feed, one of the aggregator servers sends requests to the leaf servers that hold your friends' data and ranks the results. The aggregator knows which servers to send requests to based on the userid of each friend.
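
The two steps above can be sketched roughly as follows. This is only an in-memory illustration, not FB's actual code: the `Leaf` and `Aggregator` classes, the modulo routing in `leaf_for`, and ranking by newest timestamp are all assumptions made for the example.

```python
import heapq
from collections import defaultdict


class Leaf:
    """Hypothetical leaf server: holds all feed items for the users routed to it."""

    def __init__(self):
        # user_id -> list of (timestamp, text)
        self.items = defaultdict(list)

    def post(self, user_id, timestamp, text):
        self.items[user_id].append((timestamp, text))

    def recent(self, user_id, limit):
        # Some ranking happens on the leaf: return only the newest items.
        return heapq.nlargest(limit, self.items[user_id])


class Aggregator:
    """Hypothetical aggregator: fans out to the leaves and merges the results."""

    def __init__(self, leaves):
        self.leaves = leaves

    def leaf_for(self, user_id):
        # Assumed routing rule: the user id alone determines the leaf.
        return self.leaves[user_id % len(self.leaves)]

    def post(self, user_id, timestamp, text):
        self.leaf_for(user_id).post(user_id, timestamp, text)

    def news_feed(self, friend_ids, limit=10):
        candidates = []
        for friend_id in friend_ids:
            leaf = self.leaf_for(friend_id)
            for timestamp, text in leaf.recent(friend_id, limit):
                candidates.append((timestamp, friend_id, text))
        # Final ranking on the aggregator: newest first.
        return heapq.nlargest(limit, candidates)
```

Note that no step scans a table of all users: each friend's items are found by computing the leaf from the friend's id, and each leaf returns at most `limit` items.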

This is terribly simplified, of course. It only works because all of it is served from memcached, the system is designed to minimize latency, some ranking is done at the leaf server that holds the friend's feed items, etc.

You really don't want to be hitting the database for any of this to work at a reasonable speed. FB use MySQL mostly as a key-value store; JOINing tables is just impossible at their scale. Then they put memcache servers in front of the databases and application servers.
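
As a rough illustration of putting a cache in front of a key-value table, here is a minimal cache-aside sketch. Plain dicts stand in for both memcached and the MySQL table, and the `KeyValueStore` and `CachedStore` names are hypothetical; a real setup would use a memcached client and a database driver instead.

```python
class KeyValueStore:
    """Stand-in for a database table used purely as key -> value."""

    def __init__(self):
        self.rows = {}
        self.reads = 0  # counts how often the "database" is actually hit

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value


class CachedStore:
    """Cache-aside wrapper: try the cache first, fall back to the database."""

    def __init__(self, db):
        self.db = db
        self.cache = {}  # stand-in for a memcached cluster

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later reads
        return value

    def put(self, key, value):
        self.db.put(key, value)
        self.cache.pop(key, None)  # invalidate so the next read is fresh
```

Repeated reads of the same key touch the database only once; a write invalidates the cached copy rather than updating it, which is one common choice among several.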

Having said that, don't worry about scaling problems until you have them (unless, of course, you are worrying about them for the fun of it.) On day one, scaling is the least of your problems.

Nick Zalutskiy
  • Hi Nick, that was an insightful presentation though a little overwhelming for my knowledge base! Thank you very much for the link. Please excuse me for the naivety of my follow up questions. But how do I, at the lowest table cell level, visualize a "leaf" server and an "aggregate" server. 1 leaf and aggregate server dedicated to each user on the social network? – Ari53nN3o Aug 16 '11 at 09:02
  • 3
    Imagine a huge database table with two columns: id, data. They use [sharding](http://en.wikipedia.org/wiki/Sharding) to split this table based on id. So ids 1-1000 will reside on server1, ids 1001-2000 will reside on server2, etc. Each one of these servers is what FB call a "leaf" (i.e. a shard). Now if you want to do a SUM(), for example, of something with id 30 and something with id 1030, you can't do it on a single server because they live on different servers. That's where one of the aggregator servers comes in. It goes to both leaf servers and fetches the rows. Then it performs the SUM() and returns the result. – Nick Zalutskiy Aug 16 '11 at 13:39
  • 6
    Tread carefully, because building to scale at this point in the game may do you more harm than good by getting you into bad habits (like storing everything in a key-value store and not taking advantage of JOINs). It all depends on what you want to learn. FB have very particular needs due to their scale. However, when they started, they were using a single MySQL database server with many tables, with many columns, and joining tables for each request, just like everybody else. For 99 projects out of 100, this is still the way to go. – Nick Zalutskiy Aug 16 '11 at 13:50
  • Thanks a ton for the explanation, Nick. That sure cleared a lot of doubt for me. – Ari53nN3o Aug 17 '11 at 07:20
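
The range-sharding example from the comments (ids 1-1000 on one server, 1001-2000 on the next) can be sketched like this. The `ShardedTable` class and `aggregate_sum` function are hypothetical names, and each shard is just a dict standing in for a separate server.

```python
SHARD_SIZE = 1000  # ids 1-1000 on shard 0, 1001-2000 on shard 1, ...


def shard_index(row_id):
    """Map a row id to the shard ("leaf") that owns it."""
    return (row_id - 1) // SHARD_SIZE


class ShardedTable:
    """An (id, data) table split across several shards by id range."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, row_id, data):
        self.shards[shard_index(row_id)][row_id] = data

    def get(self, row_id):
        return self.shards[shard_index(row_id)][row_id]


def aggregate_sum(table, row_ids):
    """Aggregator role: fetch each row from its own shard, then sum."""
    return sum(table.get(row_id) for row_id in row_ids)
```

With this layout, ids 30 and 1030 land on different shards, so no single shard can compute their sum; the aggregator step fetches both rows and adds them.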