How to implement the activity stream in a social network

Question

I'm developing my own social network, and I haven't found any examples on the web of how to implement a stream of users' actions. For example, how do I filter actions for each user? How do I store the action events? Which data model and object model can I use for the action stream and for the actions themselves?

Good luck. This is the never-ending question we all want answered: how does Facebook pull it off? The answer is very complex, and we may never know the most efficient way of doing it. If you find a GOOD approach, please post it here for others to see. BTW, this has been discussed many times on SO, so search and you will find some tips. – JasonDavis, Sep 20 '09 at 21:53
Stream Framework is the most widely used solution: https://github.com/tschellenbach/Stream-Framework Also see this listing of packages: https://www.djangopackages.com/grids/g/activities/ – Thierry, Oct 16 '14 at 11:21
In terms of personalization, it's based on analytics and machine learning. Also see http://getstream.io/personalization/ – Thierry, Jan 27 '16 at 20:24

outcassed · Accepted Answer · 2015-03-10T18:36:01.840

262

Summary: For about 1 million active users and 150 million stored activities, I keep it simple:

Use a relational database to store unique activities (1 record per activity / "thing that happened"). Make the records as compact as you can. Structure the table so that you can quickly grab a batch of activities by activity ID, or by a set of friend IDs with time constraints.
Publish the activity IDs to Redis whenever an activity record is created, adding the ID to an "activity stream" list for every user who is a friend/subscriber and should see the activity.
Query Redis to get the activity stream for any user, then grab the related data from the db as needed. Fall back to querying the db by time if the user needs to browse far back in time (if you even offer this).
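
The read path described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `load_page`, `fetch_by_ids`, and `fetch_before` are hypothetical names, and the two callables stand in for whatever database queries you actually use.

```python
def load_page(cached_ids, offset, count, fetch_by_ids, fetch_before, before_time=None):
    """Return one page of activity rows for a user's stream.

    cached_ids   -- recent activity IDs for the user, newest first (e.g. read from Redis)
    fetch_by_ids -- callable(ids) -> {id: row}; one batched DB lookup by primary key
    fetch_before -- callable(before_time, count) -> rows; DB query by time
    before_time  -- timestamp of the oldest activity the user has already seen
    """
    page_ids = cached_ids[offset:offset + count]
    if page_ids:
        rows = fetch_by_ids(page_ids)
        # Keep the cached order; skip IDs whose rows no longer exist.
        return [rows[i] for i in page_ids if i in rows]
    # The user has paged past the cached window: fall back to a time-based query.
    return fetch_before(before_time, count)
```

The point of the design is that the cache only ever holds IDs, so the database remains the single record of activity data.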

I use a plain old MySQL table for dealing with about 15 million activities.

It looks something like this:

id             
user_id       (int)
activity_type (tinyint)
source_id     (int)  
parent_id     (int)
parent_type   (tinyint)
time          (datetime but a smaller type like int would be better)

activity_type tells me the type of activity, and source_id tells me the record that the activity is related to. So if the activity type means "added favorite", then I know that the source_id refers to the ID of a favorite record.

The parent_id/parent_type are useful for my app: they tell me what the activity is related to. If a book was favorited, then parent_id/parent_type would tell me that the activity relates to a book (type) with a given primary key (id).

I index on (user_id, time) and query for activities that are user_id IN (...friends...) AND time > some-cutoff-point. Ditching the id and choosing a different clustered index might be a good idea - I haven't experimented with that.
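
As a rough sketch of the table, the index, and the friends-with-cutoff query described above: the example below uses SQLite from the Python standard library as a stand-in for MySQL, stores time as an integer Unix timestamp, and uses hypothetical names (`create_schema`, `friend_activities`, `idx_activity_user_time`) that are not from the original answer.

```python
import sqlite3

def create_schema(conn):
    # Column names follow the layout in the answer; the types are SQLite approximations.
    conn.executescript("""
        CREATE TABLE activity (
            id            INTEGER PRIMARY KEY,
            user_id       INTEGER NOT NULL,
            activity_type INTEGER NOT NULL,
            source_id     INTEGER NOT NULL,
            parent_id     INTEGER,
            parent_type   INTEGER,
            time          INTEGER NOT NULL
        );
        CREATE INDEX idx_activity_user_time ON activity (user_id, time);
    """)

def friend_activities(conn, friend_ids, cutoff, limit=50):
    """Activities by any of friend_ids that are newer than cutoff, newest first."""
    if not friend_ids:
        return []
    placeholders = ",".join("?" * len(friend_ids))  # one "?" per friend ID
    sql = (
        "SELECT id, user_id, activity_type, source_id, parent_id, parent_type, time "
        f"FROM activity WHERE user_id IN ({placeholders}) AND time > ? "
        "ORDER BY time DESC LIMIT ?"
    )
    return conn.execute(sql, [*friend_ids, cutoff, limit]).fetchall()
```

A caller would pass the current user's friend IDs and a cutoff such as "now minus a few days"; only the placeholders are interpolated into the SQL string, and the values themselves are bound as parameters.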

Pretty basic stuff, but it works, it's simple, and it is easy to work with as your needs change. Also, if you aren't using MySQL you might be able to do better index-wise.

For faster access to the most recent activities, I've been experimenting with Redis. Redis stores all of its data in-memory, so you can't put all of your activities in there, but you could store enough for most of the commonly-hit screens on your site. The most recent 100 for each user or something like that. With Redis in the mix, it might work like this:

Create your MySQL activity record
For each friend of the user who created the activity, push the ID onto their activity list in Redis.
Trim each list to the last X items

Redis is fast and offers a way to pipeline commands across one connection - so pushing an activity out to 1000 friends takes milliseconds.
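
The push-and-trim steps above might look something like the sketch below. It assumes a redis-py style client (for example `redis.Redis()`), which provides `pipeline()`, `lpush`, `ltrim`, `execute`, and `lrange`; the `stream:<user_id>` key naming and the function names are assumptions for illustration, not the author's actual code.

```python
def fan_out(client, activity_id, friend_ids, max_items=100):
    """Push one activity ID onto each friend's list and trim it, in one pipeline."""
    pipe = client.pipeline()
    for friend_id in friend_ids:
        key = f"stream:{friend_id}"        # hypothetical key naming scheme
        pipe.lpush(key, activity_id)       # newest ID goes to the head of the list
        pipe.ltrim(key, 0, max_items - 1)  # keep only the newest max_items IDs
    pipe.execute()                         # send all queued commands together

def read_stream(client, user_id, count=20):
    """Return the newest `count` activity IDs for a user, newest first."""
    # With redis-py the values come back as bytes unless decode_responses=True is set.
    return client.lrange(f"stream:{user_id}", 0, count - 1)
```

Because the commands are queued and sent together, the cost of notifying many friends is dominated by one network round trip rather than one per friend.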

For a more detailed explanation of what I am talking about, see Redis' Twitter example: http://redis.io/topics/twitter-clone

Update February 2011: I've got 50 million active activities at the moment and I haven't changed anything. One nice thing about doing something similar to this is that it uses compact, small rows. I am planning on making some changes that would involve many more activities and more queries of those activities and I will definitely be using Redis to keep things speedy. I'm using Redis in other areas and it really works well for certain kinds of problems.

Update July 2014: We're up to about 700K monthly active users. For the last couple of years, I've been using Redis (as described in the bulleted list) for storing the last 1000 activity IDs for each user. There are usually about 100 million activity records in the system and they are still stored in MySQL and are still the same layout. These records let us get away with less Redis memory, they serve as the record of activity data, and we use them if users need to page further back in time to find something.

This wasn't a clever or especially interesting solution but it has served me well.

edited Mar 10 '15 at 18:36

answered Nov 19 '09 at 20:42

outcassed


+1 for Redis. v2 uses virtual memory so it should be possible to rely entirely on Redis – stagas May 20 '10 at 19:44
If there are multiple sources of activity (add, comment, like, etc.), how do you join this table with the actual activities? Do you use multiple left joins (one for each activity table)? – Ali Shakiba Jan 14 '11 at 22:54
casey, I wanted to add reply comments on an activity and show them under it. How is that possible with your structure? Should I add another table or use the same one? If the same one, what are your suggestions? – Basit Feb 03 '11 at 05:58
@Basit Comments should be easier since looking them up will be fast. You could just store comments in a separate table and pull the ones you need once you've found the activities that you want to display. – outcassed Feb 05 '11 at 05:47
Wouldn't it be slow to execute a comment query for each activity, just to see whether the user made a comment and, if so, what those comments are? – Basit Feb 06 '11 at 23:19
@Basit You could pull all the comments you want with one query... or if you want something Facebook-style, track the total comment count and the ids for a set of "featured comments" as well. – outcassed Feb 07 '11 at 17:52
Hmm, OK, so you are suggesting I should keep the count in the activity table and also put 2 or 3 comment ids in the activity table for a quick pull. Is that right? BTW, what do you mean by "featured comments"? How does one select which comments are "featured"? I really appreciate your replies. :) – Basit Feb 08 '11 at 08:39
@casey Echoing @JohnS' question - how do you perform the `JOIN` on the various `activity_type` tables? Are those joins expensive performance-wise? – Rob Sobers Jul 22 '11 at 20:53
Hey this is super late but how exactly do you trigger the actual logging of the activities? Do you use MySQL Triggers or simply do multiple simultaneous inserts e.g. user comments on a photo -> new record is inserted in both comments table and activity table – Ray Feb 08 '12 at 02:35
And what if your Redis goes down or restarts? Will you lose all the notification lists in Redis? Do you persist Redis, or do you save the list for each friend of the user who created the activity in MySQL? – Luccas Mar 04 '12 at 23:43
What if you want to include photos in the feed (say the user's photo); do you store the photo link as one of the data items, or do you grab the user's photo from the db once you get his ID (I expect the former but really curious) – Laurent Apr 27 '13 at 06:33
@casey what is the current status of your web site? Is this still a good way to make an activity feed (without Redis)? – 1110 Jan 03 '14 at 15:40
Echoing @JohnS' question - I also wanted to get an idea of what this activity table looks like and how you join the different activities. – Nihal Sharma Jan 30 '15 at 13:24
@casey so if I make a reply and then remove it, what should I do? Should I remove the first activity from the database, or should I create two activities and merge them after fetching? What is the best practice? Thanks for your answer. – Ilya Demidov Feb 28 '15 at 15:21
Has anyone got an answer to JohnS's question about the "JOIN"? Can anyone post a link where it might be explained? I have to do a similar thing and it would be very helpful to me. – Waseem May 11 '15 at 15:33
No joins. One query per unique `activity_type` to get the other data that you need. – outcassed Jun 23 '15 at 20:36
What data type do you use to store this in Redis? How about using the MEMORY storage engine instead of Redis? – Terry Lin Aug 30 '16 at 12:28
I'm happy to get here and see your post. I'm using almost the exact same database structure for my users' timelines, and I'm currently going to implement groups, where there will be group activities. I assume you put group activities under your parent_type/parent_id. What I'm curious about is how your Redis is structured. Can you shed some light? – Someone Special Feb 13 '17 at 07:19
@casey thanks for your explanation... I am really stuck trying to implement this solution. How do I achieve this: "For each friend of the user who created the activity, push the ID onto their activity list in Redis"? I am assuming I would have to do this in a loop. But what if the user has about 100000 friends? It would take really long to loop through the list. – Ewomazino Ukah May 08 '17 at 08:23
@casey What are your Redis server specs? – AliN11 Nov 28 '17 at 08:22
Are you doing aggregations of events of the same kind? This paper http://jeffterrace.com/docs/feeding-frenzy-sigmod10-web.pdf describes it but doesn't provide an implementation on *how* to aggregate them. – floriank Jan 28 '18 at 18:43
I have a few questions about your Redis implementation. What type of structure do you use in Redis? I was thinking of creating an object with the user id as key and, as the value, a list of either the activity keys or the complete activities as JSON. – Jose Manuel Ojeda Jun 21 '18 at 07:50
Isn't source_id the same as parent_id here? If you favorite a book, then the source_id is the book_id. If the parent_id is for the related object, then it is also book_id and the type is (let's say) the name of the class. When would it be different? Can source_id be nil? – Strawberry Jun 27 '18 at 05:34
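
The "no joins, one query per unique activity_type" approach mentioned in the comments above could be sketched like this. It is an assumption-laden illustration: `hydrate` and the `loaders` mapping are hypothetical, and each loader stands in for a single batched query (for example `WHERE id IN (...)`) against the table for that activity type.

```python
from collections import defaultdict

def hydrate(activities, loaders):
    """Attach the related record to each activity using one lookup per activity type.

    activities -- list of dicts with at least 'activity_type' and 'source_id'
    loaders    -- maps activity_type -> callable(list_of_ids) -> {id: record}
    """
    # Group the source IDs by type so each type needs only one batched lookup.
    ids_by_type = defaultdict(list)
    for act in activities:
        ids_by_type[act["activity_type"]].append(act["source_id"])

    records = {t: loaders[t](ids) for t, ids in ids_by_type.items()}

    # Preserve the original stream order; a missing record becomes None.
    return [
        {**act, "source": records[act["activity_type"]].get(act["source_id"])}
        for act in activities
    ]
```

With N activities of K distinct types, this issues K lookups instead of N, and none of them needs a join against the activity table.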

score 22 · Answer 2 · answered Nov 22 '09 at 22:28

This is my implementation of an activity stream, using MySQL. There are three classes: Activity, ActivityFeed, Subscriber.

Activity represents an activity entry, and its table looks like this:

id
subject_id
object_id
type
verb
data
time

subject_id is the id of the object performing the action; object_id is the id of the object that receives the action. type and verb describe the action itself (for example, if a user adds a comment to an article, they would be "comment" and "created" respectively). data contains additional data in order to avoid joins (for example, the subject's first name and surname, the article title and URL, the comment body, etc.).

Each Activity belongs to one or more ActivityFeeds, and they are related by a table that looks like this:

feed_name
activity_id

In my application I have one feed for each User and one feed for each Item (usually blog articles), but they can be whatever you want.

A Subscriber is usually a user of your site, but it can also be any object in your object model (for example, an article could be subscribed to the feed_action of its creator).

Every Subscriber belongs to one or more ActivityFeeds, and, like above, they are related by a link table of this kind:

feed_name
subscriber_id
reason

The reason field here explains why the subscriber has subscribed to the feed. For example, if a user bookmarks a blog post, the reason is 'bookmark'. This helps me later when filtering actions for notifications to users.

To retrieve the activity for a subscriber, I do a simple join of the three tables. The join is fast because I select only a few activities, thanks to a WHERE condition that looks like now - time < some hours. I avoid other joins thanks to the data field in the Activity table.
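
A minimal sketch of that three-table join, restricted to activities newer than a cutoff, is shown below. It uses SQLite as a stand-in for MySQL; the table names `feed_activity` and `feed_subscriber`, the JSON encoding of `data`, and the function name `activities_for` are assumptions, since the answer only lists the columns.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE activity (
    id INTEGER PRIMARY KEY, subject_id INTEGER, object_id INTEGER,
    type TEXT, verb TEXT, data TEXT, time INTEGER
);
CREATE TABLE feed_activity (feed_name TEXT, activity_id INTEGER);
CREATE TABLE feed_subscriber (feed_name TEXT, subscriber_id INTEGER, reason TEXT);
"""

def activities_for(conn, subscriber_id, cutoff):
    """Join subscriber -> feeds -> activities, keeping activities newer than cutoff."""
    rows = conn.execute(
        """
        SELECT a.id, a.type, a.verb, a.data, s.reason
        FROM feed_subscriber AS s
        JOIN feed_activity AS fa ON fa.feed_name = s.feed_name
        JOIN activity AS a ON a.id = fa.activity_id
        WHERE s.subscriber_id = ? AND a.time > ?
        ORDER BY a.time DESC
        """,
        (subscriber_id, cutoff),
    ).fetchall()
    # `data` carries denormalized JSON, so rendering an entry needs no further joins.
    return [(i, t, v, json.loads(d), reason) for i, t, v, d, reason in rows]
```

Returning the subscription reason alongside each activity makes it available to any later notification filtering.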

Further explanation of the reason field: suppose I want to filter actions for email notifications. If the user bookmarked a blog post (and so subscribed to the post's feed with the reason 'bookmark'), I don't want the user to receive email notifications about actions on that item. If the user commented on the post (and so subscribed to the post's feed with the reason 'comment'), I do want the user to be notified when other users add comments to the same post. The reason field, together with the user's notification preferences, lets me make this distinction (I implemented it through an ActivityFilter class).
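
The answer does not show its ActivityFilter class, so the following is only a guess at the kind of rule it describes. `NOTIFY_RULES`, `should_notify`, and the `email_enabled` flag are hypothetical names chosen for this sketch.

```python
# Hypothetical rule table: for each subscription reason, the (type, verb) pairs
# that should trigger an email notification.
NOTIFY_RULES = {
    "comment": {("comment", "created")},  # commenters hear about new comments
    "bookmark": set(),                    # bookmarking alone never triggers email
}

def should_notify(reason, activity_type, verb, email_enabled=True):
    """Decide whether an activity should produce an email for a subscriber."""
    if not email_enabled:  # the user's own preference always wins
        return False
    return (activity_type, verb) in NOTIFY_RULES.get(reason, set())
```

Keeping the rules in a table rather than in branching code makes it easy to add a new reason without touching the filter logic.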

Nicolo Martini, I wanted to add reply comments on an activity and show them under it. How is that possible with your structure? Should I add another table or use the same one? If the same one, what are your suggestions? – Basit, Feb 03 '11 at 05:59
How is the performance of this implementation? Any tests on large tables? – Joshua F. Rountree, Mar 19 '12 at 13:46

score 17 · Answer 3 · answered Feb 14 '12 at 14:48

There is a format for activity streams that is currently being developed by a bunch of well-known people.

http://activitystrea.ms/.

Basically, every activity has an actor (who performs the activity), a verb (the action of the activity), an object (which the actor acts on), and a target.

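For example: Max has posted a link to Adam's wall. Here Max is the actor, "posted" is the verb, the link is the object, and Adam's wall is the target.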

Their JSON spec has reached version 1.0 at the time of writing, and it shows the pattern for an activity that you can apply.
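
To make the pattern concrete, here is a small sketch that builds the "Max posted a link to Adam's wall" example as a JSON activity. The property names (`actor`, `verb`, `object`, `target`, `published`, `objectType`, `displayName`, `url`) follow the JSON Activity Streams 1.0 format; the helper `make_activity`, the choice of `bookmark` and `collection` as object types for the link and the wall, and the example URL and timestamp are assumptions made for illustration.

```python
import json

def make_activity(actor, verb, obj, target=None, published=None):
    """Assemble an activity dict in the actor/verb/object/target shape."""
    activity = {"actor": actor, "verb": verb, "object": obj}
    if target is not None:
        activity["target"] = target
    if published is not None:
        activity["published"] = published  # timestamp string
    return activity

activity = make_activity(
    actor={"objectType": "person", "displayName": "Max"},
    verb="post",
    obj={"objectType": "bookmark", "url": "http://example.org/some-link"},
    target={"objectType": "collection", "displayName": "Adam's wall"},
    published="2012-02-14T14:48:00Z",
)
print(json.dumps(activity, indent=2))
```

Storing or exchanging activities in this shape keeps the "who did what to which thing, and where" structure explicit regardless of the backing store.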

Their format has already been adopted by BBC, Gnip, Google Buzz, Gowalla, IBM, MySpace, Opera, Socialcast, Superfeedr, TypePad, Windows Live, YIID, and many others.

Sơn Trần-Nguyễn


hi @sntran I know this post was years ago, but I have a question more about activity stream. Is there a way you can help out? – hiswendy Jul 05 '17 at 19:16
Sure. What is your question? – Sơn Trần-Nguyễn Jul 06 '17 at 19:27
My question is actually posted here! [link](https://stackoverflow.com/questions/44900776/transforming-json-from-api-into-an-activity-stream?noredirect=1#comment76845862_44900776). I think I have a basic understanding of activity stream, but I'm really not so sure how to implement it (i.e am I supposed to use angular or node.js?) And from there, how do I actually CREATE an activity stream with incoming API JSON? These are such basic questions, but I couldn't find any answers online. If you can help out, I would truly appreciate it. Thank you! – hiswendy Jul 07 '17 at 23:25

score 13 · Answer 4 · edited May 23 '17 at 12:26

I think an explanation of how notification systems work on large websites can be found in the Stack Overflow question "how does social networking websites compute friends updates?", in Jeremy Wall's answer. He suggests using a message queue and points to two open-source implementations.

See also the question What’s the best manner of implementing a social activity stream?

score 1 · Answer 5 · answered Apr 05 '12 at 00:47

You absolutely need a performant, distributed message queue. But it does not end there: you'll also have to decide what to store as persistent data, what to keep as transient data, and so on.

Anyway, this is a really difficult task, my friend, if you are after a high-performance, scalable system. But of course some generous engineers have shared their experience. LinkedIn recently open-sourced its message queue system, Kafka. Before that, Facebook had already released Scribe to the open source community. Kafka is written in Scala, and it takes some time to get running at first, but I tested it with a couple of virtual servers. It is really fast.

http://blog.linkedin.com/2011/01/11/open-source-linkedin-kafka/

http://incubator.apache.org/kafka/index.html

score 0 · Answer 6 · answered Jun 19 '13 at 02:47

Instead of rolling your own, you could look to a third party service used via an API. I started one called Collabinate (http://www.collabinate.com) that has a graph database backend and some fairly sophisticated algorithms for handling large amounts of data in a highly concurrent, high performance manner. While it does not have the breadth of functionality that say Facebook or Twitter do, it more than suffices for most use cases where you need to build activity streams, social feeds, or microblogging functionality into an application.
