271

I'm interested in hearing your opinions on the best way to implement a social activity stream (Facebook is the most famous example). The problems/challenges involved are:

  • Different types of activities (posting, commenting ..)
  • Different types of objects (post, comment, photo ..)
  • 1-n users involved in different roles ("User x replied to User y's comment on User's Z post")
  • Different views of the same activity item ("you commented .." vs. "your friend x commented" vs. "user x commented .." => 3 representations of a "comment" activity)

.. and some more, especially if you take it to a high level of sophistication, as Facebook does, for example by combining several activity items into one ("users x, y and z commented on that photo").

Any thoughts or pointers on patterns, papers, etc on the most flexible, efficient and powerful approaches to implementing such a system, data model, etc. would be appreciated.

Although most of the issues are platform-agnostic, chances are I'll end up implementing such a system in Ruby on Rails.

Jon Seigel

13 Answers

146

I have created such a system, and I took this approach:

Database table with the following columns: id, userId, type, data, time.

  • userId is the user who generated the activity
  • type is the type of the activity (i.e. Wrote blog post, added photo, commented on user's photo)
  • data is a serialized object with meta-data for the activity where you can put in whatever you want

This limits the searches/lookups you can do in the feeds to users, time and activity types, but in a Facebook-type activity feed this isn't really limiting. And with correct indices on the table, the lookups are fast.

With this design you would have to decide what metadata each type of event should require. For example a feed activity for a new photo could look something like this:

{"id": 1, "userId": 1, "type": "PHOTO", "time": "2008-10-15 12:00:00", "data": {"photoId": 2089, "photoName": "A trip to the beach"}}

You can see that, although the name of the photo is almost certainly stored in some other table containing the photos, and I could retrieve the name from there, I duplicate the name in the metadata field, because you don't want to do any joins on other database tables if you want speed. And in order to display, say, 200 different events from 50 different users, you need speed.
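
As a rough illustration of the table described above, here is a minimal sketch in Python using the standard `sqlite3` and `json` modules. The table name `feed_activity`, the index name, and the helper functions are assumptions made for this example, not part of the original system; JSON is just one possible choice for the serialized `data` column.

```python
import json
import sqlite3


def create_feed_table(conn):
    # One row per activity; `data` holds the serialized metadata (JSON here).
    conn.execute(
        """CREATE TABLE feed_activity (
               id INTEGER PRIMARY KEY,
               user_id INTEGER NOT NULL,
               type TEXT NOT NULL,
               data TEXT NOT NULL,
               time TEXT NOT NULL
           )"""
    )
    # Index supporting "latest activity from these users" lookups.
    conn.execute("CREATE INDEX idx_feed_user_time ON feed_activity (user_id, time)")


def add_activity(conn, user_id, activity_type, time, data):
    # Denormalized metadata (e.g. the photo name) is stored with the activity.
    conn.execute(
        "INSERT INTO feed_activity (user_id, type, data, time) VALUES (?, ?, ?, ?)",
        (user_id, activity_type, json.dumps(data), time),
    )


def fetch_feed(conn, user_ids, limit=200):
    # Single-table query: no joins are needed to render the feed.
    if not user_ids:
        return []
    placeholders = ",".join("?" for _ in user_ids)
    rows = conn.execute(
        "SELECT id, user_id, type, data, time FROM feed_activity "
        f"WHERE user_id IN ({placeholders}) ORDER BY time DESC LIMIT ?",
        (*user_ids, limit),
    ).fetchall()
    return [
        {"id": r[0], "userId": r[1], "type": r[2], "data": json.loads(r[3]), "time": r[4]}
        for r in rows
    ]
```

Because the metadata is opaque to the database, filtering is only possible on `user_id`, `type` and `time`, which matches the limitation mentioned above.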

Then I have classes that extend a basic FeedActivity class for rendering the different types of activity entries. Grouping of events would be built into the rendering code as well, to keep complexity out of the database.
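
The rendering side might look roughly like the sketch below. Only the `FeedActivity` base class name comes from the answer; the subclasses, the `RENDERERS` mapping, the in-memory `usernames` map, and the grouping rule (merge several comments on the same photo into one line) are assumptions chosen for illustration.

```python
class FeedActivity:
    """Base renderer; each subclass describes one activity type."""

    def __init__(self, item, usernames):
        self.item = item
        self.usernames = usernames  # userId -> display name, held in memory

    def actor(self):
        return self.usernames.get(self.item["userId"], "Someone")

    def render(self):
        raise NotImplementedError


class PhotoActivity(FeedActivity):
    def render(self):
        photo_name = self.item["data"]["photoName"]
        return f'{self.actor()} added the photo "{photo_name}"'


class CommentActivity(FeedActivity):
    def render(self):
        photo_name = self.item["data"]["photoName"]
        return f'{self.actor()} commented on "{photo_name}"'


# Hypothetical mapping from the stored type value to its renderer class.
RENDERERS = {"PHOTO": PhotoActivity, "COMMENT": CommentActivity}


def join_names(names):
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def render_feed(items, usernames):
    # Group in application code, keyed by activity type and target photo.
    groups = {}
    for item in items:
        key = (item["type"], item["data"].get("photoId"))
        groups.setdefault(key, []).append(item)

    lines = []
    for (activity_type, _), group in groups.items():
        if activity_type == "COMMENT" and len(group) > 1:
            # Several comments on one photo collapse into a single line.
            names = []
            for item in group:
                name = usernames.get(item["userId"], "Someone")
                if name not in names:
                    names.append(name)
            photo_name = group[0]["data"]["photoName"]
            lines.append(f'{join_names(names)} commented on "{photo_name}"')
        else:
            for item in group:
                lines.append(RENDERERS[item["type"]](item, usernames).render())
    return lines
```

Keeping the grouping in `render_feed` means the table stays a flat log of activities, and the grouping rules can change without touching stored data.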

heyman
  • 1
    This is a really great system. I assume that you are creating the feed database entries at the same time you are actually performing the action, for example, creating a new comment event entry in the feed table at the same time the user submits the comment – goddamnyouryan Sep 17 '10 at 03:23
  • 3
    Yep, that's correct. Lately I've been using MongoDB (http://mongodb.org) in a few projects, whose schemaless approach makes it very suitable for creating a well performing social activity stream that follows this design. – heyman Sep 17 '10 at 08:40
  • Wait, but you have userID:1, you'll still need a join to grab the user name? – AnApprentice Sep 30 '10 at 02:13
  • 6
    TheApprentice: Yep, you might want to throw in a username field as well. In our system, we only displayed events generated by a user's friends, and I believe we already had a map of the friends' userid->username in memory, so looking up the usernames didn't require a JOIN and was fast. – heyman Oct 07 '10 at 09:33
  • heyman, I wanted to add reply comments on an activity and show them under it. How is that possible with your structure? Should I add another table or use the same one? If the same, what are your suggestions? – Basit Feb 03 '11 at 06:00
  • I think the most interesting part of this implementation is to "mark" a record as it is read. For example, how you will note an activity that is older or already seen by the user? I am not sure how Facebook implements this – asyncwait Feb 12 '11 at 10:51
  • If a user changes his name, this approach wouldn't work would it? I would like to see my avatar update in my stream when I do so. – Mike Flynn Mar 25 '11 at 00:01
  • Basit: I would create a separate table for the comments, and then probably denormalize comment count in the feed table so that the number of comments can be shown for each item without needing to do a JOIN. – heyman Mar 25 '11 at 09:11
  • 1
    asyncwait: When a user views the activity stream you could save the current time to the user or user session. Then when you fetch the feed items next time, you can easily determine which items are new. – heyman Mar 25 '11 at 09:15
  • Mike Flynn: That depends. If you denormalize the user info in the feed items, then a change of the user's data would not be reflected in the feed. However, in our case we had a map of the friends' userid->user objects which we used when we rendered username and avatar, so user info changes were reflected in the feed. – heyman Mar 25 '11 at 09:34
  • this would only work in a document based database right? not something like mysql or postgresql. – Omnipresent Jun 17 '11 at 04:06
  • is there a gem or plugin which uses this architecture? – Satchel Jun 23 '11 at 06:29
  • Omnipresent: Nope, this approach would work with a relational database as well. However, you would need to serialize the meta data in the data field yourself. Since the meta data is serialized, you wouldn't be able to do queries on anything put into the data field. – heyman Jul 05 '11 at 14:52
  • Angela: This is a language agnostic solution. I don't know if there are any ruby gems that provide social activity stream features that are implemented using an approach similar to this. – heyman Jul 05 '11 at 14:58
  • @heyman How would you go about adding privacy to an activity stream like this? Also, how can an activity record be discarded on a per-user basis? – Lea Hayes Sep 07 '11 at 03:33
  • Interesting read, even though it was posted a while ago. I can see how this works with smaller data sets, but it will slow down very quickly as time goes on and needs a facelift to give it real speed. – WojonsTech Jan 21 '12 at 10:42
  • @heyman I completely understand why you should have the data column there, but the only problem that I can think of with this kind of denormalization is what would happen if photo 2089 was deleted or the name was changed? – mobius Jan 30 '12 at 10:21
  • 2
    You would have to handle that case manually. It's probably best to do it when the photo gets deleted (find the feed item in the user's feed, and delete/update it). – heyman Feb 01 '12 at 20:33
  • If a photo's name was changed, I will have to fetch all the activities for "data:{photoId:2089}" and change one by one? Thanks – Luccas Mar 26 '12 at 20:03
  • I know this is an old post, but I would like to ask @heyman about consumers. With your approach, do you still have an ActivityStreamUser (or similar) table to know which users will consume these streams? – Michael Simmons Aug 14 '12 at 22:03
  • 23
    I don't quite understand what's so great about this answer. How does creating a simple table translate to a weighted activity feed similar to Facebook's? All he's doing is storing all the activity, which still leaves the question of how to turn a table of data into a dynamic weighted activity feed. – ChuckKelly Feb 27 '13 at 10:56
  • 2
    You use a serialized object. What if the name of the photo has changed? This way, you display the old name... – Pars Feb 23 '14 at 06:27
  • 5
    @ChuckKelly: If I recall correctly, back in 2008, when I wrote the answer, the Facebook feed wasn't weighted at all. It was just a chronological feed with all the activity from your friends. – heyman Apr 22 '14 at 07:36
  • Well any clues on how to go about doing such a weighted feed in the 21st century? – ChuckKelly Apr 24 '14 at 02:41
  • @DarkLeonhart broken link – OhadR Jan 11 '18 at 10:53
119

This is a very good presentation outlining how Etsy.com architected their activity streams. It's the best example I've found on the topic, though it's not rails specific.

http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture

Babak Naffas
Mark Kennedy
46

We've open sourced our approach: https://github.com/tschellenbach/Stream-Framework. It's currently the largest open source library aimed at solving this problem.

The same team which built Stream Framework also offers a hosted API, which handles the complexity for you. Have a look at getstream.io. There are clients available for Node, Python, Rails and PHP.

In addition, have a look at this High Scalability post where we explain some of the design decisions involved: http://highscalability.com/blog/2013/10/28/design-decisions-for-scaling-your-high-traffic-feeds.html

This tutorial will help you set up a system like Pinterest's feed using Redis. It's quite easy to get started with.

To learn more about feed design I highly recommend reading some of the articles which we based Feedly on:

Though Stream Framework is Python based, it wouldn't be too hard to use from a Ruby app. You could simply run it as a service and stick a small HTTP API in front of it. We are considering adding an API to access Feedly from other languages. At the moment you'll have to roll your own though.

Ondrej Slinták
Thierry
20

The biggest issues with event streams are visibility and performance; you need to restrict the events displayed to be only the interesting ones for that particular user, and you need to keep the amount of time it takes to sort through and identify those events manageable. I've built a smallish social network; I found that at small scales, keeping an "events" table in a database works, but that it gets to be a performance problem under moderate load.

With a larger stream of messages and users, it's probably best to go with a messaging system, where events are sent as messages to individual profiles. This means that you can't easily subscribe to someone's event stream and see their previous events, but you are simply rendering a small group of messages when you need to render the stream for a particular user.
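
A minimal in-memory sketch of this "send events as messages to individual profiles" idea (often called fan-out on write) is shown below. The `FeedFanout` class, its method names, and the inbox size are assumptions for illustration; a real system would put a message queue and persistent storage behind them.

```python
from collections import defaultdict, deque


class FeedFanout:
    """Delivers each event to the inbox of every current follower."""

    def __init__(self, inbox_size=100):
        self.followers = defaultdict(set)  # author -> set of followers
        # Each inbox keeps only the newest `inbox_size` messages.
        self.inboxes = defaultdict(lambda: deque(maxlen=inbox_size))

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def publish(self, author, event):
        # Work happens at write time: one copy per follower.
        for follower in self.followers[author]:
            self.inboxes[follower].appendleft({"author": author, **event})

    def read(self, user, limit=20):
        # Rendering a stream only touches that user's own small inbox.
        return list(self.inboxes[user])[:limit]
```

The trade-off described above shows up directly: a user who follows an author after an event was published never receives that earlier event, because delivery happened at publish time.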

I believe this was Twitter's original design flaw: I remember reading that they were hitting the database to pull in and filter their events. This had everything to do with architecture and nothing to do with Rails, which (unfortunately) gave birth to the "Ruby doesn't scale" meme. I recently saw a presentation where the developer used Amazon's Simple Queue Service as the messaging backend for a Twitter-like application that would have far higher scaling capabilities; it may be worth looking into SQS as part of your system, if your loads are high enough.

Tim Howland
  • Tim, do you by any chance remember the name of the presentation or the presenter? – Danita May 11 '09 at 12:45
  • It was at O'Reilly and Associates' Ignite Boston presentation, either number 3 or 4. I believe the presenter had a book on scaling RoR with O'Reilly. Sorry I can't be more specific! – Tim Howland May 12 '09 at 01:07
  • Thanks Tim :) By the way, what did you mean with "smallish social network"? How many users, or active users at a certain time? – Danita May 26 '09 at 12:47
  • 3
    In case anyone needs it, I think this is the presentation Tim is talking about: "Dan Chak -- Scaling to the Size of your Problems" http://radar.oreilly.com/2008/09/ignite-boston-4----videos-uplo.html – Danita May 26 '09 at 13:20
  • Smallish in this case is such that "select * from events where event.is visible for this user" returns a result in less than a second or two- figure a few hundred thousand rows worth of events. – Tim Howland May 26 '09 at 15:14
13

If you are willing to use separate software, I suggest the Graphity server, which solves exactly this problem for activity streams (building on top of the neo4j graph database).

The algorithms have been implemented as a standalone REST server so that you can host your own server to deliver activity streams: http://www.rene-pickhardt.de/graphity-server-for-social-activity-streams-released-gplv3/

In the paper and benchmark I showed that the cost of retrieving news streams depends only linearly on the number of items you want to retrieve, without any of the redundancy you would get from denormalizing the data:

http://www.rene-pickhardt.de/graphity-an-efficient-graph-model-for-retrieving-the-top-k-news-feeds-for-users-in-social-networks/

At the link above you will find screencasts and a benchmark of this approach (showing that Graphity is able to retrieve more than 10k streams per second).

Rene Pickhardt
11

I started to implement a system like this yesterday, here's where I've got to...

I created a StreamEvent class with the properties Id, ActorId, TypeId, Date, ObjectId and a hashtable of additional Details key/value pairs. This is represented in the database by a StreamEvent table (Id, ActorId, TypeId, Date, ObjectId) and a StreamEventDetails table (StreamEventId, DetailKey, DetailValue).

The ActorId, TypeId and ObjectId allow for a Subject-Verb-Object event to be captured (and later queried). Each action may result in several StreamEvent instances being created.

I've then created a subclass of StreamEvent for each type of event, e.g. LoginEvent, PictureCommentEvent. Each of these subclasses has more context-specific properties such as PictureId, ThumbNail, CommentText, etc. (whatever is required for the event), which are actually stored as key/value pairs in the hashtable/StreamEventDetails table.

When pulling these events back from the database I use a factory method (based on the TypeId) to create the correct StreamEvent class.

Each subclass of StreamEvent has a Render(context As StreamContext) method which outputs the event to screen based on the passed StreamContext class. The StreamContext class allows options to be set based on the context of the view. If you look at Facebook, for example, your news feed on the homepage lists the full names (and links to their profiles) of everyone involved in each action, whereas when looking at a friend's feed you only see their first name (but the full names of other actors).
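
The factory method and context-dependent rendering could be sketched as follows. The class names follow the answer, but the Python field names, the type-id values in `EVENT_CLASSES`, the `display_name` helper, the shape of the `users` map, and the exact output strings are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class StreamContext:
    viewer_id: int
    use_full_names: bool = True  # e.g. True on the home feed


@dataclass
class StreamEvent:
    id: int
    actor_id: int
    type_id: int
    date: str
    object_id: int
    details: dict = field(default_factory=dict)  # rows from StreamEventDetails

    def render(self, context, users):
        raise NotImplementedError


def display_name(user_id, context, users):
    # The same actor is shown differently depending on who is looking.
    if user_id == context.viewer_id:
        return "You"
    first, last = users[user_id]
    return f"{first} {last}" if context.use_full_names else first


class LoginEvent(StreamEvent):
    def render(self, context, users):
        return f"{display_name(self.actor_id, context, users)} logged in"


class PictureCommentEvent(StreamEvent):
    def render(self, context, users):
        who = display_name(self.actor_id, context, users)
        picture_id = self.details["PictureId"]
        text = self.details["CommentText"]
        return f"{who} commented on picture {picture_id}: {text}"


# Hypothetical TypeId values; the real ids would come from the database.
EVENT_CLASSES = {1: LoginEvent, 2: PictureCommentEvent}


def make_event(row, detail_rows):
    """Factory: pick the subclass from TypeId and attach its key/value details."""
    details = {key: value for (event_id, key, value) in detail_rows if event_id == row["Id"]}
    cls = EVENT_CLASSES[row["TypeId"]]
    return cls(row["Id"], row["ActorId"], row["TypeId"], row["Date"], row["ObjectId"], details)
```

With this shape, the "you commented" versus "user x commented" variants from the question are handled by the context passed to `render`, not by storing several copies of the event.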

I haven't implemented an aggregate feed (Facebook home) yet, but I imagine I'll create an AggregateFeed table with the fields UserId and StreamEventId, populated based on some kind of 'Hmmm, you might find this interesting' algorithm.

Any comments would be massively appreciated.

jammus
  • I am working on a system like this and am very interested in any knowledge on it. Did you ever finish yours? – JasonDavis Aug 08 '09 at 14:06
  • Great answer! Excellent separation of concerns, clean and elegant! – Mosh Sep 19 '12 at 00:30
  • This is a good start! It's very similar to how I began implementing my first stream. Once you get to the aggregate feed, however, things start to get complicated fast. You're right that you need a robust algorithm. My search led me to Rene Pickhardt's algorithm (he talks about it in his answer here), which I then implemented into my own service, which is now commercial (see http://www.collabinate.com and my answer on this question for more). – Mafuba Jul 20 '13 at 01:47
10
// one entry per actual event
events {
  id, timestamp, type, data
}

// one entry per event, per feed containing that event
events_feeds {
  event_id, feed_id
}

When the event is created, decide which feeds it appears in and add those to events_feeds. To get a feed, select from events_feeds, join in events, order by timestamp. Filtering and aggregation can then be done on the results of that query. With this model, you can change the event properties after creation with no extra work.

iuri
jedediah
  • 1
    Suppose someone is added as a friend after the event is created and needs to see this event in their feed. Then this wouldn't work. – Joshua Kissoon May 30 '12 at 16:43
9

If you do decide that you're going to implement in Rails, perhaps you will find the following plugin useful:

ActivityStreams: http://github.com/face/activity_streams/tree/master

If nothing else, you'll get to look at an implementation, both in terms of the data model, as well as the API provided for pushing and pulling activities.

Alderete
6

I had a similar approach to that of heyman - a denormalized table containing all of the data that would be displayed in a given activity stream. It works fine for a small site with limited activity.

As mentioned above, it is likely to face scalability issues as the site grows. Personally, I am not worried about the scaling issues right now. I'll worry about that at a later time.

Facebook has obviously done a great job of scaling so I would recommend that you read their engineering blog, as it has a ton of great content -> http://www.facebook.com/notes.php?id=9445547199

I have been looking into better solutions than the denormalized table I mentioned above. Another way I have found of accomplishing this is to condense all the content that would be in a given activity stream into a single row. It could be stored in XML, JSON, or some serialized format that could be read by your application. The update process would be simple too. Upon activity, place the new activity into a queue (perhaps using Amazon SQS or something else) and then continually poll the queue for the next item. Grab that item, parse it, and place its contents in the appropriate feed object stored in the database.

The good thing about this method is that you only need to read a single database table whenever that particular feed is requested, rather than grabbing a series of tables. Also, it allows you to maintain a finite list of activities as you may pop off the oldest activity item whenever you update the list.
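
The queue-and-single-row idea can be sketched as below. Here a plain dict stands in for the database table (one serialized row per feed) and `queue.Queue` stands in for something like Amazon SQS; both substitutions, the function names, and the `MAX_ITEMS` cap are assumptions for illustration only.

```python
import json
import queue

MAX_ITEMS = 3  # hypothetical cap on stored activities per feed


def apply_activity(feed_rows, user_id, activity, max_items=MAX_ITEMS):
    # Read the single serialized row, prepend the new item, drop the oldest.
    items = json.loads(feed_rows.get(user_id, "[]"))
    items.insert(0, activity)
    del items[max_items:]
    feed_rows[user_id] = json.dumps(items)


def drain(pending, feed_rows):
    # Worker loop body: take queued activities and fold them into the feeds.
    while True:
        try:
            user_id, activity = pending.get_nowait()
        except queue.Empty:
            break
        apply_activity(feed_rows, user_id, activity)


def read_feed(feed_rows, user_id):
    # Serving a feed is a single-row read plus deserialization.
    return json.loads(feed_rows.get(user_id, "[]"))
```

For example, after queueing five activities for one user and calling `drain`, `read_feed` returns only the three newest, newest first.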

Hope this helps! :)

  • Exactly my thoughts, I just needed a validation of my thoughts which I probably got now, cheers! – Sohail Nov 07 '16 at 01:53
5

There are two railscasts about such an activity stream:

Those solutions don't cover all your requirements, but they should give you some ideas.

Benjamin Crouzier
4

I think Plurk's approach is interesting: they supply your entire timeline in a format that looks a lot like Google Finance's stock charts.

It may be worth looking at Ning to see how a social network works. The developer pages look especially helpful.

warren
2

After implementing activity streams to enable social feeds, microblogging, and collaboration features in several applications, I realized that the base functionality is quite common and could be turned into an external service that you utilize via an API. If you are building the stream into a production application and do not have unique or deeply complex needs, utilizing a proven service may be the best way to go. I would definitely recommend this for production applications over rolling your own simple solution on top of a relational database.

My company Collabinate (http://www.collabinate.com) grew out of this realization, and we have implemented a scalable, high performance activity stream engine on top of a graph database to achieve it. We actually utilized a variant of the Graphity algorithm (adapted from the early work of @RenePickhardt who also provided an answer here) to build the engine.

If you want to host the engine yourself or require specialized functionality, the core code is actually open source for non-commercial purposes, so you're welcome to take a look.

Mafuba
2

I solved this a few months ago, but I think my implementation is too basic.
I created the following models:

HISTORY_TYPE

ID           - The id of the history type
NAME         - The name (type of the history)
DESCRIPTION  - A description

HISTORY_MESSAGES

ID
HISTORY_TYPE - A message of history belongs to a history type
MESSAGE      - The message to print, I put variables to be replaced by the actual values

HISTORY_ACTIVITY

ID
MESSAGE_ID    - The message ID to use
VALUES        - The data to use

Example

MESSAGE_ID_1 => "User %{user} created a new entry"
ACTIVITY_ID_1 => MESSAGE_ID = 1, VALUES = {user: "Rodrigo"}
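
The message-template lookup in this example could be rendered with a few lines of Python, as sketched below. The `%{user}` placeholder syntax comes from the example above; the dictionary layout, the regular expression, and the `render_activity` function are assumptions for illustration.

```python
import re

# Stand-in for the HISTORY_MESSAGES table: message id -> template text.
HISTORY_MESSAGES = {1: "User %{user} created a new entry"}

# Matches placeholders of the form %{name}.
PLACEHOLDER = re.compile(r"%\{(\w+)\}")


def render_activity(activity, messages=HISTORY_MESSAGES):
    # Look up the template by message id and substitute the stored values.
    template = messages[activity["message_id"]]
    values = activity["values"]
    return PLACEHOLDER.sub(lambda match: str(values[match.group(1)]), template)
```

A placeholder with no matching key in `values` raises a `KeyError` in this sketch, which makes incomplete activity rows visible early.
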
LPL
Rodrigo