26

Trello shows a historical log of everything that any user has done since the board's inception. Likewise, if you click on a specific card, it shows the history of everything anyone has done related to that card.

Keeping every change/addition/deletion indefinitely must accumulate a ton of data, and writing to the history trail could also become a bottleneck (assuming it is written immediately to a data store of some sort). It isn't as if they are storing everything in log files spread across thousands of servers that they only collect and parse when they need to find something -- they are displaying all of this info all the time.

I know this isn't the only service that provides something like this, but how would you go about architecting such a system?

Oxed Frederik
  • 1,331
  • 1
  • 12
  • 12
  • You'd be surprised how good your RDBMS really is. The logs aren't stored in a file - they are stored in a database with some nice indexes. – JonH May 08 '12 at 19:54

3 Answers

34

I'm on the Trello team. We use an Actions collection in our MongoDB instance, with a compound index on the ids of the models to which it refers (a Card is a model, and so is a Member) and the date when the action was performed. No fancy caching or anything, except inasmuch as the index and recently used documents are kept in memory by the DB. Actions is by far our biggest collection.

It is worth mentioning that most of the data needed to display an action is stored denormalized in the action document, so that speeds things up considerably.
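
For concreteness, here is a rough pymongo sketch of that shape (the collection, field, and document names below are simplified for illustration, not our actual schema):

# Hypothetical sketch of an actions collection; names are illustrative only.
from datetime import datetime, timezone
from pymongo import MongoClient, ASCENDING, DESCENDING

actions = MongoClient()["example"]["actions"]

# Compound index: the ids of the models the action refers to, plus when it happened.
actions.create_index([("idModels", ASCENDING), ("date", DESCENDING)])

# Denormalized action document: it carries everything needed to render the history entry.
actions.insert_one({
    "type": "moveCard",
    "idModels": ["card123", "member456", "board789"],
    "date": datetime.now(timezone.utc),
    "memberCreator": {"id": "member456", "fullName": "Example Member"},
    "data": {"card": {"id": "card123", "name": "Example card"},
             "listBefore": "To Do", "listAfter": "Doing"},
})

# A card's (or member's, or board's) history is then one indexed query, newest first.
card_history = actions.find({"idModels": "card123"}).sort("date", DESCENDING).limit(50)

Because MongoDB builds a multikey index over the idModels array, the same query pattern serves card, member, and board histories alike.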

Brett
  • 3,478
  • 1
  • 22
  • 23
  • So, you store the actions with a timestamp and index on both so that you can do a quick lookup, so simple! What is the "action document"? – Ape-inago Sep 12 '12 at 06:39
  • We use MongoDB, so the 'action document' is the equivalent of a 'row in the actions table' in a traditional relational DB, but it holds an arbitrary JSON document rather than highly structured data. – Brett Sep 13 '12 at 17:09
  • @Brett, Are writes affected (slower) because your data is all denormalized? – Pacerier May 22 '14 at 22:12
3

The easiest way that comes to mind is to have a table like:

create table HistoryItems (
    ID int not null,
    UserID int not null,
    DateTime datetime not null,
    Data varbinary(max) null, -- or varchar(max)/...
    constraint PK_HistoryItems primary key (ID, UserID)
)

Indexing this on UserID allows fast retrieval. A covering index would let you fetch a user's entire history in a single disk seek, no matter how long that history is.

Alternatively, the table could be clustered on (UserID asc, DateTime desc, ID), so you wouldn't need a separate index at all and would still get optimal performance.

An easy problem for a relational database.
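
A quick self-contained sketch of the write and read paths (using Python's sqlite3 so it runs anywhere; the DDL above is SQL Server flavored, and the names below just mirror it):

import sqlite3
from datetime import datetime, timezone

con = sqlite3.connect(":memory:")
con.execute("""
    create table HistoryItems (
        ID       integer primary key,
        UserID   integer not null,
        DateTime text    not null,
        Data     text    not null
    )""")
# The (UserID, DateTime desc) index is what makes "one user's whole history" a single range scan.
con.execute("create index IX_HistoryItems_User on HistoryItems (UserID, DateTime desc)")

# Writes are small single-row inserts, so they lock very little.
con.execute("insert into HistoryItems (UserID, DateTime, Data) values (?, ?, ?)",
            (42, datetime.now(timezone.utc).isoformat(), "renamed card 'Foo' to 'Bar'"))

# One user's full history, newest first, via the index.
history = con.execute(
    "select DateTime, Data from HistoryItems where UserID = ? order by DateTime desc",
    (42,)).fetchall()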

usr
  • 168,620
  • 35
  • 240
  • 369
  • perhaps reads aren't that bad.. but wouldn't writing all of that data to one table have pretty bad locking issues? – Oxed Frederik May 08 '12 at 19:59
  • Usually no. Small amounts of writes per transaction (which is the case here) only lock rows. Inserts can happen concurrently that way. – usr May 08 '12 at 20:01
1

I have something very similar to what @Brett from Trello answered above in my PHP + MySQL app, which I use for tracking user activity in our order and production management app for our online web store.

I have a table activities which holds:

  • user_id: the user that performed the action
  • action_id: the action that was performed (e.g. create, update, delete, and so on...)
  • resource: an ENUM of the resources (models) the action was performed on (e.g. orders, invoices, products, etc...)
  • resource_id: the PK of the resource the action was performed on
  • description: text description of the action (can be null)

It's a large table indeed, but with the right indexes it performs very well. It serves its purpose: simple and fast. Currently it holds 200k records and is growing by roughly 1,000 new entries per day.
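
If it helps, here is a minimal stand-in for that table and the typical lookup (sketched with Python's sqlite3 purely as a runnable example; the real app is PHP + MySQL, and the column names just mirror the list above):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    create table activities (
        id          integer primary key,
        user_id     integer not null,
        action_id   integer not null,
        -- ENUM in MySQL; a CHECK constraint stands in for it here
        resource    text    not null check (resource in ('orders', 'invoices', 'products')),
        resource_id integer not null,
        description text
    )""")
# The index that makes "history of one order / invoice / product" a cheap lookup.
con.execute("create index idx_activities_resource on activities (resource, resource_id)")

con.execute("insert into activities (user_id, action_id, resource, resource_id, description) "
            "values (?, ?, ?, ?, ?)",
            (7, 2, 'orders', 123, 'updated shipping address'))

# Everything that ever happened to order #123, newest entries first.
rows = con.execute("select user_id, action_id, description from activities "
                   "where resource = 'orders' and resource_id = 123 "
                   "order by id desc").fetchall()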

Primoz Rome
  • 10,379
  • 17
  • 76
  • 108