6

I'm working on a very high-throughput site with many items and am looking into implementing "trending now" type functionality that would let users quickly get a prioritized list of the top N items that have been viewed recently by many people, and that gradually fade away as they get fewer views.

One idea is to give more weight to recent views of an item: something like a weight of 16 for every view in the past 15 minutes, a weight of 8 for every view in the past hour, a weight of 4 for views in the past 4 hours, and so on, but I do not know if this is the right way to approach it.

I'd like to do this in Redis; we've had good success with Redis on other projects in the past.

What is the best way to do this, both in terms of the technology and of deciding what counts as trending?
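To make the weighting idea above a bit more concrete, here is a rough sketch of what I have in mind, assuming redis-py; the key names are made up, and the time windows are only approximated with key TTLs rather than true sliding windows:

    import redis

    r = redis.Redis()

    # (bucket key, window in seconds, weight) -- all names/values illustrative
    BUCKETS = [
        ("views:15m", 15 * 60,  16),
        ("views:1h",  60 * 60,   8),
        ("views:4h",   4 * 3600, 4),
    ]

    def record_view(item_id):
        """Count a view in each bucket; each bucket expires with its window."""
        for key, ttl, _weight in BUCKETS:
            r.zincrby(key, 1, item_id)
            r.expire(key, ttl)

    def top_trending(n=10):
        """Merge the buckets with their weights and return the top n items."""
        r.zunionstore("trending", {key: weight for key, _ttl, weight in BUCKETS})
        return r.zrevrange("trending", 0, n - 1, withscores=True)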

The first answer hints at a solution but I'm looking for more detail -- starting a bounty.

These are both decent ideas, but not quite detailed enough. One got half the bounty, but I'm leaving the question open.

OneSolitaryNoob

4 Answers

9

So, I would start with a basic time ordering (a zset of item_id scored by timestamp, for example), and then float things up based on interactions. You might decide that a single interaction is worth 10 minutes of 'freshness', so each interaction adds that much time to the score of the relevant item. If all interactions are valued equally, you can do this with one zset and just increment the scores as interactions occur.
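A minimal sketch of that single-zset idea, assuming redis-py; the key name "hot" and the 10-minute bump are just illustrative:

    import time
    import redis

    r = redis.Redis()
    FRESHNESS = 10 * 60  # each interaction is worth 10 minutes

    def add_item(item_id):
        # Start new items at "now"; nx=True avoids resetting existing items.
        r.zadd("hot", {item_id: time.time()}, nx=True)

    def record_interaction(item_id):
        # Each interaction floats the item 10 minutes further up the ranking.
        r.zincrby("hot", FRESHNESS, item_id)

    def top_items(n=10):
        return r.zrevrange("hot", 0, n - 1, withscores=True)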

If you want to have some kind of back-off, say, scoring by the square root of the interaction count instead of the interaction count directly, you could build a second zset with your score for interactions and use zunionstore to combine it with your timestamp index. For this, you'll probably want to pull out the existing score, do some math on it, and put a new score over it (zadd will let you overwrite a score).
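Since zunionstore can't apply a square root itself, the read-modify-write variant might look something like this sketch, assuming redis-py, a second zset that just counts interactions, and that the caller knows the item's base timestamp; names and the constant are illustrative:

    import math
    import redis

    r = redis.Redis()
    K = 10 * 60  # seconds of freshness per sqrt(interaction)

    def record_interaction(item_id, created_at):
        # created_at is the item's base timestamp (assumed known to the caller).
        count = r.zincrby("interactions", 1, item_id)  # returns the new count
        # Do the math and overwrite the ranking score with ZADD.
        r.zadd("hot", {item_id: created_at + K * math.sqrt(count)})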

The zunionstore is potentially expensive, and for sufficiently large sets even the zadd/zincrby gets expensive. To this end, you might want to keep only the N highest-scoring items, for, say, N = 10,000, depending on your application needs.
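Trimming could be as simple as a ZREMRANGEBYRANK after writes; a sketch, assuming redis-py and the illustrative key name used above:

    import redis

    r = redis.Redis()
    N = 10_000

    def trim_trending(key="hot"):
        # Ranks are ascending by score, so this drops everything below the top N.
        r.zremrangebyrank(key, 0, -(N + 1))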

Mark Tozzi
3

These two links are very helpful:

http://stdout.heyzap.com/2013/04/08/surfacing-interesting-content/

http://word.bitly.com/post/41284219720/forget-table

  • 1
    Posting only links is usually risky; you're left hanging if the links ever become invalid. – Dennis Meng Sep 05 '13 at 22:01
  • Thanks Bill. I looked into forget-table; it doesn't look like it scales very far. The example is a sorted set of ~260 countries that is read out in its entirety. With just 260 elements that's fine, but it would not work with tens of millions. There might be a way to partition their data into many Redis keys and use their decay approach, but that makes it tedious to find the top "trending now" items. (And Bill is correct, the links may go stale eventually.) – OneSolitaryNoob Sep 10 '13 at 00:21
  • 2
    Looks like the first link is dead. Is this the same article http://qwerjk.com/posts/surfacing-interesting-content? – smilly92 Mar 08 '15 at 00:41
3

The Reddit ranking algorithm does a pretty good job of what you describe. There's a good write-up here that talks through how it works.

https://medium.com/hacking-and-gonzo/how-reddit-ranking-algorithms-work-ef111e33d0d9
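For reference, the "hot" formula that article walks through looks roughly like this; this is a from-memory sketch based on Reddit's open-sourced ranking code, so double-check the constants against the article:

    from math import log10

    def hot(ups, downs, created_at_epoch_seconds):
        score = ups - downs
        order = log10(max(abs(score), 1))
        sign = 1 if score > 0 else (-1 if score < 0 else 0)
        # 1134028003 is the fixed epoch offset Reddit's code subtracts.
        seconds = created_at_epoch_seconds - 1134028003
        return round(sign * order + seconds / 45000, 7)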

  • This link seems to point to an offline server. This might help (https://medium.com/hacking-and-gonzo/how-reddit-ranking-algorithms-work-ef111e33d0d9) – Zobayer Hasan Jul 11 '17 at 06:12
2

Consider a sorted set with the number of views as the scores. Whenever an item is accessed, increment its score (http://redis.io/commands/zincrby). That way you can get the top items out of the set, ordered by score.
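A minimal sketch of this approach, assuming redis-py; the key name "item_views" is made up:

    import redis

    r = redis.Redis()

    def record_view(item_id):
        r.zincrby("item_views", 1, item_id)  # ZINCRBY

    def top_items(n=10):
        return r.zrevrange("item_views", 0, n - 1, withscores=True)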

You will need to "fade" the items too, maybe with an external process that decrements the scores.
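One way that fade could work is a periodic job that decays every score multiplicatively (rather than by a fixed decrement) by ZUNIONSTORE-ing the set with itself under a fractional weight; a sketch, assuming redis-py:

    import redis

    r = redis.Redis()

    def decay_views(key="item_views", factor=0.5):
        # ZUNIONSTORE dest 1 key WEIGHTS factor  => every score *= factor
        r.zunionstore(key, {key: factor})
        # Optionally drop items that have decayed to (near) zero.
        r.zremrangebyscore(key, "-inf", 0.5)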

akonsu
  • Thanks. Using this approach, how would the items ever become less "hot"? – OneSolitaryNoob Jun 11 '13 at 05:35
  • Thanks a lot, ZINCRBY looks really useful. Any more details about the structure? What is the most I could expect to put in this set? Would a few million be too many? Is the "fade" cron job approach a sensible way to do things? – OneSolitaryNoob Jun 12 '13 at 02:42
  • I suppose you can put as many items into your set as Redis can fit into memory; it depends on your server size. To me a cron job or a background process seems sensible, yes. I would also experiment with concatenating hit counts with timestamps and putting that into the set's scores, instead of fading the items with a background process. – akonsu Jun 12 '13 at 13:39