1

Suppose we have 1000 items and a place to show any ten of these items at a time, to the visiting user. We can capture click rate and items which are shown together.

  1. How can we optimally get the most popular items (say 10) out of these?
  2. How can we continually update popularity and show the optimal 10 items?

Edit: I'm looking for the different approaches instead of implementations.

Appy
  • 11
  • 3
  • Maintain a `max heap`. It should work. – Haris Sep 09 '15 at 13:09
  • Can you elaborate? Sorry for the ignorance. – Appy Sep 09 '15 at 13:15
  • Which operation happens more frequently? The reorder or the view-top-10? If the view operation dominates you want to maintain items in a sorted order (e.g. BST). If the reorder dominates, you may be willing to incur some more overhead on the top-10 operation in exchange for a cheaper reorder operation (look into different max-heap implementations). – Peter Dixon-Moses Sep 10 '15 at 14:30

2 Answers2

1

If you really want to squeeze this down, there is a dumb/simple approach for your case (show top 1%).

This optimization can happen because on average, only 1 out of 100 popularity changes will knock out one of the top 1%. (Assumes a random distribution of updates. Of course with a more typical power-law distribution, this could happen much more frequently.)

  1. Sort the entire initial collection,

    • Store only the top 10 in any sorted data structure (e.g. BST)
    • Store the popularity score of #10 (e.g. minVisiblePopularity)
  2. then with each subsequent popularity change in the collection, compare with minVisiblePopularity.

    • If the new popularity falls above the minVisiblePopularity, update the top-10 structure and minVisiblePopularity accordingly.
    • (Or if the old popularity was greater, but new popularity is less - e.g. former top 10 item getting knocked out).

This adds a minimal storage requirement of an extremely small binary search tree (10 items) and a primitive variable. The tree then only requires updating when a popularity change knocks out one of the previous top-10.

Peter Dixon-Moses
  • 3,169
  • 14
  • 18
  • Just saw an earlier answer along these lines for [What is the best data structure for keeping the top n elements in sort order?](http://stackoverflow.com/questions/14969909/what-is-the-best-data-structure-for-keeping-the-top-n-elements-in-sort-order#14986415) (with discussion) – Peter Dixon-Moses Sep 10 '15 at 15:44
0

Self implemented:

To maintain ordered array by popularity and a hash table that contain reference to corresponding item in the popularity binary tree. So, last 10 would the most popular items, access to them will be O(M) where M is count of items to show.

To maintain ordered array:

It can be maintained using self-balancing binary tree with log(N) complexity where N is total count of elements

http://www.sitepoint.com/data-structures-2/

As a practical option:

database can be used to store items and B-tree index can be added to popularity column; DBMS will have required optimizations here https://en.wikipedia.org/wiki/Database_index

Sparrow
  • 88
  • 7