20

Please suggest a data structure for representing a list of records in memory. Each record is made up of:

  • User Name
  • Points
  • Rank (based on Points) - optional field - can be either stored in the record or can be computed dynamically

The data structure should support implementation of the following operations efficiently:

  1. Insert(record) - might change ranks of existing records
  2. Delete(record) - might change ranks of existing records
  3. GetRecord(name) - Probably a hash table will do.
  4. GetRecord(rank)
  5. Update(points) - might change ranks of existing records

My main problem is efficient implementation of GetRecord(rank), because ranks can change frequently.

I guess an in-memory DBMS would be a good solution, but please don't suggest it; please suggest a data structure.

Sachin Joseph
  • 18,928
  • 4
  • 42
  • 62

3 Answers3

17

Basically, you'll just want a pair of balanced search trees, which will allow O(lg n) insertion, deletion, and getRecord operations. The trick is that instead of storing the actual data in the trees, you'll store pointers to a set of record objects, where each record object will contain 5 fields:

  1. user name
  2. point value
  3. rank
  4. pointer back to the node in the name tree that references the object
  5. pointer back to the node in the point tree that references the object.

The name tree is only modified when new records are added and when records are deleted. The point tree is modified for insertions and deletions, but also for updates, where the appropriate record is found, has its point-tree pointer removed, its point count updated, then a new pointer added to the point-tree.

As you mention, you can use a hash table instead for the name tree if you like. The key here is that you simply maintain separate sorted indexes into a set of otherwise unordered records that themselves contain pointers to their index nodes.


The point tree will be some variation on an order statistic tree, which rather than being a specific data structure, is an umbrella term for a binary search tree whose operations are modified to maintain an invariant which makes the requested rank-related operations more efficient than walking the tree. The details of how the invariants are maintained depend on the underlying balanced search tree (red-black tree, avl tree, etc) used.

chepner
  • 497,756
  • 71
  • 530
  • 681
  • Find the record using the name tree (O(lg n)); use the stored pointer in the object to find the corresponding node in the point tree; remove that node (O(lg n)); update the point total; insert a new node in the point tree (O(lg n)); and update the pointer in the object record. – chepner Aug 12 '14 at 04:01
  • But that doesn't care about updating the ranks of the record or other records, right? Rank is based on points. – Sachin Joseph Aug 12 '14 at 15:48
  • The rank of a node in the point tree is basically the number of nodes in the right subtree of that node (plus 1). You can maintain this count during the insert and delete operations on a balanced binary search tree. – chepner Aug 12 '14 at 16:33
  • So do you mean that the record with maximum points (rank=1) will be the only leaf node in the point-tree, and the record with the minimum points (rank=n) will be the root node with no left child? In effect, point-tree is a right skewed tree, right? Is that an efficient implementation? – Sachin Joseph Aug 13 '14 at 15:06
  • No, you need a balanced tree for an efficient implementation, but in *any* search tree it is true than an item is larger than every element in its left subtree and smaller than every time in its right subtree. Rank 1 would be the right-most node (following right children from the root) and rank n would be the left-most node. The root is (approximately) the median element. – chepner Aug 13 '14 at 15:46
  • So are you suggesting that ranks need not be stored as such, rather it can be computed dynamically? – Sachin Joseph Aug 13 '14 at 15:55
  • A mix. I'm not sure if you can compute it purely on the fly (you might be able to define an appropriate tree walk on a balanced tree that takes O(lg n), but I haven't given it a great deal of thought). You can, though, store the size of the right subtree in each node, which allows you to compute the rank of a node given the size of the tree and the value stored in that node. The right-subtree size can be updated during insertions and deletions, although the algorithm for doing so will depend on the algorithm use to maintain a balanced tree. – chepner Aug 13 '14 at 15:58
  • How do you compute the rank of a node given the size of the tree, size of the right subtree, and the value stored in that node? – Sachin Joseph Aug 13 '14 at 16:39
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/59287/discussion-between-sachin-joseph-and-chepner). – Sachin Joseph Aug 13 '14 at 16:42
3

A skiplist + hashmap should work.

Here is an implementation in Go: https://github.com/wangjia184/sortedset

Every node in the set is associated with these properties.

  • key is an unique identity of the node, which is "User name" in your case.
  • value is any value associated with the node
  • score a number decides the order(rank) in the set, which is "Points" in your case

Each node in the set is associated with a key. While keys are unique, scores may be repeated. Nodes are taken in order (from low score to high score) instead of ordered afterwards. If scores are the same, the node is ordered by its key in lexicographic order. Each node in the set also can be accessed by rank, which represents the position in the sorted set.

A typical use case of sorted set is a leader board in a massive online game, where every time a new score is submitted you update it using AddOrUpdate() method. You can easily take the top users using GetByRankRange() method, you can also, given an user name, return its rank in the listing using FindRank() method. Using FindRank() and GetByRankRange() together you can show users with a score similar to a given user. All very quickly.

Mr.Wang from Next Door
  • 13,670
  • 12
  • 64
  • 97
1

Look for a DBMS that includes a function to select a record by sequential record number.

See: How to select the nth row in a SQL database table?

Construct a table with a UserName column and a Points column. Make UserName the primary index. Construct a secondary non-unique maintained index on Points.

To get the record with rank R, select the index on Points and move to record R.

This makes the DBMS engine do most of the work and keeps your part simple.

Community
  • 1
  • 1
A. I. Breveleri
  • 747
  • 5
  • 6