2

I am looking for an optimal data structure for these pseudo-code operations:

add/update(key, node-content);
node=walk/iterate();
delete(node);

I would do a hash-table, but it is not possible to walk/iterate through it in efficient way (all buckets need be examined). Currently, I consider rbtree, but my doubts revolve around the fact that I need re-balance it at each add/delete operation keeping a global tree mutex presumably... Could anyone share some expertise what are the best options may be?

UPDATE I would like to expand on usage of the sought data structure as it would clarify the questions being asked so far.

The structure will have a few hundred nodes at most - so not that large. The most frequent operation will be walk(), wherein each and every node is read by turn, the order does not matter. walk() can happen thousand times a second. So far linked-list or even array would do.

The second most frequent operation is update(node-key, node-content). This is where efficient search needed. The operation is likely to occur many hundreds times a second. Thus, hash table is appealing.

Sometimes, a node will be added - when the update() doesn't find an existing node - or deleted. Add()/delete() happens rarely, say, once a day - so the cost of these operations is irrelevant.

UPDATE2 I think here is a very good recap on structures in question. Currently I gravitate towards skiplist

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
wick
  • 1,995
  • 2
  • 20
  • 31
  • Do you have any runtime/space limits? You could go for hash table and maintain in addition a list of all the keys in the hash table. Then, you could iterate over the hash table using iteration over the list elements and `get(key)` for each element. – SHG Jun 18 '17 at 20:56
  • Hash table could absolutely be iterated through efficiently. Every hash table library implementation I know of does it. RB tree would work fine too. – Bernhard Barker Jun 18 '17 at 22:45
  • Can you elaborate a bit on your operations? How does `delete(node)` work? Shouldn't that be `delete(key)`? If not, how are you getting the node you want to delete? Do you want to actually be able to query the data as well (e.g. lookup by key), or do you just want to iterate through? If you want to query, in what way? – Bernhard Barker Jun 18 '17 at 22:51
  • A [skip list](https://en.wikipedia.org/wiki/Skip_list) gives you O(log n) for add, update, and delete. Iteration is O(n) to iterate the entire list (O(1) to get the first item, and O(1) for each subsequent item). – Jim Mischel Jun 19 '17 at 14:29
  • @SHG, i coined unidirectional linked-list ring with multiple entry points obtained from a small hash table that allows collisions. So O(1) update would happen where there is no collision, or O(n) worst case where there is. I just think it's a bit over-complicated for implementation and standard structure would be more reliable. – wick Jun 19 '17 at 15:32
  • @Dukeling I look to implement myself, and I do not currently see how would a hash table be easily iterated - hence the question. I updated the question to show usage. delete(node) as one will be identified during walk(). – wick Jun 19 '17 at 15:35
  • Both hash table and a BST would be some work to efficiently implement yourself. A hash table arguably has quite a few more little details that needs to be optimised. The best option for either case, if you really can't use a library, is probably to just copy some open-source code (or just to look at how it's done there). If the hash table is well-sized with a decent hash function, iterating through some empty rows in the table shouldn't make the code much slower. Although needing to iterate through a large (?) structure a thousand times a second points to design problems. – Bernhard Barker Jun 19 '17 at 16:12
  • If you really want to optimise the efficiency of that iterate, you could have an array or linked-list **in addition to** your hash table or BST, which you update concurrently (each element in your map can keep an index or pointer to the corresponding element in the array or LL), but I suspect reviewing your design might show an improvement that is orders of magnitude greater than focusing on this. – Bernhard Barker Jun 19 '17 at 16:18
  • @wick A walk()/iterate() of a container would normally examine all nodes (or until you decide to stop iterating..), and it's easy to implement on a hash table. It seems you have some other definition or restrictions since you have dismissed a hash table - can you elaborate on that ? – nos Jun 20 '17 at 08:20
  • @nos, I am just not aware how a hash table could be efficiently iterated through. Consider 1024 buckets 40 of which is occupied. To iterate through such structure one would have to visit 1024 buckets as opposed to efficient 40 visits. Do I miss anything? – wick Jun 20 '17 at 14:45
  • Thanks @JimMischel I have implemented my flavor of skiplist https://github.com/psvz/snakelist – wick Jun 25 '17 at 14:30

1 Answers1

0

For starters, a hash table might actually not be all that bad here! A typical hash table maintains a load factor (ratio of number of items to number of slots) between 0.7 and 5, depending on the implementation. This means that you won’t be spending all that much time iterating over empty slots or empty buckets. So unless you have performance data showing that this is indeed not fast enough, I’d lean toward that as an initial idea.

But if you can’t do that, then you might want to consider using a hash table in which you have a doubly-linked list threaded through the items in the table. This list can easily be updated in time O(1) whenever an item is added or removed from the table. If you want to loop over all the items, just walk the linked list. And if you need to access individual elements by key, you can just look them up in the hash table.

This will be more performant than the skiplist idea you’re proposing, since the cost of lookups in the hash table are (expected) O(1) compared with a skiplist’s O(log n).

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065