7

My question is akin to Data structure for partial multi-keys mapping?.

I have key-value pairs where the key is composed of three components (strings).

I am looking for a data structure the can efficiently perform search queries over keys, where the queries could be complete or partial specification the key (one or more components omitted). For example:

(x, y, z)
(x, *, *)
(*, y, *)
etc.

The unspecified part of the key can be either at the front, the middle or the end of the key.

My current implementation (a hash map which maps all possible partial keys to a set of values that match this partial key) is horribly slow when inserting, deleting and updating values.

Community
  • 1
  • 1
Werner de Groot
  • 933
  • 7
  • 15
  • Trie comes to mind when using partial keys... – Jaa-c Jul 14 '14 at 13:12
  • A trie seems to be limited to prefixes. This way, the first 'component' always needs to be specified. I also considered a suffix-tree, which seems to lift this restriction but it seems that this is the same as my current solution but (instead of using a hash) using a binary tree. Is this conclusion wrong? – Werner de Groot Jul 14 '14 at 13:36
  • Since you have a limited number of keys, why not just use three separate maps - one for each key? This will require slightly more space, but won't significantly impact time performance. – blgt Jul 14 '14 at 13:41
  • @blgt: 3 isn't enough. – Nuclearman Jul 14 '14 at 13:43
  • @Nuclearman Not enough? Please elaborate. I reckon a map per key is plenty. A trie would certainly work, but it would be overkill - as tries are designed for an unspecified number of "keys" – blgt Jul 14 '14 at 13:57
  • 1
    I was thinking more along the lines of a directed graph where if the triple is `(a,b,c)` then the edges are `(a,b), (b,c)`. This ends up being fairly similar to a trie over the dimensions rather than over each digit. So that `(50,20,30)` and `(50,20,50)` become basically `data[50][20] = {30,50}. You can certainly use only three (this is where the hasty not enough came from) but then you have to deal with set intersections which result in queries that aren't O(k), where k is the number of returned triples. Though that might be required if goal is to only increase speed by something like 5-10x. – Nuclearman Jul 14 '14 at 14:40

1 Answers1

3

My current implementation (a hash map which maps all possible partial keys to a list of values that match this partial key) is horribly slow when inserting, deleting and updating values.

If you are actually using a list then that is almost certainly the issue as they are slow on deleting/updating/inserting may also be an issue depending on how you are doing it. You are better off using a set (unordered) or balanced binary search tree (ordered).

Nuclearman
  • 5,029
  • 1
  • 19
  • 35
  • Is having 2^3 = 8 maps with (x, y, z), (x, y, *), (x, *, z), (x, *, *), (*, y, z), (*, y, *), (*, *, z), (*, *, *) more efficient than one map with 8 entries for each key (like I have now)? – Werner de Groot Jul 14 '14 at 13:55
  • I'd have to see an example, but probably not now that I think about it. It seems like the actual issue that I missed is probably the data structure you are using for storing the triples. Lists are indeed rather slow at doing those things so I changed my answer based on that being more likely the issue. You also might want to include a language tag. – Nuclearman Jul 14 '14 at 15:00
  • I am indeed using a set. Mentioning a list was a mistake. The process is slow in absolute terms, but you are right in the sense that 8 O(1) inserts are still O(1). Maybe this is as fast as it gets... Therefore I marked your answer as accepted. – Werner de Groot Jul 15 '14 at 11:21
  • @WernerdeGroot: If you are already using sets then it may be the case that this is a case of the [XY-Problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem), and there are a few additions you can make to your question. If the massive number of triples is what's slowing you down, then where are you getting the triples from? There might be a way to efficiently index the triples without actually iterating over every triple depending on how the list of triples is generated. Also how many triples do you have and how long does it take to add them to the data structure? – Nuclearman Jul 15 '14 at 15:06