5

I am looking for data structure in c++ and I need an advice.

I have nodes, every node has unique_id and group_id:

1 1.1.1.1
2 1.1.1.2
3 1.1.1.3

4 1.1.2.1
5 1.1.2.2
6 1.1.2.3

7 2.1.1.1
8 2.1.1.2

I need a data structure to answer those questions:

  1. what is the group_id of node 4
  2. give me list (probably vector) of unique_id's that belong to group 1.1.1
  3. give me list (probably vector) of unique_id's that belong to group 1.1
  4. give me list (probably vector) of unique_id's that belong to group 1

Is there a data structure that can answer those questions (what is the complexity time of inserting and answering)? or should I implement it?

I would appreciate an example.

EDIT:

at the beginning, I need to build this data structure. most of the action is reading by group id. insertion will happen but less then reading.

the time complexity is more important than memory space

user1673206
  • 1,671
  • 1
  • 23
  • 43
  • How many objects are we talking here, this is somehow tagged database which implies the structure won't fit in memory? – EdChum Apr 30 '15 at 09:08
  • hmm no more than 500 nodes. it should fit in memory – user1673206 Apr 30 '15 at 09:12
  • related: http://stackoverflow.com/questions/13721522/which-stl-container-should-i-use-c really you should use vector first unless you find it slow which means profiling. Then I'd suggest a map – EdChum Apr 30 '15 at 09:16

5 Answers5

5

To me, hierarchical data like the group ID calls for a tree structure. (I assume that for 500 elements this is not really necessary, but it seems natural and scales well.)

Each element in the first two levels of the tree would just hold vectors (if they come ordered) or maps (if they come un-ordered) of sub-IDs.

The third level in the tree hierarchy would hold pointers to leaves, again in a vector or map, which contain the fourth group ID part and the unique ID.

Questions 2-4 are easily and quickly answered by navigating the tree.

For question 1 one needs an additional map from unique IDs to leaves in the tree; each element inserted into the tree also has a pointer to it inserted into the map.

Peter - Reinstate Monica
  • 15,048
  • 4
  • 37
  • 62
3

First of all, if you are going to have only a small number of nodes then it would probably make sense not to mess with advanced data structuring. Simple linear search could be sufficient.

Next, it looks like a good job for SQL. So may be it's a good idea to incorporate into your app SQLite library. But even if you really want to do it without SQL it's still a good hint: what you need are two index trees to support quick searching through your array. The complexity (if using balanced trees) will be logarithmic for all operations.

Matt
  • 13,674
  • 1
  • 18
  • 27
  • Hi, thank you for your answer. How the trees will answer the question 2,3,4? – user1673206 Apr 30 '15 at 09:47
  • @userd Just like in SQL `LIKE ('1%')` works. Unlike searching for simple numbers here every other digit reduces search into a lesser subtree. To find a data list you need to traverse through a _subtree_ starting from the leftmost leaf and ending in the rightmost one (usually for speed DBMS uses additional linked list, that is every tree node has also next/prev pointers). – Matt Apr 30 '15 at 10:32
  • @user4419802 - just be aware that LIKE (...) is pretty costly and it will most likely screw all your index usages... DB is ok, but take a look at the CONNECT level statement. – Mario The Spoon May 01 '15 at 12:51
  • @MarioTheSpoon That depends upon LIKE's expression. `LIKE('abc%')` _allows_ index usage while `LIKE('%xyz')` not. http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/like-performance-tuning – Matt May 02 '15 at 05:15
2

Depends...

How often do you insert? Or do you mostly read? How often do you access by Id or GroupId?

With a max of 500 nodes I would put them in a simple Vector where the Id is the offset into the array (if the Ids are indeed as shown). The group-search can than be implemented by iterating over the array and comparing the partial gtroup-ids.

If this is too expensive and you really access the strcuture a lot and need very high performance, or you do a lot of inserts I would implement a tree with a HashMap for the Id's.

If the data is stored in a database you may use a SELECT/ CONNECT BY if your systems supports that and query the information directly from the DB.

Sorry for not providing a clear answer, but the solution depends on too many factors ;-)

Mario The Spoon
  • 4,799
  • 1
  • 24
  • 36
  • Then I would go with a simple Array/ Vector and helper functions or a Tree since the compares for the partial GroupId's could become pretty expensive. And a HashMap would not help for that problem. – Mario The Spoon Apr 30 '15 at 09:47
0

I am not sure of the perfect DS for this. But I would like to make use of a map. It will give you O(1) efficiency for question 1 and for insertion O(logn) and deletion. The issue comes for question 2,3,4 where your efficiency will be O(n) where n is the number of nodes.

Pratik
  • 1,122
  • 8
  • 26
  • Sorry where do you get `O(1)` for lookup, insertion and deletion for a map? – EdChum Apr 30 '15 at 09:16
  • Any hashmap will give you O(1) for lookup as you will be using the key to look up the value. When it comes to insertion/deletion, since you are not following any order of insertion and not shifting elements on deletion, the performance will be very close to O(1) – Pratik Apr 30 '15 at 09:20
  • Sorry, insertion will be logn – Pratik Apr 30 '15 at 09:22
0

Sounds like you need a container with two separate indexes on unique_id and group_id. Question 1 will be handled by the first index, Questions 2-4 will be handled by the second.

Maybe take a look at Boost Multi-index Containers Library

Component 10
  • 10,247
  • 7
  • 47
  • 64