Which data structure should i use for my purpose?

Question

I need a data structure just like a map but each key may have multiple values related to it but I need to get all the values corresponding to a single key as an array of the objects. So which data structure would be the best to do this. I don't need to search in the data structure, i just need fast access to all values corresponding to a particular key. I have looked up at std::multimap but it doesn't return all the values for a particular key. So which might the best data structure in C++ which I may use?

Just a side note, multimap can returns values by key, not in array, but can. http://www.cplusplus.com/reference/map/multimap/equal_range/ — ForEveR, Jun 07 '13 at 06:09
@ForEveR, yeah I knew that, but wanted to know that whether there is something that returns all the values as an array — ab_11, Jun 07 '13 at 06:26
There is one question left: do you need anything specific regarding the values associated to a given map ? For example, you might need unicity, some ordering, ... — Matthieu M., Jun 07 '13 at 06:26
It'll be better if I explain you the whole thing. I have a 3D mesh, now I have many cells in it of different shapes formed by a group of points. Now I need to know the indices of all the cells that have particular point. So my 'key' in the map would be the index of the point and my 'value' should be the indices of all the cells that share that point. Now I don't want to iterate on the std::map again and again because I have about 80 million such points for whom I need to populate the map. I hope you understood what I am trying to convey — ab_11, Jun 07 '13 at 06:42

Tony Delroy · Answer 1 · 2013-06-07T09:16:04.400

6

I need a data structure just like a map but...

std::map<key, std::vector<value>>

80 million points is a good whack - worth considering other options. Ones worth a little thought/experimentation/benchmarking include:

sparse direct indexing... to achieve that, you need enough memory for not just the 80 million data points, but the entire x/y/z space that they span, but can then do an [x][y][z] lookup to find the vector of cell ids - that's obviously going to be huge - whether it's doable or desirable is not clear from your problem description
a sorted vector... depending on the order/overlap of your data structure element insertion and lookups, and whether you can afford a std::map to std::vector compaction step - you could sort a std::vector of (x,y,z) values then have binary_search outperform std::map due to the contiguous memory usage of the vector
std::unordered_map<key, std::vector<value>>... presizing for say 100 million bucket capacity should speed insertion a bit. This could be slower or faster than other options... there's likely less pages of memory for the index than for sparse indexing, but more than a binary_search on contiguous memory, the least # memory pages visited per lookup, but with normal hash techiniques you'll be hitting effectively random (but repeatable) hash buckets even if the x,y,z coordinates only differ a little so cache hits could be worse than all other options above.

Actual benchmarks are always the best way to tune, preferably with a profile to confirm costs are for expected reasons.

edited Jun 07 '13 at 09:16

answered Jun 07 '13 at 06:07

Tony Delroy

102,968
15
177
252

thanks but how to insert a key-value pair in the map then? I will have to insert values for the same key multiple times and i don't intend to populate the vector first – ab_11 Jun 07 '13 at 06:21
1

@user2401047: Locate the vector with the given key with either `find` or `operator[]` and `push_back` into it. – David Rodríguez - dribeas Jun 07 '13 at 06:26
2

@user2401047: `mymap[key].push_back(value)` (assuming you do not need to check for unicity or order the values). – Matthieu M. Jun 07 '13 at 06:27
@Tony D, you haven't heard the full whack yet, I also need to find out the all nearest neighbors withing a radius r from this dataset of 80 million points to all the points in a datset of 50 million points, and I have no idea how to use Kd trees or octrees for this. I need good tutorials for them. This is the main thing that I have to do, the problem that I explained earlier(to which this question refers to) is secondary. – ab_11 Jun 07 '13 at 12:29
@user2401047: getting tricky there - what's optimal or close to will depend heavily on how sparse your data set is, the distribution of radius values you have to deal with, whether you have ample RAM to spare etc.. I've never heard of kd trees or octrees so can't recommend any tutorials. Good luck! – Tony Delroy Jun 10 '13 at 01:17
okay, thanks anyway i'll post a different question for that. – ab_11 Jun 10 '13 at 05:21

TemplateRex · Answer 2 · 2013-06-07T09:57:08.023

The answer by @TonyD is certainly fine, but there are some tradeoffs compared to

std::multimap<key, value>

Searching for all values for a given key should give you the same O(log N) complexity

auto result = my_multimap.equal_range(my_key);

Iteration is still of O(N) complexity:

for (auto it = result.first; it != result.second; ++it)
     // bla

However, in all real world std::multimap implementations, the above iteration is doing node-based pointer chasing over the "consecutive" value elements rather than the contiguous iteration that you get for a std::vector based std::map. This might matter for reasons of cache-locality.

The main drawback that I can see from the std::vector solution is that you are committing to keeping all values together which might impose some overhead, depending on how often you are copying your data around.

The multimap approach makes it easier to also insert/extract a single value from the container

my_multimap.insert(std::make_pair(some_key, another_value);

versus

auto it = my_map.find(some_key);
if (it != my_map.end()) 
    it->second.push_back(another_value);
else
    my_map.insert(std::make_pair(some_key, another_value));

You probably should benchmark your program to see which container is more convenient.

Which data structure should i use for my purpose?

2 Answers2