1

My data are stored in a vector of custom objects and in this particular piece of software I'm interested in those that are interesting. I thus have a std::vector<size_t> interesting; where I store their indexes, so that when I call data[ interesting[i] ] I get the i-th interesting data object.

I need to specify here that interesting is an unsorted vector of indexes, so at the moment there isn't any clever trick I can use with the ordering of the elements in the original data ...

Now onto the problem: These data are interesting because they have either property A, B or both and I need to store the value of that property (let's say it's a simple double) in another vector for fast retrieval.

I wonder how would I go about it! Which structure should I use for the properties?

Say data.size() = 100 and interesting.size() = 90 (much bigger, but easier to work with round numbers) and deleting the "un-interesting" data is not an option because they're used by other parts of the software. I need in particular to be able to do fast lookups, neither data nor interesting will change in size as the software runs (only data values will) so insertion in the structures is a one-off cost at the beginning and won't impact performances as much as lookups will.

I have two solutions that came to mind:

  • I could use a std::unordered_map<size_t,double> A and similar for B to store the index-property pairs, so I can search using A.at(interesting[i]) to get the value of property A for the i-th interesting element. Because insertions won't happen at this stage anymore am I safe to assume this is O(1) ? This would be my preferred way.

  • I could allocate a std::vector<double> A ( data.size() ); to store these values and just call A[ interesting[i] ] as I do for the data; this ensures lookups are fast but I would:

    1. waste a LOT of memory as the size of data grows...in fact I'd waste a combined data.size() elements if we assume no overlap. For now A and B are just double, but if at any point I need to change their type into something larger ... I assume that a sparse vector with O(1) lookup doesn't exists, right?
    2. There's no bound checking here and god forbid I go and search for A[3] and 3 is not an interesting index! I'd need to insert my own checking overhead which will probably make so that lookups are not so fast anymore...

In short: I'm interested in a data structure for the properties A and B so that I can access those values [which are defined only for interesting data point] and know to which data value they correspond as well...

Can anybody think of a good solution for this problem? Hopefully I made myself clear enough!


A commenter asked for a code example, I don't have a working since I haven't figured out which data structure to use, but ideally I would simply want to do

std::vector<MyObject> data;
// ...
// get data and do stuff
// ...

std::vector<size_t> interesting;
// ...
// initialise interesting as a vector of indexes that refer to the data
// ...

// Now initialize the A property DS
std::vector<double> A ( data.size() ); // <--- this is what I'm questioning about
// maybe I'm better off with std::unordered_map<size_t,double> A ?

// so that (once initialised) I can quickly do
size_t i = 5;
std::cout << "Data point " << data[ interesting[i] ] << 
    " has a property A's value of " << A[ interesting[i] ] << std::endl;
// for evey i I want
FooBant
  • 394
  • 1
  • 14
  • 1
    May be you could post a code example to make your problem clear, I can't understand it. Could be you just need to learn about `boost::multi_index`. – Maxim Egorushkin Feb 28 '19 at 10:56
  • The data vecotr does not change after an init phase? then do not bother with indicies. define a `struct Interesting`, that holds a pointer to the data and the values you are interested and make `interesting` an `std::vector`. –  Feb 28 '19 at 11:08
  • @user9400869 that would sort-of work, but I need to easily access the whole vector of `A` and `B` separately from `interesting` at times, and vice-versa. Also, because I have multiple properties ( `A` and `B` ) the `struct` should hold both but that would waste memory as well since not every `interesting` has both! – FooBant Feb 28 '19 at 11:17
  • @MaximEgorushkin I didn't know about boost::multi_index and it's an interesting read but I'm afraid it's not what I'm searching for – FooBant Feb 28 '19 at 11:22
  • std::unordered_map lookup is indeed O(1) although much slower then vector lookup since it requires hashing. The vector lookup on the other hand requires more memory. So as usual you have a tradeof. You either need speed or low memory hit. But you can't have both. – freakish Feb 28 '19 at 11:44
  • Thanks @freakish, I'll test unordered_map, map and the vector implementation. I suspect vector will still be fastest but there's a limit on how much memory I am willing to waste :) – FooBant Feb 28 '19 at 17:02

0 Answers0