My data
are stored in a vector of custom objects and in this particular piece of software I'm interested in those that are interesting
. I thus have a std::vector<size_t> interesting;
where I store their indexes, so that when I call data[ interesting[i] ]
I get the i-th interesting data object.
I need to specify here that interesting
is an unsorted vector of indexes, so at the moment there isn't any clever trick I can use with the ordering of the elements in the original data
...
Now onto the problem:
These data are interesting because they have either property A
, B
or both and I need to store the value of that property (let's say it's a simple double
) in another vector for fast retrieval.
I wonder how would I go about it! Which structure should I use for the properties?
Say data.size() = 100
and interesting.size() = 90
(much bigger, but easier to work with round numbers) and deleting the "un-interesting" data is not an option because they're used by other parts of the software.
I need in particular to be able to do fast lookups, neither data
nor interesting
will change in size as the software runs (only data values will) so insertion in the structures is a one-off cost at the beginning and won't impact performances as much as lookups will.
I have two solutions that came to mind:
I could use a
std::unordered_map<size_t,double> A
and similar for B to store the index-property pairs, so I can search usingA.at(interesting[i])
to get the value of propertyA
for the i-th interesting element. Because insertions won't happen at this stage anymore am I safe to assume this is O(1) ? This would be my preferred way.I could allocate a
std::vector<double> A ( data.size() );
to store these values and just callA[ interesting[i] ]
as I do for the data; this ensures lookups are fast but I would:- waste a LOT of memory as the size of
data
grows...in fact I'd waste a combineddata.size()
elements if we assume no overlap. For nowA
andB
are justdouble
, but if at any point I need to change their type into something larger ... I assume that a sparse vector with O(1) lookup doesn't exists, right? - There's no bound checking here and god forbid I go and search for
A[3]
and3
is not an interesting index! I'd need to insert my own checking overhead which will probably make so that lookups are not so fast anymore...
- waste a LOT of memory as the size of
In short: I'm interested in a data structure for the properties A
and B
so that I can access those values [which are defined only for interesting
data point] and know to which data
value they correspond as well...
Can anybody think of a good solution for this problem? Hopefully I made myself clear enough!
A commenter asked for a code example, I don't have a working since I haven't figured out which data structure to use, but ideally I would simply want to do
std::vector<MyObject> data;
// ...
// get data and do stuff
// ...
std::vector<size_t> interesting;
// ...
// initialise interesting as a vector of indexes that refer to the data
// ...
// Now initialize the A property DS
std::vector<double> A ( data.size() ); // <--- this is what I'm questioning about
// maybe I'm better off with std::unordered_map<size_t,double> A ?
// so that (once initialised) I can quickly do
size_t i = 5;
std::cout << "Data point " << data[ interesting[i] ] <<
" has a property A's value of " << A[ interesting[i] ] << std::endl;
// for evey i I want