2

I want to know which data structure will be memory efficient in my case.Please guide me. Following are the requirements. As shown in the fig below, on the basis of three values A, B and C ( where A will be an integer value and B , C will be characters) I want to store two values accepting rule number and true/false.Thus, for each unique value of A,B and C I want to store two values (accepting rule number and true/false) corresponding to them.Please guide me which data structure will be fast in searching and will also be memory efficient.As in my case the table length can go upto 65025 or above.

enter image description here

ThomasMcLeod
  • 7,603
  • 4
  • 42
  • 80
Xara
  • 8,748
  • 16
  • 52
  • 82
  • 2
    It is absolutely unclear how table on the left corresponds to the table on the right - please explain. – mvp Dec 29 '13 at 06:48
  • 1
    Maybe you should look at in-memory database like SQLite – fasked Dec 29 '13 at 06:55
  • Does memory efficient mean packing or speed? – woolstar Dec 29 '13 at 06:56
  • @woolstar By memory efficient , I mean that it should not consume much memory and along with that it should be fast in searching. – Xara Dec 29 '13 at 06:59
  • In your example, input left table has all unique combinations. Is it always true? Can rows in that table repeat? If not, there are better ways... – mvp Dec 29 '13 at 07:24
  • Why the concern with memory footprint? If you "store two values (accepting rule number [integer] and true/false [bool])", that is 5 bytes, times 100 000 elements (to round up your estimate), or 0.5 MB. If you store the integer and two chars as well, you are looking at approximately 1 MB, plus container-related data. These are not big memory demands on modern PCs. If you're developing for something truly memory-limited, or need to have multiple data structures, please correct me. – NicholasM Dec 29 '13 at 07:25
  • @mvp the rows in left table will always be unique. Please do mention which ways will be better in my case. – Xara Dec 29 '13 at 20:47
  • @Zara: it seems that A is simply ordinal number, and as such does not need to be explicitly stored. Rule number also seems to be locked to A. In the end, all you want to store is chars B, C and boolean field active. If your chars B, C can be only lowercase letters, you need only 5 bits for each. Total is 5*2+1=11 bits, or just 2 bytes. You can store 64K array of these in 128KB, and use simple binary search. – mvp Dec 30 '13 at 09:55

2 Answers2

4

The obvious possibilities in the standard library would be std::map and std::unordered_map. For std::map, you'd create some class holding A, B and C, and define a comparison function for that class. For std::unordered_map, you need to define a hash function instead.

For fast searching (and little or no interest in the speed of insertion and deletion) you might also consider using a vector sorted on the A, B, and C fields. This will typically improve speed and reduce space used compared to std::map. The disadvantage is that insertion and deletion become linear instead of logarithmic (i.e., slower--potentially quite a lot slower, especially when the collection is large).

As far as which of these to prefer: if you were working with large enough tables that the big-O complexity was likely to dominate, then std::unordered_map would be the obvious choice--it gives constant (expected) complexity. std::map gives logarithmic complexity. The sorted vector would be logarithmic as well if you used only a binary search. Assuming your keys are distributed reasonably, you could use an interpolating search, which normally has roughly O(log log N) complexity. log log N grows very slowly--so slowly it's often called "pseudo-constant" or something similar. IOW, even for tremendously huge tables, there's not much reason to believe hashing will necessarily be significantly faster.

Big-O analysis is most relevant to much larger tables though (say, hundreds of millions, rather than the tens of thousands suggested in the question). For the table size your suggesting, a logarithmic search algorithm might well be entirely competitive. For example, up to 65536 items, we expect no more than 16 comparisons for a binary search.

A lot then comes down to balancing memory use with search speed. If you're willing to sacrifice some space to get better search speed, a hash table (std::unordered_map) is probably the obvious choice. If you're more concerned with minimizing memory usage, then the sorted vector probably wins. std::map is probably the easiest of the three to implement, and (given the size you're talking about) its speed probably won't be a significant problem either (but the other two will probably be faster).

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
  • My table length would be significantly high and its important for my algorithm to search fast but at the same time it should not exhaust the memory. So, should I prefer std::unorderedmap (as hash can be fast in searching)?? – Xara Dec 29 '13 at 07:08
  • @Zara: I've added some more information on choosing between the three to my answer. – Jerry Coffin Dec 29 '13 at 07:25
  • `unordered_map` gives constant complexity if the base table is wide enough to contain all the data and if the hash function modulus the base table size will never give same results. If you cannot predict in a reasonable way what size you'll have to deal with, may not be "ideal" (and will never be "constant") The `map` will probably be the best compromise between memory and performance, The vector for memory (and performance if it does not change too often). – Emilio Garavaglia Dec 29 '13 at 09:04
  • @JerryCoffin In one of the similar problems, I am making use of 2D array.As you mentioned that vector is fast in searching but insertion is slow then should I prefer vector over 2D array? because in my case searching speed is important not the insertion speed. – Xara Jan 07 '14 at 12:14
  • @Zara: In most cases, instead of a 2D array, I'd use a simple wrapper around a vector to provide 2D addressing. http://stackoverflow.com/a/6465254/179910 – Jerry Coffin Jan 07 '14 at 16:30
3

I'm not sure of exactly what you mean by memory efficient, but a data structure like the following would certainly work:

struct my_data
{
  int accepting_rule;
  bool true_false;
};

std::map<std::tuple<int, char, char>, my_data> my_map;
ThomasMcLeod
  • 7,603
  • 4
  • 42
  • 80