2

I have store 111 million key-value pairs (one key can have multiple values - maximum 2/3) whose key are 50 bit Integers and values are 32 bit (maximum) Integers. Now, my requirements are:

  1. Fast Insertion of (Key, Value) pair [allowing duplicates]
  2. Fast retrieving of value/values based on key.

Is there any library for C/C++ which solves this issue (using MultiMap, B+ Tree, B Tree, R+ Tree etc.) ? I can provide 5/6 GB main memory for that. For more info: my previous post.

Community
  • 1
  • 1
Arpssss
  • 3,850
  • 6
  • 36
  • 80
  • Your whole dataset can take up 9GB. There is the simplest possible solution - using std::map/std::multimap and relying on operating system swap. How does this work? – Rafał Rawicki Apr 09 '12 at 10:14
  • 1
    C or C++, pick one. The answers will most likely be completely different (if there are any) in both these languages. – Mat Apr 09 '12 at 10:15
  • You have to provide more parameters, like are all the values there from the start? Is speed or space the prime parameter? The initial C++ answer would be to load it all in a std::vector, and sort it. Then use a binary search. – Bo Persson Apr 09 '12 at 10:17
  • @BoPersson, No. I have to repeatedly add incoming Key,Value – Arpssss Apr 09 '12 at 10:18
  • @RafałRawicki, I can provide only 5/6 GB. – Arpssss Apr 09 '12 at 10:19
  • @Mat, Java takes more memory because of OOP. So, C should take less to implement. C is preferable. As I have few idea about C/C++, I am searching for both. Which I found better, switch to that one :). – Arpssss Apr 09 '12 at 10:21
  • Is an external server an option? If so, redis might be able to solve your problem. – Not_a_Golfer Apr 09 '12 at 10:22
  • @Not_a_Golfer, I can use that.But, how to solve ? – Arpssss Apr 09 '12 at 10:24
  • sqlite/MySQL? (I think) with data set this big you'll need a databse anyway. – SigTerm Apr 09 '12 at 10:24
  • @SigTerm, I have used Tokyo Cabinet. Theoretically, much faster. Issue is lots of params. And practically I found inefficient. – Arpssss Apr 09 '12 at 10:26
  • @SigTerm, doest it allow On-Memory implementation ? Means, if it is able to store the DB on-memory ? I have no requirement to load-store. – Arpssss Apr 09 '12 at 10:27
  • @Arpsss: Does "5/6 GB" mean "0.83 gigabytes" or "5..6 gigabytes"? – SigTerm Apr 09 '12 at 10:54
  • @SigTerm, sorry. It is OR. Say 5 or 6.. – Arpssss Apr 09 '12 at 10:57

3 Answers3

0

A plain hashtable in C would take 50+32 (+14padding) + 32 +32 bits per element. (+ maybe 32 bit alignment). That is 160 (or 192) bits per element := 20 (or 24) bytes per element. The hash table would cost you 111* 20 (or 111*24) Mbytes of memory. That is 2.2 GB or 2.7GB.

wildplasser
  • 43,142
  • 8
  • 66
  • 109
  • Thanks. Actually, I don't know C/C++ (just little bit in high-school level).That's why I am searching for ready made one. Any pointer to a nice library will be helpful to me. – Arpssss Apr 09 '12 at 11:10
  • Do you get paid for this, or is it homework? (cannot do the math, cannot create the code myself, cannot search the web for "easy to use" libraries) – wildplasser Apr 09 '12 at 11:17
  • I tried lots of libraries and codes. However, can't achieve required performance. That's why before starting new one, want advice. BTW, I get paid for doing this stupid things by my father:). And I have to solve lots of projects and exams that's why may be asked some stupid question. Sorry. – Arpssss Apr 09 '12 at 11:50
0

Your requirements do not include any need for an ordered collection. Use a hash map. If you can't find a ready made one, it's not a big challenge to create one.

selalerer
  • 3,766
  • 2
  • 23
  • 33
0

Because "5/6 Gigabytes" actually means 5 OR 6 gigabytes...

111000000 key/value pairs with 50 bit keys and 32bit values would take (111000000 * (50+32))/(8*1024*1024*1024) = 1.05 gigabytes or memory when stored as tightly packed (bit) array.

You have 5 times more memory then that.

A 10 level deep skip-list-based map on 64bit system would take (111000000 * (64+32+10*16))/(8*1024*1024*1024) = 3.308 Gigabytes in worst case scenario and you'll still have more than gigabyte of RAM to deal with heap management overhead.

So I'd advise to grab any available multimap and attempt to use it - in my opinion you have more than enough memory to deal with your situation without using any extra tricks.

--EDIT--

Actually, I don't know C/C++

Well, how do you expect to work with maps that contain 111000000 of keys if you don't know C++? You'll have to do some reading.

standard library includes std::multimap, and there are several classes within boost library. Qt 4 includes QMap which is based on skip lists. Try using any of them.

SigTerm
  • 26,089
  • 6
  • 66
  • 115
  • Thanks. Actually, I don't know C/C++ (just little bit in high-school level).That's why I am searching for ready made one. Any pointer to a nice library will be helpful to me. Because, as in Java, I can't solve it. – Arpssss Apr 09 '12 at 11:12
  • Thanks. I am expecting what you told because I face same issue in Java (as OOP). But, from some discussion I found, C should have such facility and C++ may have. That's why try to get an idea thus can start with hope. If C/C++ have, I start reading. Actually, I lost many time, using some stupid tools. So, just little bit stressed. BTW, thanks. – Arpssss Apr 09 '12 at 11:56