2

I need to store a large number of integers. The input stream can contain duplicates, and I only need to store the distinct values. I was using an STL set (std::set) initially, but it ran out of memory when the number of input integers grew too large. I am looking for a C++ container library that can store the numbers under this requirement, possibly backed by a file, i.e. the container should not try to keep all the numbers in memory. I don't need to store the data persistently; I just need to find the unique values in it.

Pqr
  • What's the range of the integer values? – sbk Jun 03 '10 at 15:41
  • 1
    I'm too embarrassed to call this an answer, but maybe you could just rebuild your app for 64-bit and run it on a 64-bit system. – OldFart Jun 03 '10 at 15:54
  • The problem is that main memory is not large enough to hold all the numbers, so we need an external-memory-backed, set-like container. – Pqr Jun 04 '10 at 07:48

4 Answers

1

Take a look at STXXL; it might be what you're looking for.

Edit: I haven't used it myself, but going by the docs, you could use stream::runs_creator to create sorted runs of your data (each as large as fits in memory), then stream::runs_merger to merge the sorted runs, and finally stream::unique to filter out the duplicates.

tzaman
  • I just looked at STXXL. At first glance, it does not seem to have anything like std::set. – Pqr Jun 03 '10 at 15:01
0

Since you need more than RAM allows, you might look at memcached.

Jay
0

Have you considered using a database (maybe SQLite)? Or would it be too slow?

Dmitry Yudakov
0

You should seriously at least try a database before concluding it is too slow. All you need is one of the lightweight key-value stores. In the past I have used Berkeley DB, but here is a list of other ones.

Richard Wolf