13

What is a good way to represent sparse set of integers (really C memory addresses) in a compact and fast way. I already know about the obvious things like bit-vectors and run-length encoding. but I want something much more compact than one word per set element. I need to add and remove elements and test for membership. I do not need other set operations, like union.

I read about one such library many years ago but have since forgotten its name. I think it was released as open source by HP and had a womans name.

ShreevatsaR
  • 38,402
  • 17
  • 102
  • 126
Per Mildner
  • 10,469
  • 23
  • 30
  • 1
    The <1 word per pointer bit is going to be the hard part. – BCS Dec 11 '08 at 21:51
  • You don't say how many addresses you'll store in the set. That's critical. Also you don't say if they come from malloc. – Norman Ramsey Jan 01 '09 at 19:18
  • You might check out answers to a similar question that I asked: http://stackoverflow.com/questions/36106/what-are-some-alternatives-to-a-bit-array – erickson Jan 01 '09 at 20:12
  • The addresses do not come from malloc(), in general. In one usage there can be a few 100k members in the set (on a 32bit machine). – Per Mildner Feb 17 '09 at 14:15

4 Answers4

10

You are referring to a judy array. It was a HP project. I think they are used in ruby and are available in c. Very interesting data structure. Making use of the fact that allocations are (at least) word aligned, having separate structures for dense and sparse ranges.

http://judy.sourceforge.net/index.html

Stephan Eggermont
  • 15,847
  • 1
  • 38
  • 65
4

A very compact data structure would be a bloom filter, perhaps a counting bloom filter to support deletions.

http://en.wikipedia.org/wiki/Bloom_filter

The Bloom filter, conceived by Burton H. Bloom in 1970, is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. Elements can be added to the set, but not removed (though this can be addressed with a counting filter)

jakber
  • 3,549
  • 20
  • 20
  • Thanks. I know about these but I cannot accept false positives (false negatives could be acceptable but less than ideal). – Per Mildner Dec 11 '08 at 21:57
1

If all you need is insertion, deletion, and test for membership, then a hash table should suit you nicely. You can find some good hash functions for hashing 32-bit integers here.

Adam Rosenfield
  • 390,455
  • 97
  • 512
  • 589
0

If you want the structure smaller than the data set than you should probably look at some kind of tree arrangement. Make each level of a 4 way the tree key off 2 bits starting at the high end and it might compact quite well (if the pointers have any degree of spacial locality). The trick would be encoding it compactly enough (indexes into arrays of nodes? an array mapped tree?).

BCS
  • 75,627
  • 68
  • 187
  • 294