I want an efficient data structure that represents a set of integers from the range 0..k
, where k
is very small, let's say less than 1024
.
I have come up with an implementation with the following complexities:
initialize: O(1)
size: O(1)
contains: O(1)
insert: O(1)
remove: O(1)
clear: O(1)
union: O(k)
However, I suspect my O(1)
s might really by O(log(k))
since we have to read O(log(k))
digits somewhere.
The memory requirement is O(k)
.
I have not implemented equals
, intersection
or difference
because I don't need them, but they all would be O(k)
. My implementation does support iteration.
This seems like it could come up in many different problems, so I'm wondering has my implementation been done before?
And more importantly: Can we do better?
Obviously better is a little subjective, but something with only O(1)
and O(log(k))
everywhere might be interesting.
I suspect bitsets
could be more practical when k
is a little bigger, but I'm not sure about for such small k
.
Here is pseudocode of what I have. It works by allocating an array and a map (implemented as another array) of lengths k
and k+1
, but only caring about the first m
elements of the array, where m
is the current size of the set.
class IntSet:
m: int
arr: array[int]
map: array[int]
init(k):
arr: int[k] # allocate, but don't initialize
map: int[k+1] # allocate, but don't initialize
m = 0
size(): m
contains(x: int): map[x] < m and arr[map[x]] == x
insert(x: int):
if not contains(x):
arr[m] = x
map[x] = m
m += 1
remove(x: int):
if contains(x):
m -= 1
arr[map[x]] = arr[m] # move x off the end
map[arr[m]] = map[x]
clear(): m = 0
union(s: IntSet):
for i in 0..s.m:
if not contains(s.arr[i]):
insert(s.arr[i])
items():
for i in 0..m:
yield arr[i]
EDIT:
Based on my understanding, I imagine a bitset implementation would look like:
initialize: O(k) size:O(popcount of k bits)O(1) contains: O(1) insert: O(1) remove: O(1) clear: O(k) union: O(bitwise OR of k bits)
If we use a bitset with 1024 bits I'm not sure if this is better.