One optimization would be to keep a list (array for fast lookup) with the min/max values for each set. Then first check in that list if they overlap. If not -> return false - no further checks needed.
S1: a b c
S2: d e f
S1 and S2 -> false (no overlap)
If the sets are sorted and they do overlap, you only have to check the overlapping region.
S1: a b c d e
S2: d e f g h
Only check the 'd e' region
If you need to check the intersection of more than 2 sets, first try to find two non-overlapping sets. If found -> return false. If not, only check the overlapping region for all those sets (which should get smaller with more sets).
S1: a b c d e
S2: d e f g h
S3: g h i
S1 and S2 and S3 -> false (S1 and S3 do not overlap)
If most of the sets span a wide range, you could use another option:
Lets say the maximum number of elements is 6400 (for this example), and every element is, or can be converted to an integer 1-6400.
For each set a small bitmap (64 bit unsigned integer) could be created, with one bit representing 100 items.
For example:
S1: 1,23,80,312,340
S2: 160,184,450
S3: 230,250,340
S1 bitmap: 100100..
S2 bitmap: 010010..
S3 bitmap: 001100..
S1 and S2 -> false
S1 and S3 -> true, only check range 301-400
S1 and S2 and S3 -> false
You could of course use a smaller number than 100 (preferable a power of 2, so you can quickly set the corresponding bit) and use multiple uint64
.
This could even be done in multiple levels (depending on the amount of memory / storage space you're willing to use). For example, first a real quick check on one 64 bit integer (takes one CPU cycle and can easily be done with SQL). Only for those that match check a second level containing maybe 4, 8 or 16 uint64
, with each bit representing a smaller range of values (could also be really fast using SSE/AVX registers). If they still match do a deeper check, but only for the ranges corresponding to the set bits in the result.