
I have a loop with the following structure:

  • Calculate a byte array of length k (this step is somewhat slow)
  • Find whether the calculated byte array matches any in a list of N byte arrays I have.
  • Repeat

This loop is called many, many times (it's the main loop of my program), and I want the second step to be as fast as possible.

The naive implementation for the second step would be using memcmp:

char *calc;   /* the freshly computed array, k bytes */
char **list;  /* the n precomputed arrays, each k bytes */
int k, n, i;

for (i = 0; i < n; i++) {
    if (memcmp(calc, list[i], k) == 0) {   /* memcmp needs <string.h> */
        printf("Matches array %d\n", i);
    }
}

Can you think of any faster way? A few things:

  • My list is fixed at the start of my program, so any precomputation on it is fine.
  • Let's assume that k is small (<= 64) and N is moderate (around 100-1000).
  • Performance is the goal here, and portability is a non-issue. Intrinsics/inline assembly are fine, as long as they're faster.

Here are a few thoughts I had:

  • Given k <= 64 and x86_64, I could sort my lookup array as an array of longs and do a binary search on it: O(log N). Even if k were big, I could sort my lookup array and do the same binary search using memcmp.
  • Given that k is small, I could compute an 8/16/32-bit checksum of each lookup array (the simplest being to fold the array onto itself with XOR) and use a built-in PCMPGT, as in "How to compare more than two numbers in parallel?". I know SSE4.2 is available here. (Rough sketch below.)
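
To make the second idea concrete, here is a rough, untested sketch of the checksum scan (the names are illustrative, and note that an equality test maps to PCMPEQ rather than PCMPGT):

#include <emmintrin.h>   /* SSE2: _mm_cmpeq_epi32 is PCMPEQD */
#include <stdint.h>

/* checks: one 32-bit checksum per lookup array, with n rounded up to a
 * multiple of 4 and padded with a value that can never match.
 * Returns the index of the first matching checksum, or -1. A hit is only
 * a candidate: checksum collisions still need a full memcmp. */
static int find_checksum(const uint32_t *checks, int n, uint32_t target) {
    __m128i t = _mm_set1_epi32((int)target);
    for (int i = 0; i < n; i += 4) {
        __m128i c = _mm_loadu_si128((const __m128i *)(checks + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi32(c, t));
        if (mask)
            return i + (__builtin_ctz(mask) >> 2);  /* 4 mask bits per lane */
    }
    return -1;
}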

Do you think going for vectorization/SSE is a good idea here? If so, what do you think is the best approach? I'd like to stress that this isn't premature optimization: performance is crucial here, and I need the outer loop to be as fast as possible. Thanks

EDIT1: It looks like http://schani.wordpress.com/tag/c-optimization-linear-binary-search-sse2-simd/ provides some interesting thoughts about this. Binary search on a list of longs seems the way to go.
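
For concreteness, a rough sketch of that lookup, assuming each k-byte entry is packed (zero-padded) into a uint64_t and the array is sorted once at startup:

#include <stdint.h>
#include <string.h>

/* Pack a k-byte array (k <= 8) into a zero-padded uint64_t; any fixed
 * byte order works as long as it is applied consistently. */
static uint64_t pack64(const unsigned char *a, size_t k) {
    uint64_t v = 0;
    memcpy(&v, a, k);
    return v;
}

/* keys: the N packed entries, sorted ascending at startup. */
static int contains(const uint64_t *keys, int n, uint64_t target) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (keys[mid] < target)      lo = mid + 1;
        else if (keys[mid] > target) hi = mid - 1;
        else                         return 1;   /* found */
    }
    return 0;
}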

Wam
  • With N < 1000, binary search at most saves you a factor of 10 in comparisons. That's something that *could* be swallowed by constant factors, e.g. the costs of branch mispredictions or of more complicated logic. So binary search wouldn't be my first attempt. –  Jan 17 '14 at 10:40
  • That's an excellent point. I'll try that, but it will probably go lower on my list, especially since binary search is prone to branch mispredictions. – Wam Jan 17 '14 at 11:03
  • You're looking at it the wrong way: repeated comparisons, no matter how vectorized, would be orders of magnitude slower than keeping a hash like abligh suggested. I'd also suggest using the `algorithm` tag for this question for better exposure (instead of all these unrelated ISA tags). – Leeor Jan 17 '14 at 12:28
  • We can assume that my stuff fits in 64 bits, or even 32 bits. If it didn't, I could hash it so it would. But now, what's the fastest way to find whether my hash exists in the list of precomputed hashes? Also, I added the `algorithm` tag. – Wam Jan 17 '14 at 12:32
  • Silly question: if calculating the byte array is slow, and it's happening in the main loop, why do you *care* how fast the lookup goes? I don't know what you mean by slow, but surely a log(N) lookup into an ordered list of 1000 or so hashes is blindingly fast compared to anything you might call a "slow" calculation... – JVMATL Jan 17 '14 at 14:40
  • @JVMATL Calculation is slowish, but not horribly slow. My current benchmark (doing just the computation, not the comparison) is roughly 40k computations per second. I just want to make sure the lookup won't take a significant amount of time and slow down the loop even more. – Wam Jan 17 '14 at 16:58
  • I suspect that, for a small enough N (and 1000 is small), you won't find anything faster than a binary search. But maybe you could benchmark this: generate a sorted list of 1000 ints (or longs) from a random number generator. After each calculation, generate some hash and binary-search the list. Since the list is random, you will likely miss every single time, simulating your worst case; see what that does to your computations/second. (Your hash computation is likely to cost more than the search, so if 'hash+search' is too slow, look for a better hash before optimizing past binary search.) – JVMATL Jan 17 '14 at 17:07

4 Answers


The optimal solution is going to depend on how many arrays there are to match, the size of the arrays, and how often they change. I would look at avoiding the comparisons altogether.

Assuming the list of arrays to compare against does not change frequently and you have many such arrays, I would create a hash of each array; then, when you come to compare, hash the thing you are testing. Then you only need compare the hash values. With a hash like SHA256, you can rely on this both as a positive and a negative indicator (i.e. matching hashes are sufficient to say the arrays match, just as differing hashes are sufficient to say the arrays differ). This would work very well if you had (say) 1,000,000 arrays to compare against which hardly ever change, as calculating the hash would be faster than 1,000,000 array comparisons.

If your number of arrays is a bit smaller, you might consider a faster non-cryptographic hash. For instance, a 'hash' which simply summed the bytes of an array modulo 256 (this is a terrible hash and you can do much better) would eliminate the need to compare (say) 255/256ths of the target array space. You could then compare only those where the so-called 'hash' matches. There are well-known hash-like functions such as CRC-32 which are quick to calculate.
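
For illustration, a minimal sketch of that trivial byte-sum 'hash' (again: terrible collision properties, but nearly free to compute):

#include <stddef.h>

/* Sum of bytes modulo 256: unsigned char arithmetic wraps, so the
 * reduction is implicit. Only arrays whose sum matches the target's
 * sum need a full comparison. */
static unsigned char byte_sum(const unsigned char *a, size_t k) {
    unsigned char s = 0;
    for (size_t i = 0; i < k; i++)
        s += a[i];
    return s;
}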

In either case you can then do a lookup by hash (modulo X) to determine which arrays to actually compare.

You suggest k is small and N is moderate (i.e. about 1000). I'm guessing speed will revolve around the memory cache; not touching 1,000 small arrays here is going to be pretty helpful.

All of the above will be useless if the arrays change with a frequency similar to that of the comparisons.

Addition (assuming you are looking at 64 bytes or similar). I'd look into a very fast non-cryptographic hash function. For instance look at: https://code.google.com/p/smhasher/wiki/MurmurHash3

It looks like 3-4 instructions per 32-bit word to generate the hash. You could then truncate the result to (say) 12 bits for a 4096-entry hash table with very few collisions (each bucket being a linked list of the target arrays). This means you would be looking at roughly 30 instructions to calculate the hash, then one instruction per bucket entry (expected value 1) to find the list item, then one actual compare per expected hit (between 0 and 1 of them). So rather than comparing 1,000 arrays, you would compare between 0 and 1 arrays and generate one hash. If you can't compare 999 arrays in 30-ish instructions (I'm guessing you can't!) this is obviously a win.
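
A rough sketch of that scheme (the node layout and names are illustrative, and FNV-1a stands in for MurmurHash3 just to keep the sketch self-contained):

#include <stdint.h>
#include <string.h>

#define TABLE_BITS 12
#define TABLE_SIZE (1u << TABLE_BITS)      /* 4096 buckets */

/* One node per precomputed array. */
struct node {
    const unsigned char *data;             /* the k-byte array */
    int index;                             /* position in the original list */
    struct node *next;                     /* collision chain */
};

static struct node *table[TABLE_SIZE];     /* built once at startup */

/* FNV-1a, standing in for MurmurHash3 or any fast non-cryptographic hash. */
static uint32_t hash32(const unsigned char *p, size_t k) {
    uint32_t h = 2166136261u;
    while (k--) { h ^= *p++; h *= 16777619u; }
    return h;
}

/* Expected cost: one hash, about one chain node, at most one memcmp. */
static int lookup(const unsigned char *calc, size_t k) {
    struct node *n = table[hash32(calc, k) & (TABLE_SIZE - 1)];
    for (; n; n = n->next)
        if (memcmp(calc, n->data, k) == 0)
            return n->index;
    return -1;
}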

abligh
  • As I mentioned in my question, the arrays don't change at all, and the comparison is small (less than 64 bits). I had thought of using such a hash (or just reading each array as a long), but was wondering what the fastest way to find any matching array would be? – Wam Jan 17 '14 at 11:21
  • Do you really mean 'less than 64 bits' or do you mean 'less than 64 bytes'? If less than 64 bits and on x86_64, just clear the unused bits and you are left with comparing 64-bit integers. If 'less than 64 bytes', see the edit I'm about to add (ran out of comment space). – abligh Jan 17 '14 at 11:51
  • Well, I'd be interested in a more generic solution, but for now 64 bits, as in stuffable in a uint64_t, would do. In which case I have, as you say, integers, and I'm left with a set problem: finding whether a set of `long` contains mine. – Wam Jan 17 '14 at 12:02
  • As the longs are constant, sort them and do a binary search. – abligh Jan 17 '14 at 14:26

We can assume that my stuff fits in 64bits, or even 32bits. If it wasn't, I could hash it so it could. But now, what's the fastest way to find whether my hash exists in the list of precomputed hashes ?

This is sort of a meta-answer, but... if your question boils down to "how can I efficiently find whether a certain 32-bit number exists in a list of other 32-bit numbers?", this is a problem IP routers deal with all the time, so it might be worth looking into the networking literature to see if there's something you can adapt from their algorithms. E.g. see http://cit.mak.ac.ug/iccir/downloads/SREC_07/K.J.Poornaselvan1,S.Suresh,%20C.Divya%20Preya%20and%20C.G.Gayathri_07.pdf

(Although I suspect they are optimized for searching through larger numbers of items than your use case.)

JVMATL

Can you do an XOR instead of memcmp?

Or calculate a hash of each element in the list, sort them, and search for the hash.

But hashing will take more time, unless you can come up with a fast hash.
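
For the XOR idea: if an entry fits in a machine word, the whole comparison collapses to a single integer test anyway. A sketch, assuming 64-bit entries:

#include <stdint.h>

/* XOR of equal values is zero, so this is equivalent to a == b;
 * either way it is one 64-bit compare instead of a memcmp call. */
static int equal64(uint64_t a, uint64_t b) {
    return (a ^ b) == 0;
}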

Simal Haneef
  • Xor is an awful hash; every array of constant values has the same hash if the hash is of even size. Awful hashes have poor collision properties. With a very tiny speed overhead you can use a better non-cryptographic hash function (as I point out above), e.g. MurmurHash3. – abligh Jan 17 '14 at 11:54

Another way is to pre-build a tree from your list and use tree search.
For example, with this list:

aaaa
aaca
acbc
acca
bcaa
bcca
caca

we can get a tree like this:

root
-a
--a
---a
----a
---c
----a
--c
---b
----c
---c
----a
-b
--c
---a
----a
---c
----a
-c
--a
---c
----a

Then do a binary search at each level of the tree.
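
A rough sketch of such a lookup, with a hypothetical node layout (children are kept sorted so each level can be binary searched):

#include <stddef.h>

/* One trie level per byte of the k-byte arrays. */
struct trie {
    unsigned char byte;        /* byte value on the edge into this node */
    struct trie *children;     /* children sorted by their 'byte' field */
    int nchildren;
    int index;                 /* entry index at depth k, else -1 */
};

/* Walk one level per byte, binary-searching the sorted children.
 * Returns the matching array's index, or -1. */
static int trie_find(const struct trie *t, const unsigned char *calc, size_t k) {
    for (size_t d = 0; d < k; d++) {
        const struct trie *kids = t->children;
        int lo = 0, hi = t->nchildren - 1, found = -1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (kids[mid].byte < calc[d])      lo = mid + 1;
            else if (kids[mid].byte > calc[d]) hi = mid - 1;
            else { found = mid; break; }
        }
        if (found < 0)
            return -1;
        t = &kids[found];
    }
    return t->index;
}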

vutran