How to evaluate zero sets fast?

Question

This recent code golfing post asked the possibilities of fast implementation in C the following (assuming n is an unsigned integer):

if (n==6 || n==8 || n==10 || n==12 || n==14 || n==16 || n==18 || n==20)

One possible simplification is to observe that the numbers a[]={6,8,10,12,14,16,18,20} form an arithmetic progression, so shifting the range and then using some bitwise tricks

if (((n - 6) & 14) + 6 == n)

leads to a shorter (and probably indeed more efficient) implementation, as answered by John Bollinger.

Now I am asking what is the analogously elegant (and hopefully equally efficient) implementation of

if (n==3 || n==5 || n==11 || n==29 || n==83 || n==245 || n==731 || n==2189)

Hint: this time the numbers a[k] form a geometric progression: a[k]=2+3^k.

I guess in the general case one cannot do better than sort the numbers a[k] and then do a logarithmic search to test if n is a member of the sorted array.

if this is a time over space tradeoff then I would allocate a 2189 words buffer filled with zeros except for the specific locations (3, 5, 11 etc...) and simply perform an array lookup (which is extremely quick since it is hardware implemented). This is not very elegant but will give you top time performance with the cost of space performance. — Uri Brecher, May 02 '16 at 07:45
Shouldn't such questions be posted on http://codereview.stackexchange.com? — CinCout, May 02 '16 at 07:58
http://stackoverflow.com/questions/1804311/how-to-check-if-an-integer-is-a-power-of-3 — Dummy00001, May 02 '16 at 07:59
FWIW, `((n - 6) & 14) + 6 == n` can be simplified to `(n - 6) | 14 == 14`. — Tony Delroy, May 02 '16 at 08:08
@UriBrecher are you suggesting that the implied mathematical structure, namely, that we have a *geometric progression*, is *irrelevant* from the point of performance?? I have quite a hard time to believe this. — Matsmath, May 02 '16 at 08:15
The discussion is not completely theoretical since my solution has a space penalty and sometimes (say your geometric progression is infinite or ends at 2^64) is not applicable. For your specific question, since the series ends at 2189 then yes, from time performance perspective a single array lookup is a lot faster then any of the solutions I've seen so far. To be fair, if this is a one shot operation then the mathematical solutions are better (since the memory allocation takes time as well), but if this is a branch that has to be performed several times then lookup is eventually superior. — Uri Brecher, May 02 '16 at 09:03
@Matsmath What UriBrecher suggests is to define a huge lookup table which for every number contains true or false. Then, instead of writing the if with all the comparisons, you just do `if (table[n]) {...} else {...}`. This results in a single array element access. — Jens, May 02 '16 at 09:10
@UriBrecher - For 2k values, maybe. For the general case, the impact of a huge lookup table on cache is probably not negligible. — Oliver Charlesworth, May 02 '16 at 12:08
Are you going for raw speed, or elegant-looking C code? They are likely to be different. Division (including modulo) operations are about the slowest operations there are on modern processors, so thiru's answer looks cool, but likely won't be faster than a series of comparisons. Check your compiler's output and *benchmark* before making a decision, if you actually do care about speed. Also consider that branches are slow in tight loops. If your optimizer isn't smart enough to do this substitution automatically (Clang is, others aren't), consider using bitwise OR instead of logical OR. — Cody Gray - on strike, May 02 '16 at 14:26
The geometric progression is not helpful here, as using it would require division and is therefore too slow. Even an arithmetic progression may or may not be useful. +++ Binary search is much slower than hashing due to branching. +++ A tiny hash table containing maybe 16 entries is most probably the fastest solution (I'll provide it if you're interested). +++ See also https://en.wikipedia.org/wiki/Perfect_hash_function. — maaartinus, May 02 '16 at 15:46

Thirupathi Thangavel · Accepted Answer · 2016-05-02T11:24:48.020

25

if ((n > 2) && (2187 % (n - 2) == 0))

Checks if (n - 2) is a power of 3 and is less than or equal to 2187 (3 to the power of 7)

As a generalization, to check if any unsigned integer n is a power of prime number k, you can check if n divides the largest power of k that can be stored in an unsigned integer.

edited May 02 '16 at 11:24

answered May 02 '16 at 08:05

Thirupathi Thangavel

2,418
3
29
49

1

It's a nice answer, but you should change "less than or equal to 2187" to "less than or equal to 2189". Also, you should add a preliminary verification of `n > 2`. – barak manos May 02 '16 at 08:10
@barakmanos, thanks. I've updated the answer to check for n > 2. – Thirupathi Thangavel May 02 '16 at 08:21
I think "less than or equal to 2187" is correct because it (n - 2). – Thirupathi Thangavel May 02 '16 at 08:22
2

@thiru Great work! (I can confirm via testing that your solution is correct and is faster (1.14703 ns, compared to 3.34201 ns) – Isaac May 02 '16 at 08:40
1

You could mention that 3^7 = 2187 otherwise 2187 looks like a magic number. – chi May 02 '16 at 10:52
3

Beware that division operations (including modulo) are basically the slowest instructions there are on modern processors, far slower than comparisons. So although this is a neat way to write the code (and granted, exactly the answer the asker was looking for), it is probably *not* the fastest code. – Cody Gray - on strike May 02 '16 at 14:31
3

@CodyGray 32-bit integer division is much slower than comparison, but should be faster than a single mispredicted jump – Tavian Barnes May 02 '16 at 17:27
2

Let's accept this as being *analogously elegant*. I thank you for your answer. – Matsmath May 03 '16 at 07:37
1

I'm not sure if that's actually true, @Tavian. It will be *very* close. On Sandy Bridge, for example, you're looking at a difference of approximately 5 cycles. You'd have to benchmark it to be sure. Instruction tables only tell you so much. Besides, there's no guarantee the branch will be mispredicted. :-) If you were really worried about it, you'd just eliminate the possibility by replacing `&&` with `&` to persuade the compiler to emit bit-twiddling code and even a conditional move (if one exists for your target architecture---x86 has had one for years). *Way* faster than a divide. – Cody Gray - on strike May 03 '16 at 13:44
@CodyGray If you replace `&&` with `&` you'll divide by zero :). But really what I was talking about was the mispredicted jumps that would happen if you did a linear scan or binary search or something. But then again, you can avoid some of those too: http://www.pvk.ca/Blog/2012/07/03/binary-search-star-eliminates-star-branch-mispredictions/ – Tavian Barnes May 03 '16 at 13:53

score 8 · Answer 2 · edited May 23 '17 at 12:18

8

This is very similar to recognizing a power of three, and you can adapt for example this solution:

bool test(unsigned x) {
    x -= 2;
    if (x > 2187)
        return 0;
    if (x > 243)
        x *= 0xd2b3183b;
    return x <= 243 && ((x * 0x71c5) & 0x5145) == 0x5145;
}

With the given range, this can be further simplified (found by brute force):

bool test2(unsigned x) {
  x -= 2;
  return x <= 2187 && (((x * 0x4be55) & 0xd2514105) == 5);
}

edited May 23 '17 at 12:18

Community

1
1

answered May 02 '16 at 18:14

Falk Hüffner

4,942
19
25

2

Heck, this is some good stuff here. Could you elaborate in a few lines how you got from the "basic" version to the "advanced" version? What is the big idea here? – Matsmath May 03 '16 at 09:53

Khaled.K · Answer 3 · 2016-05-02T16:02:57.467

5

Hint: this time the numbers a[k] form a geometric progression: a[k]=2+3^k.

n = 2 + 3^k
n - 2 = 3^k

(n - 2) / 3^k = 1
(n - 2) % 3^k = 0

k = 0 ~ n-2 = 3^0 = 1, n = 3
k = 1 ~ n-2 = 3^1 = 3, n = 5
k = 2 ~ n-2 = 3^3 = 9, n = 11

if (n > 2 && isPow(n-2, 3))

with definition of function isPow(x,y)

bool isPow(unsigned long x, unsigned int y)
{
    while (x % y == 0)
    {
        x /= y;
    }

    return x == 1;
}

n = n  ~ n-2 = 3^k/3 = 3^(k-1)/3 = .. = 3^1/3 = 1%3 != 0
n = 11 ~ n-2 = 9/3 = 3/3 = 1%3 != 0
n = 5  ~ n-2 = 3/3 = 1%3 != 0
n = 3  ~ n-2 = 1%3 != 0

Similiarly we can deduct k..

int k = findK(n-2, 3);

int findK(unsigned long x, unsigned int y)
{
    unsigned int k = 0;

    while (x % y == 0)
    {
        x /= y;
        k++;
    }

    if (x == 1) return k;
    else return (-1);
}

n - 2 = 3 * 3^(k-1)       for k > 0
(n - 2) / 3 = 3^(k-1)
(n - 2) / 3 / 3 = 3^(k-2)
(n - 2) / 3 / 3 / 3 = 3^(k-3)
(n - 2) / 3 / 3 / 3 / 3 = 3^(k-4)
..
(n - 2) / 3 / 3 / ..i times = 3^(k-i)
..
(n - 2) / 3 / 3 / ..k times = 3^0 = 1

edited May 02 '16 at 16:02

answered May 02 '16 at 08:54

Khaled.K

5,828
1
33
51

1

Has the same time complexity as linear search. – n. m. could be an AI May 02 '16 at 10:24
@n.m. I was going for the `analogously elegant` – Khaled.K May 02 '16 at 11:04
In your first code: why do you move the `3*` to the left become a `/ 3` and at the next step you move it right again and move left the `3^(k-1)` term? Also you repeated the equation `(n - 2) % 3^k = 0` twice. – Bakuriu May 02 '16 at 13:14
@Bakuriu yes the repeated equation is a typo, the `3` and `3^(k-1)` swapped to show off how to apply `(n - 2) % 3^k = 0` in the function `isPow(x,y)` – Khaled.K May 02 '16 at 15:50

score 3 · Answer 4 · edited May 02 '16 at 08:11

3

Found a similar problem in a related post: You can use std::find

bool found = (std::find(my_var.begin(), my_var.end(), my_variable) != my_var.end());
// my_var is a list of elements.

(make sure you include <algorithm>).

For this kind of stuff, in 99% of cases, there is a library that does the job for you.

edited May 02 '16 at 08:11

Jarod42

203,559
14
181
302

answered May 02 '16 at 07:59

Cristian Udrea

76
5

2

`std::find` has linear runtime. I don't think that would count as the fastest implementation possible... – Jens May 02 '16 at 08:02
2

For 8 values, this may well be fast - many CPUs have a repeated-compare operation with a counter to limit the amount of data searched, and it's not much data to cache. I wouldn't be at all surprised if this outperformed a binary search. – Tony Delroy May 02 '16 at 08:15
1

@TonyD I think there was a presentation on one of the C++ conferences where it was actually measured and the linear search is faster for small sets. However, the question also asks for the general case, and he is counting assembler instructions. In this artificial case, I don't think a linear search qualifies. – Jens May 02 '16 at 08:56
@Jens: *"the question also asks for the general case"* - I haven't read what's asked in the code golfing post linked from the question, but the question on this page says *"I am asking what is the analogously elegant (and hopefully the same way efficient) implementation of... [the 8-value case]"*, and goes on to speculate on the general case without explicitly asking another question. So, general solutions are interesting, but a fast-as-possible solution for the explicitly asked case is of at least equal interest. This may or may not be the fastest, but it's worth benchmarking. – Tony Delroy May 02 '16 at 09:39
I have the feeling that including extra libraries is a serious overkill, and might actually compromise performance. – Matsmath May 02 '16 at 10:06
3

@Matsmath In this profession we operate with facts, not feelings. – n. m. could be an AI May 02 '16 at 10:25
4

@Matsmath Why would including a library compromise performance? The function call will probably be inlined anyway. The performance will not be different than a hand-written loop. – Jens May 02 '16 at 10:28
1

I second what @n.m. and Jens mentioned. – Cristian Udrea May 02 '16 at 11:58
@Matsmath: it's not another library anyway, but another header in the Standard Library.. all C++ implementations are required to provide it, and it's designed to be easy for implementations to make at least as fast as any element-finding loop you could write yourself (when optimisation's enabled). – Tony Delroy May 05 '16 at 04:29

score 3 · Answer 5 · answered May 02 '16 at 14:13

3

One very efficient way to do this sort of thing is with a set, especially an unordered_set. As a hash table, search is average constant, worst linear. It is also much more elegant than a string of conditions and scales well.

Simply insert the values you want to compare against, and count the value in the set. If it is found, it's one of your tested values.

std::unordered_set< int > values;
values.insert( 3 );
values.insert( 5 );
values.insert( 11 );
values.insert( 29 );
values.insert( 83 );
values.insert( 245 );
values.insert( 731 );
values.insert( 2189 );

...

if( values.count( input ) )
    std::cout << "Value is in set.\n";
else
    std::cout << "Value is NOT in set.\n";

answered May 02 '16 at 14:13

2

I like this approach to the problem. What continues to surprise me is that here you don't exploit the structure of the numbers, that is, the fact that they are not just some random numbers, but they form a geometric progression. You claim here that this enormous information has no impact whatsoever on performance, which, of course, could be true, but it is something that *looks* counter-intuitive to me. – Matsmath May 02 '16 at 14:27
@Matsmath Well, there's implicitly another equally important piece of information about these numbers that exploiting the mathematical progression ignores: how they impact the hashing function, and thus how they are handled in the set. That said, you might be able to exploit the series progression in writing a custom hashing function; that might allow collisions to be eliminated entirely and thus force constant search. However, that's mostly an academic observation, I don't think anyone would ever do that for this sort of thing. – May 02 '16 at 14:44
@WilliamKappler I disagree with your last comment sentence: https://en.wikipedia.org/wiki/Perfect_hash_function. There are rare cases, when this is needed. – maaartinus May 02 '16 at 16:18
Let me clarify, by "this sort of thing" I meant using one in the context of replacing a chained conditional, rather than custom hash functions as a whole. It would seem really excessive to write a custom hash function for around 10 elements. But there's definitely cases where custom functions are helpful in a broader context. – May 02 '16 at 16:21

Isaac · Answer 6 · 2016-05-02T08:32:11.887

0

Here's what I came up with:

bool b;
if (n >= 3 && n <= 2189)
{
    double d = std::log(n - 2)/std::log(3); // log₃(n - 2)
    b = std::trunc(d) == d; // Checks to see if this is an integer
    // If ‘b’ then ‘n == a[log₃(n - 2)]’
}
else b = false;

if (b)
{
    // your code
}

As the number n is a small integer, floating point inaccuracies shouldn't be a problem.

Ofcourse it won't be as fast as integer arithmetic, an it doesn't fit snuggly in an expression, it is probably faster than some kind of array search, especially if you where to increase the maximum n value.

EDIT I tested my code (without optimizations enabled) and on average (for all n less than or equal to 2189) my version took 185.849 ns wheras the || version took 116.546 ns, (running my program multiple times yielded similar results) so this is not faster. Also for some bizarre reason, it sets b to false when n == 245, (it should be true).

edited May 02 '16 at 08:32

answered May 02 '16 at 07:53

Isaac

816
5
12

1

What if `n < 2`? BTW, computing a single `log` probably takes more than the original expression. At least for `log(3)`, since the question is about performance, you should probably maintain it in a `static` variable (i.e., compute it only once). – barak manos May 02 '16 at 07:54
@barakmanos thanks I fixed that. Hmm, I didn't think of that, perhaps I'll test it... – Isaac May 02 '16 at 07:57
Second thought, `log` with negative probably returns `-inf` of `NaN`, in which case, the test that follows will probably work. – barak manos May 02 '16 at 07:59
`static double log3 = std::log(3)`. – barak manos May 02 '16 at 07:59
@barakmanos well thank you, you were totally write, my version sucks (it's slow) – Isaac May 02 '16 at 08:33
3

Why are you testing it without the optimisations? That pretty much makes the whole point of which one is faster moot. – Davidmh May 02 '16 at 10:49

How to evaluate zero sets fast?

6 Answers6