Fast average without division

Question

I have a binary search loop which gets hit many times in the execution path.

A profiler shows that the division part of the search (finding the middle index given the high and low indices of the search range) is actually the most costly part of the search, by a factor of about 4.

(I think) it is not critical for efficient binary search to find the exact middle value, just a value near the middle which does not have bias in either direction.

Is there a bit-twiddling algorithm to replace mid = (low + high) / 2 with something much faster?

Edit: Language is C#, but the equivalent bit-operation is valid in any language (although it may be of no performance benefit), which is why I left the C# tag off.

How big are your arrays? Have you tried using a linear search instead? — Michael Myers, Jun 19 '09 at 21:57
This really isn't something that can be language agnostic - the details of how fast this sort of operation is are very platform specific. It's entirely possible that if you are dealing with a dynamically typed language that the division is being done in floating point math, or that a big-int structure is being used. It's also expected in most statically typed languages that something like (low + high) / 2 will be automatically optimized to an add and an arithmetic right shift. — Eclipse, Jun 19 '09 at 22:27
"just a value near the middle which does not have bias in either direction." Doesn't your integer division by 2 have a bias already? — Nosredna, Jun 19 '09 at 22:51
I'm skeptical that the finding of the midpoint is the bottleneck here. An integer division by two should be compiled to a right shift. Are high, low, and mid declared as integers? I'd love to see your whole binary search routine. I think we're missing something here. — Nosredna, Jun 19 '09 at 22:57
It is certainly curious that there is that big of a skew - so I'd be interested to see the whole routine too. — Jonathan Leffler, Jun 19 '09 at 23:46
@Jonathan Leffler--Yeah, it doesn't make sense to me. Either something's accidentally declared or cast to a float rather than an integer, or the profile is is misleading, or there's just not much in the binary search and there's not much to squeeze out. I'd like to see the compiled output of the routine, too. If the implementation is iterative, the conditionals should be more expensive than the average. If it's recursive, the function call overhead should be more expensive. And doesn't C# have an Array.BinarySearch method already? — Nosredna, Jun 19 '09 at 23:57
@onebyone.livejournal.com--that's a good idea. I've certainly seen cases where profiles didn't make sense until I looked at the object code generated by the compiler. — Nosredna, Jun 20 '09 at 01:14

score 19 · Answer 1 · answered Jun 20 '09 at 01:47

19

Here is a bit-hack version of the average that does not suffer from the overflow problem:

unsigned int average (unsigned int x, unsigned int y)
{
  return (x&y)+((x^y)>>1);
}

answered Jun 20 '09 at 01:47

Nils Pipenbrinck

83,631
31
151
221

1

Wow. I LOVE that! Of course, I would expect the add, subtract, and shift of the usual solution to be at least as fast. But you've scored major coolness points there! – Nosredna Jun 20 '09 at 02:12
IMHO on a modern pc the performance differences between the versions does not makes a difference anymore. – Nils Pipenbrinck Jun 20 '09 at 02:58
@Nils: Yes indeed, on modern CPUs it is the unpredictable branches of a binary search that are the speed killers. – Zan Lynx Nov 13 '09 at 05:15
@nilspipenbrinck can you kindly add an explanation of the solution. I mean it works but how did you find the solution? How can I understand it? – Uthman Jan 14 '17 at 19:14
@baltusaj Uh, no. I can't explain how that works within the context of a comment (not having the time). However you're free to ask exactly that as a question. I'm sure the bit-twiddlers around here will be happy to share their knowledge on this topic. – Nils Pipenbrinck Jan 14 '17 at 21:16

Jeff Moser · Accepted Answer · 2009-06-19T21:59:22.563

12

int mid = (low + high) >>> 1;

Be advised that using "(low + high) / 2" for midpoint calculations won't work correctly when integer overflow becomes an issue.

edited Jun 19 '09 at 21:59

answered Jun 19 '09 at 21:52

Jeff Moser

19,727
6
65
85

+1 for the right answer, but I do have to comment, "a lot of binary searches are broken" is a bit sensationalist. More like "a lot of binary search implementations contain an overflow bug that occurs with large numbers of items." – Not Sure Jun 19 '09 at 22:00
Looks like I updated the text just before you made this comment. Is it better now? :) – Jeff Moser Jun 19 '09 at 22:01
I'm with Jimmy on this one... I'd expect the compiler to be making this optimization already. I'd be very surprised if it's faster. – rmeador Jun 19 '09 at 22:11
Shouldn't be any faster with a good compiler, but the fastest way to tell is to try it. – UncleO Jun 19 '09 at 22:13
Question doesn't specify language. Not all languages are compiled. Though how anyone could talk about "faster" without even specifying language, I do not know... Personally, I would happily yell at any compiler-writer who didn't optimise division-by-static-constant-2 to right-shift-1, assuming *unsigned* ints. Right shift of negative values is tricksy in C, and in a binary chop I doubt you need it, so use an unsigned int in case it helps. – Steve Jessop Jun 19 '09 at 22:27
In case anyone's wondering: in C/C++, right shift of negative values might be a logical, not arithmetic shift, in which case it plain doesn't work for division. Furthermore, even with arithmetic shift, -3 >> 1 == -2, which isn't the answer you want. If your type is int, then the compiler doesn't know your values are always positive, so it can't just replace division with shift even if you know that it would work. – Steve Jessop Jun 19 '09 at 22:31
Language is C#. I'll benchmark on Monday to see if I gained anything with the change. – Nick Jun 19 '09 at 22:44
@uncleo, yeah Java has >>> but C# doesn't. – Nosredna Jun 20 '09 at 02:00
2

After benchmarking, this has significant (4x) performance gain over the division. I used ANTS profiler. – Nick Jun 22 '09 at 12:53

Roee Adler · Answer 3 · 2009-06-19T23:30:34.580

7

You can use bit shifting and also overcome a possible overflow issue:

low + ((high-low) >> 1)

However I must admit I expect modern compilers and interpreters to do division by 2 (or division by any other constant power of 2) as bit-shifting, so not sure if it will really help - try it out.

edited Jun 19 '09 at 23:30

answered Jun 19 '09 at 22:40

Roee Adler

33,434
32
105
133

score 5 · Answer 4 · edited May 25 '13 at 13:18

To further expand on Nils' answer Richard Schroeppel invented this.

http://www.inwap.com/pdp10/hbaker/hakmem/boolean.html#item23

ITEM 23 (Schroeppel):

(A AND B) + (A OR B) = A + B = (A XOR B) + 2 (A AND B).

(A + B)/2 = ((A XOR B) + 2(A AND B))/2
          =  (A XOR B)/2  + (A AND B)
          =  (A XOR B)>>1 + (A AND B)


avg(x,y){return((x^y)>>1)+(x&y);}

(A AND B) + (A OR B) = A + B because A AND B gives the sum of the shared (between A and B) powers of two, A OR B gives both those shared and those that aren't, hence:

(A AND B) + (A OR B) = 
   (sum of shared powers of two) + 
   ((sum of shared powers of two) + (sum of unshared powers of two)) = 
     (sum of shared powers of two) + 
     ((sum of shared powers of two) + (sum of powers of two of A only) + 
     (sum of powers of two of B only)) = 
       ((sum of shared powers of two) + (sum of powers of two of A only)) + 
       ((sum of shared powers of two) + (sum of powers of two of B only)) 
= A + B.

A XOR B gives a map of those bits that differ between A and B. Hence,

A XOR B = (sum of powers of two of A only) + (sum of powers of two of B only).

And thus:

2(A AND B) + (A XOR B) = 
       ((sum of shared powers of two) + (sum of powers of two of A only)) + 
       ((sum of shared powers of two) + (sum of powers of two of B only)) 
= A + B.

That is the same one as Nils's one above, except you switched the terms. — wildplasser, May 06 '12 at 11:48

score 0 · Answer 5 · answered Jun 20 '09 at 01:16

0

If I recall correctly, there are some cases where using the exact middle of the array can actually be slower. The solution is to randomize the choice of the index where you bisect the array. Equally true of the algorithm for determining the median of an array.

I can't recall the precise details, but I remember hearing in lecture 6 of the MIT algorithms series on iTunes.

answered Jun 20 '09 at 01:16

duffymo

305,152
44
369
561

I'd worry about getting too tricky. You want to make sure that you hit them all at the end. Maybe just randomize if the distance between high and low is greater than a certain number. – Nosredna Jun 20 '09 at 01:20

score 0 · Answer 6 · answered Jul 15 '15 at 18:24

0

Try low + ((high - low) / 2)). This should work because you're only taking the average of two numbers. This would reduce the amount of time the algorithm is taking if the binary search list is fairly big, since high - low is much smaller than high + low.

answered Jul 15 '15 at 18:24

Entirely equivalent to [Roee Adler](http://stackoverflow.com/users/103532/roee-adler)'s answer from '09. – greybeard Jul 15 '15 at 19:36
@greybeard I came up with this myself, though. – Jul 19 '15 at 16:48

Fast average without division

6 Answers6

Linked