How do I optimize this Piece of code in C

Question

Yesterday,in an Interview I have been asked to test the 5th bit in a number(to test whether its on and off)and below is how I done it.

int number = 16;
int mask   = 1<<5;

if ((number & mask) == 0)
    printf("Bit is off");
else
    printf("its on");

but he then asked me to optimize this code and do it without using this particular mask.

So my questions what else mask I could have used in this code?

Your way is idiomatic. Doing it "without the mask" is just another stupid interview question. — Pascal Cuoq, Mar 19 '12 at 09:30
Yeh I can't really see how you're going to get better than that. To be honest in that car, a sane compiler will completely optimise out the `if` statement completely anyway as it can do the comparison at compile time. — mattjgalloway, Mar 19 '12 at 09:31
Hey guys have a look at this. http://stackoverflow.com/questions/7677256/and-faster-than-integer-modulo-operation — anurag-jain, Mar 19 '12 at 09:39
One minor point is that both `mask` should probably be `unsigned`. When bitshifting `signed` there are some rules regarding the MSB propagation. — valdo, Mar 19 '12 at 09:48
P.S. `if ((number & mask) == 0)` may be replaced by `if (number & mask)`. But I don't believe there will be any difference on any modern compiler — valdo, Mar 19 '12 at 09:49
Guys if I replace mask with pow(2,5), will it make any difference,where pow is power function will calculate power of 2 raised to 5 — Amit Singh Tomar, Mar 19 '12 at 10:30
@valdo - You mean `if (!(number & mask))`. The two statements you've shown are opposite. — mattjgalloway, Mar 19 '12 at 10:46
I honestly can't see how you could improve on this solution. — JeremyP, Mar 19 '12 at 12:12
Compile with different optimization switches, compare the runtimes and stick with the option(s) that give better results. — pmg, Mar 22 '12 at 13:22

gbulmer · Accepted Answer · 2012-03-20T14:51:39.143

Maybe the interviewer wanted to see how you reacted to a simple challenge. Or simply wanted to know if you really understood C, and would stand your ground? Maybe the interviewer wanted to know if you knew non-zero is true, and hence test your depth of understanding of C? Or maybe whether you could do binary to hex in your head?

IMHO interviews are about a lot more than code. They are expensive to do. I always tried to get a clear impression of the person, something hard to do by written communication, or even on the phone. After all, some of those folks are going to work with the recruit!

The most compact, and possibly the quickest is probably:

int number = 16;  // is this really what they gave?

printf((number & 0x20)?"its on":"Bit is off"); // did they mean 5th or bit 5?

Edit:

I've coded up the original approach, and my alternative, and compiled it for ARM Coretx-M3/4 (this is the processor I am writing for at the moment). It was compiled with -O3. Then I have disassembled each compiled file (using objdump) to get the assembler. I did it this way because the output of gcc -S was huge; that includes a lot of extra information for the assembler and linker, which made it harder to find the code.

I replaced printf with a dummy_printf to avoid #including stdio.h which added more noise. The dummy_printf isn't exactly the same as printf, it just takes one parameter, but it keeps the output short :-)

The source (all 7 files appended to make it easier to read) are at: http://pastebin.com/PTeApu9n

The resulting concatenated output of objdump (for each .o) is at: http://pastebin.com/kHAmakE3

As you can see, the original is compiled to:

void original_bit5(int number) {
    int mask = 1<<5;

    if ((number & mask) == 0)
   0:   f010 0f20   tst.w   r0, #32
   4:   d005        beq.n   1a <original_bit5+0x1a>
        dummy_printf("Bit is off");
    else
        dummy_printf("its on"); 
   6:   f240 0000   movw    r0, #0
   a:   f2c0 0000   movt    r0, #0
   e:   f7ff bffe   b.w 0 <dummy_printf>

void original_bit5(int number) {
    int mask = 1<<5;

    if ((number & mask) == 0)
        dummy_printf("Bit is off");
  12:   f240 0000   movw    r0, #0
  16:   f2c0 0000   movt    r0, #0
  1a:   f7ff bffe   b.w 0 <dummy_printf>
  1e:   bf00        nop

I think the call to dummy_printf is using tail-call chaining, i.e. dummy_printf will not return to this function. Very efficient!

There is no function entry code because the first four function parameters are passed in registers r0-r3.

You can't see the addresses of the two strings being loaded in r0. That is because this hasn't been linked.

You can see that:

int mask = 1<<5;    
if ((number & mask) == 0)

is compiled to:

   0:   f010 0f20   tst.w   r0, #32
   4:   d005        beq.n   1a <original_bit5+0x1a>

So 1<<5 and (... == 0), are compiler to a more direct and efficient sequence of instructions. There is a branch to the appropriate call of dummy_printf.

My code compiles to:

void my_bit5(int number) {
    dummy_printf((number & 0x20)?"its on":"Bit is off");    
   0:   f240 0200   movw    r2, #0
   4:   f240 0300   movw    r3, #0
   8:   f010 0f20   tst.w   r0, #32
   c:   f2c0 0200   movt    r2, #0
  10:   f2c0 0300   movt    r3, #0
  14:   bf14        ite ne
  16:   4610        movne   r0, r2
  18:   4618        moveq   r0, r3
  1a:   f7ff bffe   b.w 0 <dummy_printf>
  1e:   bf00        nop

This also seems to get tail-call optimised, i.e. there is no return from this function because there is no need of one, the return by dummy_printf will return directly to main()

What you can't see is the two registers, r2 and r2 will contain the addresses of the two strings. That is because this hasn't been linked.

As you can see there is a conditional execution instruction 'ite' which loads the parameter register r0 with either register r2 or register r3. So there is no branch in this code.

For a simple processor with pipelining, this can be quite efficient. On a simple pipelined processor, a branch can cause a 'pipeline 'stall' while parts of the pipeline are cleared out. This varies from processor to processor. So I assume gcc has got it right, and generated a better code sequence than executing a branch. I haven't checked.

Spurred on by Lundin, I have come up with this:

void union_bit5(int number) {
    union { int n; struct { unsigned :5; unsigned bit :1; }; } tester;
    tester.n = number;

    if (tester.bit)
        dummy_printf("Bit is on");
    else
        dummy_printf("its off");    
}

It does not explicitly include a mask, or bit shifting. It is almost certainly compiler dependent, you'd have to test it to ensure it works (glerk !-(

gcc for ARM generates the same code (bne vs beq, but that can be adjusted) as the OP's solution, so no optimisation, but it removes the mask:

void union_bit5(int number) {
    union { int n; struct { unsigned :5; unsigned bit :1; }; } tester;
    tester.n = number;

    if (tester.bit)
   0:   f010 0f20   tst.w   r0, #32
   4:   d105        bne.n   1a <union_bit5+0x1a>
        dummy_printf("Bit is on");
    else
        dummy_printf("its off");    
   6:   f240 0000   movw    r0, #0
   a:   f2c0 0000   movt    r0, #0
   e:   f7ff bffe   b.w 0 <dummy_printf>
void union_bit5(int number) {
    union { int n; struct { unsigned :5; unsigned bit :1; }; } tester;
    tester.n = number;

    if (tester.bit)
        dummy_printf("Bit is on");
  12:   f240 0000   movw    r0, #0
  16:   f2c0 0000   movt    r0, #0
  1a:   f7ff bffe   b.w 0 <dummy_printf>
  1e:   bf00        nop

For what it's worth:

(number & 0x20)? dummy_printf("its on") : dummy_printf("Bit is off");

gcc for ARM generates exactly the same code as the OP's. It generates branch, and not conditional instructions.

Summary:

The original code is compiled to a very efficient sequence of instructions
The ternary ...?...:... operator can compile to code which does not involve branches on the ARM Cortex-M3/4, but may also generate normal branch instructions.
It is difficult to write more efficient code than the original in this case :-)

I'll add, IMHO the cost of doing a printf is so enormous compared to a bit test, that worrying about optimising a bit test is too small an issue; it fails Amdahl's Law. An appropriate tactic for the bit test is to ensure -O3 or -Os is used.

If you wanted to to do something somewhat perverse (especially for such a trivial problem), but different, which might make the interviewer think, create a lookup table for every byte value. (I don't expect it to be faster ...)

#define BIT5(val) (((val)&0x20)?1:0)
const unsigned char bit5[256] = {
BIT5(0x00),BIT5(0x01), BIT5(0x02),BIT5(0x03), 
BIT5(0x04),BIT5(0x05), BIT5(0x06),BIT5(0x07),
// ... you get the idea ...
BIT5(0xF8),BIT5(0xF9), BIT5(0xFA),BIT5(0xFB), 
BIT5(0xFC),BIT5(0xFD), BIT5(0xFE),BIT5(0xFF)
};

//...
if (bit5[(unsigned char)number]) {
    printf("its on");
} else {
    printf("Bit is off");
}

This is a handy technique if there are some complex bit patterns in, for example, a peripheral register, which need converting to a decision, or switch. It is O(1) too

You could combine the two!-)

The original solution seems just as optimal and easier to read to me. For a start, the compiler will turn `1 << 5` to `0x20` and `1 << 5` gives a clearer understanding of what you are trying to do. — JeremyP, Mar 19 '12 at 12:10
@JeremyP - I agree `1<<5` will become `0x20`, and is clear (I might write `0b00100000`). The OP said "he then asked me to optimize this code and do it without using this particular mask". So I was responding to that requirement. For the table, I'd say "this is not appropriate here, but it solves complex bit combination-testing in general, with O(1) complexity". (I have interviewed many hundreds of 'techies' over the last 30 years, and IMHO, one goal is to favourably impress an interviewer, by responding well within the interview, and it is not only about the code) — gbulmer, Mar 19 '12 at 12:28
Both solutions still use the same mask. The second solution while useful for more complex bit manipulations could not be considered an optimisation in this particular circumstance. Instead of an and operation, you need to do an addition and an extra memory access. — JeremyP, Mar 19 '12 at 12:29
@JeremyP I agree, both solutions still use the same mask. The OP seemed to be asking for help responding to an interview question. Being able to respond to 'optimise that', and talk sensibly about pros and cons is the sort of skills I would look for if I were interviewing a development person (as I said, I have been doing that for about 30 years, so I think I have a relevant view). — gbulmer, Mar 19 '12 at 12:32
I don't think they asked how to optimize the printf() itself. I also fail to see how this is any faster than the OP's code. You have the same number of bit checks (1) and the same number of branches (1). The program size may be a tiny bit smaller with your code, but it isn't obvious, nor is that an important optimization to do. Also, the code for doing a simple one instruction bit check is almost certainly quicker than a lookup table check. And the lookup table consumes a whole lot of extra program space for nothing. — Lundin, Mar 19 '12 at 13:54
@Lundin - are you saying that the *same* number of instructions are executed for `printf((number & 0x20)?"its on":"Bit is off");` as the OP's code? What machine? Would you please post the assembler for the two? I SWAG on some processors, the load of the pointer to the string, before the single printf call, might be quicker, but I haven't checked. For example, ARM has conditional instructions, which could move either string pointer to the argument register (R0) (depending on the test) before a single, call of printf. There is less branching, so no pipeline stall. Would gcc do the same? — gbulmer, Mar 19 '12 at 14:24
@ Lundin "I don't think they asked how to optimize the printf() itself." I haven't attempted to change printf. It would be unchanged. I have provided an alternative way of testing the bit, and generating a call of printf with the appropriate value. — gbulmer, Mar 20 '12 at 12:11
@Lundin - you wrote "I also fail to see how this is any faster than the OP's code. You have the same number of bit checks (1) and the same number of branches (1)" I have posted code which shows that there are **no** branches in my example, and one on the OP's. HTH — gbulmer, Mar 20 '12 at 12:38
@gbulmer Alright, your code indeed loads the address of the string into a register and then call printf. This is what I meant with optimizing the printf. If you replace printf in the OP's code with a generic "`if(...) do_this(); else do_that();`" snippet then you couldn't optimize it like you have. I think the essence of the OP's question was how to optimize the bit masking in itself. — Lundin, Mar 20 '12 at 13:04
@Lundin - I agree all of `n & 0x20`, `n & (1<<5)` `(n>>5) & 1` cause (gcc for ARM) to generate the same code. I think the interview question was **not** purely about the technical solution. The fact that there are about 9 answers so far, and AFAICT none have satisfied you they are more optimal suggests that either the interview knew something _none_ of use have thought of (possible, so we keep trying to generate options), or the interviewer was interested in something other than the technical answer. I'm ignoring the idea that they may be wrong, as there isn't much we can do to help. — gbulmer, Mar 20 '12 at 13:35
Seriously laughing out loud about the prospect of this solution being used in an interview situation :) — JWDN, Apr 24 '13 at 09:15

k06a · Answer 2 · 2012-03-19T09:39:55.883

2

There are two ways to check bit:

if (number & (1 << bit)) { ... }
if ((number >> bit) & 1) { ... }

I think it will be interesting for you: http://graphics.stanford.edu/~seander/bithacks.html

edited Mar 19 '12 at 09:39

answered Mar 19 '12 at 09:30

k06a

17,755
10
70
110

How does this fulfil "without using this particular mask"? – bzlm Mar 19 '12 at 09:34
1

@bzlm because it doesn't use the mask of (1<<5); it's weasely, yes, but so is the question it's answering – tbert Mar 19 '12 at 10:07
This is identical to the OPs code and will yield the same machine code. – Lundin Mar 19 '12 at 13:23

score 1 · Answer 3 · edited May 23 '17 at 10:08

1

One more way is

1: Right shift the number 5 times so that 5th bit becomes 0th from the right(i.e LSB).
2: Now the logic is the numbers with LSB as 1 are odd, and the ones with 0 are even, Check that using %2

If you think the mod operations are much costlier than bit operation, I think it all depends on the compiler. Have a look at this thread

AND faster than integer modulo operation?.

I'm not sure why the interviewer would have asked you to optimize, may be he was expecting the modulus method as answer.

edited May 23 '17 at 10:08

Community

1
1

answered Mar 19 '12 at 09:34

anurag-jain

1,380
2
11
31

5

Using % can no way be considered an optimisation over bitmasking. In fact using bitmasking to determine even/oddness is considered (by some) an optimisation to using %. – freespace Mar 19 '12 at 09:35
http://stackoverflow.com/questions/7677256/and-faster-than-integer-modulo-operation – anurag-jain Mar 19 '12 at 09:41
Ironically, a compiler will try to replace all division and modulo operations with shift operations, to increase performance, because division and modulo are incredibly slow in comparison with shift, on any CPU I have ever heard of. – Lundin Mar 19 '12 at 13:49

score 1 · Answer 4 · answered Mar 19 '12 at 12:09

1

Are you sure that you should move it 5 bits? How about that:

int n = 16;
printf ("%d\n", (n >> 4) % 2);

answered Mar 19 '12 at 12:09

madper

806
1
10
25

This is slower than the OP's code. But if you are lucky, the compiler will optimize it back to the OP's code. – Lundin Mar 19 '12 at 13:57
creat a var `mask` will take more space. optimize is not only for time but also space... – madper Mar 23 '12 at 12:23
No it doesn't take more space! You don't understand how the compiler translates source into machine code! Read my answer to this question for an explanation of how a compiler works: "If the user didn't declare a [mask] variable, the value will be stored in an invisible temporary variable." No result of a calculation can be stored in thin air, or cyberspace, or in the programmer's mind. The computer must save it at a memory location somewhere. Whether you have given that memory location a name or not doesn't affect its allocation the slightest. – Lundin Mar 23 '12 at 12:38

James · Answer 5 · 2012-03-19T09:39:46.280

0

You could use the bit test assember instruction, but it's not impossible that the compiler will pick up on what you're doing and do that anyway.

Apart from that there isn't really anything to optimise, and certainly the only way to see if any of the minor variations on your method that are possible are faster would be to profile.

Here's the code that gcc 4.2.1 -O3 produces for if((number >> 5) & 1)):

0000000100000ee0    pushq   %rbp
0000000100000ee1    movq    %rsp,%rbp
0000000100000ee4    shrl    $0x05,%edi
0000000100000ee7    notl    %edi
0000000100000ee9    andl    $0x01,%edi
0000000100000eec    movl    %edi,%eax
0000000100000eee    leave
0000000100000eef    ret

and for if(number & (1 << 5)):

0000000100000ee0    pushq   %rbp
0000000100000ee1    movq    %rsp,%rbp
0000000100000ee4    shrl    $0x05,%edi
0000000100000ee7    notl    %edi
0000000100000ee9    andl    $0x01,%edi
0000000100000eec    movl    %edi,%eax
0000000100000eee    leave
0000000100000eef    ret

So we see that at least gcc 4.2.1 produces identical code in these cases, but it doesn't use the bit test instruction.

edited Mar 19 '12 at 09:39

answered Mar 19 '12 at 09:31

James

24,676
13
84
130

2

An optimising compiler will completely remove the whole `if` statement anyway and leave it with just a single `printf`, I would hope. – mattjgalloway Mar 19 '12 at 09:34
@mattjgalloway I had assumed that the printf was just there for exposition purposes... obviously the easiest way to optimise code is to remove printing! My examples above return 1 or 0 depending on the state of the bit. – James Mar 19 '12 at 09:40
I totally get different results. Mine ends up with just a single `call` to `printf`, pretty much (with the preamble to set the call up, obviously). I'm surprised you're seeing it even do the shifts at `O3`. – mattjgalloway Mar 19 '12 at 09:54
Note that in the question, `number` and `mask` are known at compile time. Hence my comment. – mattjgalloway Mar 19 '12 at 09:55
It think you need both lots of luck and knowledge to write optimized assembler better than the compiler. Because the compiler knows which CPU resources that are currently used when the code is executed, you do not. Also, all decent compilers will translate the original expression to a bit test, so I think writing this as inline assembler is superfluous. And possibly slower. – Lundin Mar 19 '12 at 13:42

score 0 · Answer 6 · answered Mar 19 '12 at 11:36

0

(number & 16)?printf("yes"):printf("no");

answered Mar 19 '12 at 11:36

user1232138

5,451
8
37
64

This is identical to the OP's code and will result in the same machine code. – Lundin Mar 19 '12 at 13:57
It is not identical to OP´s code. This code tests the fifth bit, while the OP:s code tests the sixth bit. This code does not use any bit shift operation, while the OP:s code does. This code does not perform any comparison operation, while the OP:s code does. (But yeah, sure. Compiler optimizations will most likely remove the superflous operations in the OP:s code) – Alderath Mar 19 '12 at 14:24
1

@Alderath 4th or 5th was a typo, who cares, it is not what the essence of the question is about. The _machine code_ of the OP's code is identical to this code. The _machine code_ of the OP's code does not contain any shift. This code _does_ perform a comparison, that's what the ?: operator is there for. If you want to know the details, I explained them in one of the answers. – Lundin Mar 19 '12 at 14:49

score 0 · Answer 7 · edited Mar 19 '12 at 23:03

0

A c learner newbie's try

int number = 16;
if(16 == number&(0x10))
    puts("true");
else
    puts("false");

edited Mar 19 '12 at 23:03

DaveShaw

52,123
16
112
141

answered Mar 19 '12 at 13:16

Latyas

35
6

This code assumes that "number" is a constant, but is otherwise identical to the OP's code, and will result in the same machine code. – Lundin Mar 19 '12 at 13:58

SquareRootOfTwentyThree · Answer 8 · 2012-03-19T13:29:57.540

0

Everybody is shifting to the right. I want to be original and shift to the left:

#define INDEX 5

int number = 16;

if (number<<(sizeof(number)*8-INDEX-1)<0)

  printf("Bit #%d is set in %d.\n", INDEX, number);
else    
  printf("Bit #%d is NOT set in %d.\n", INDEX, number);

This code is ugly and absolutely implementation dependent (the C standard says that the result is undefined). On x86 it works and it is somewhat more efficient because the MSB is always copied in the bit #7 ("sign") of the flags register, which can be tested with a single jns instruction.

In other words, for INDEX 5, you have:

[...]
shl $0x1F, %eax
test %eax, %eax
jns 8053635
[...]

The original solution of the OP is cleaner, and that's how production code should be like.

edited Mar 19 '12 at 13:29

answered Mar 19 '12 at 13:21

SquareRootOfTwentyThree

7,606
32
43

This makes a whole lot of assumptions of the hardware. You are assuming that the OP is using x86, which is inefficient at certain instructions, and therefore you must do this hack. Another CPU could be capable of doing the bit test with a single instruction right away. – Lundin Mar 19 '12 at 14:06
You are right. Certain CPUs may even throw an exception at you if you dare to do that. However, this is an interview question: it could be used to show that a) you know complement-2 representation b) the architecture of x86 c) the C standard d) you know what optimization is and you can argue about the right time to do it. – SquareRootOfTwentyThree Mar 19 '12 at 14:13
You could easily convert this into standard-compliant C by shifting the value as a *unsigned* value, then casting it to *signed* before checking if it is negative or positive. On most architectures this would compile to a *shift* and a *conditional branch*. – Lindydancer Mar 20 '12 at 16:07

score 0 · Answer 9 · answered Mar 19 '12 at 13:40

Any attempt to optimize that code falls under the category "premature optimization". If you understand how the compiler translates C to machine code, you would not attempt to optimize that code. I am guessing the interviewer lacked such knowledge.

If we dissect that code, this is what we get:

1<<5 is translated to the literal 32 at compile time. There is absolutely no difference in performance between writing int mask = 1<<5; and int mask = 32;, but the latter is far harder to understand.

Further,

if ((number & mask) == 0) is completely equivalent to
if ((number & 32) == 0) is completely equivalent to
if ((number & (1<<5)) == 0)

There exists two cases:

Either the compiler needs to find a memory location to store the mask.
- If the user declared a variable mask, the value will be stored there.
- If the user didn't declare a variable, the value will be stored in an invisible temporary variable.
- RAM consumption in the two above cases is completely equivalent.
Or the compiler does not need to store the mask anywhere. It will optimize away the whole mask variable or numeric literal and bake them in with the rest of the program instruction.

Which of these two that will be picked depends on whether int number = 16; is modified or not from the point of declaration to the if statement where the masking takes place.

And that's it. Any attempt to write the code differently than in your original example is premature optimization and obfuscation and will not result in any performance difference.

If that piece of code was part of the inner loop in a performance intensive program, re-arranging it could lead to some speed up, so I disagree: this optimization is not necessarily "premature" and it may result in performance difference (probably in way we don't expect, e.g. if we don't know the CPU architecture). — SquareRootOfTwentyThree, Mar 19 '12 at 15:47
@SquareRootOfTwentyThree I really don't see how it would make any difference if the op's code was inside a loop or not. — Lundin, Mar 19 '12 at 19:51

score 0 · Answer 10 · answered Mar 19 '12 at 16:47

Forgive the following answer:

I used to work at a start-up, when the company decided not to pursue a candidate, they come up with a bogus reason to end the interview. Perhaps this was the poster's experience.

asking for kth bit may mean the least significant bit is the zeroth bit so that (number & 1 << 5) would not do. But that was not the issue. He asked for optimization. Sometime the reason you fail an interview has nothing to do with you. In that case it's their loss; there will always be another interview opportunity.

score 0 · Answer 11 · answered Mar 22 '12 at 13:12

In one of the interviews, i gave the following answer, and he was satisfied, but a little change to the question was 'check if nth bit is set.

int N = 16;
printf ("%d\n", (N >> (n-1)) % 2);

So, when making the answer generic, Not exactly sure which one (of below) is faster for this example.

1<<(n-1) & N (or)
N>>(n-1) % 2 (or)
N>>(n-1) & 1

How do I optimize this Piece of code in C

11 Answers11