I have an integer number n and I need to rounds n/4 upward. For performance reason I need to find out a fast way in C. The division by 4 can be done using a >> 2 shift operation but I don't know about the round. I could use ceil but I am worried about performance.
-
6Before worrying about performance, *measure!* See if what you think is a bottleneck actually is. Don't do manual optimizations before you know if it's really needed (manually optimized code is often hard to read, understand and maintain). – Some programmer dude Feb 03 '15 at 17:43
-
The general formula for divide-and-ceiling is `#define DIV_CEIL(n,d) (((n)-1)/(d)+1)`. – barak manos Feb 03 '15 at 17:47
-
@barakmanos: Wouldn't that break for `unsigned` where `n==0`? – Tim Čas Feb 03 '15 at 17:52
-
@TimČas it's broken for all negative numbers, as division (from C99 onwards) truncates towards zero. – abligh Feb 03 '15 at 17:53
-
@abligh: Quite possibly (depending on what one means by "upward"). I didn't bother analyzing it that much, I've just spotted (and pointed out) the obvious. Point is, it's broken for at least one important case. – Tim Čas Feb 03 '15 at 17:55
2 Answers
If your operand is non-negative, how about:
unsigned int
roundupdiv4 (unsigned int n)
{
return (n+3)>>2;
}
Note that any sensible compiler will compile /4
of an unsigned int
as >>2
anyway.
I can confirm that by compiling the above with gcc -O3 -S
:
.file "x.c"
.text
.p2align 4,,15
.globl roundupdiv4
.type roundupdiv4, @function
roundupdiv4:
.LFB0:
.cfi_startproc
leal 3(%rdi), %eax
shrl $2, %eax
ret
.cfi_endproc
.LFE0:
.size roundupdiv4, .-roundupdiv4
.ident "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
.section .note.GNU-stack,"",@progbits
and noting the output is exactly the same if I substitute >>2
by /4
.
Also note that I've used unsigned int
as >>
is implementation defined for negative signed left operands (i.e. shifting negative values right). If you want a working one that rounds up (strictly up) for signed values:
int
roundupdiv4 (int n)
{
return ((n>0)?(n+3):n)/4;
}
because integer division uses truncation rounding, i.e. rounds up for negative numbers (towards zero) anyway. (That's for C99 onwards; it's implementation defined in C89).
If by round up you meant 'round away from zero' then:
int
roundawayfromzerodiv4 (int n)
{
return ((n>0)?(n+3):(n-3))/4;
}
-
Minor correction: `>>` is implementation-defined for negative left operands (and usually sign-extends); `<<` is undefined (even `(0,-1) << 0` is). – mafso Feb 03 '15 at 19:10
-
@mafso: thanks - I think I've fixed that w.r.t. `>>`. `<<` is not used here but for my interest, ISO/IEC 9899:2011 s6.5.7 §4 says *"The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros ... [text if E1 is unsigned] ... If E1 has a signed type and nonnegative value, and E1 × 2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined."* - implies to me you can safely left-shift 0! – abligh Feb 03 '15 at 19:27
-
I'd be rather surprised if `/4` were converted to `>>2` by any mainstream compiler, at least for `int`: they're different for negative operands. C99 requires /4 to truncate the fractional part (so `(-1) / 4` should be `0`), while (as already noted), the behaviour of `>>` for negative operands is implementation defined, but I'd expect most implementations to define `>>2` so that it shifts the bits (either sign-extending or not). With or without sign-extension, `(-1) >> 2` is not going to give `0`. – Mark Dickinson Feb 03 '15 at 19:31
-
@MarkDickinson - re conversion, arrgh, I meant `>>2` of an `unsigned int` - fixed. Re `(-1)>>2 != 0` let's hope so, but in theory a C compiler could legally do this. – abligh Feb 03 '15 at 19:38
-
@MarkDickinson - Then be surprised. See disassembled functions in my answer. – Brian McFarland Feb 03 '15 at 19:38
-
@BrianMcFarland: Ah, nice. Thank you. Just tried this for myself, under Clang with -O3, and a simple `/4` on (signed) int turns into a combination of two arithmetic shifts, one bit shift, and an addition. – Mark Dickinson Feb 03 '15 at 19:41
-
@MarkDickinson be more surprised - I've just done it and it turns out with exactly the same code for `unsigned int`s. (`gcc-4.8.real (Ubuntu 4.8.2-19ubuntu1) 4.8.2`) – abligh Feb 03 '15 at 19:45
-
@MarkDickinson - Weird. I thought Clang was general on par with or better than gcc for performance? FYIY, it was gcc 4.8.2 from the ubuntu repo, so not exactly bleeding edge. – Brian McFarland Feb 03 '15 at 19:46
-
@abligh: wrt left-shifting by 0 [edit: left-shifting _negative_ values by 0]: I read that as _(If `E1` has a signed type **and** nonnegative value [...]); otherwise, the behavior is undefined._ -- so I think, it's UB. (I've read this somewhere here on SO, don't remember where, but I think, this interpretation is correct.) – mafso Feb 03 '15 at 19:58
-
-
@abligh, `E1` is the left operand, not the right one, `0 << 0` is defined, of course. – mafso Feb 03 '15 at 20:00
-
@mafso yes indeed. There is no prohibition on E2 being zero, or on E2 at all, save that E1 x 2^E2 is representable in the result type, which it clearly always will be when left-shifting by zero bits. Am I missing something? – abligh Feb 03 '15 at 20:02
-
@abligh No, I don't think you miss anything, `E2` can always be 0 (if I don't miss anything). And I'm not sure if we're talking about the same thing :) My point was just that even `(0,-1) << 0` is UB (where the left operand is negative, not the right one). – mafso Feb 03 '15 at 20:06
-
@mafso ah apologies, it's notation confusion. I thought by `(0,-1)<<0` you meant `0<<0` *and* `-1<<0`. – abligh Feb 03 '15 at 20:11
-
@abligh: Sorry, I should've commented on that... I just used the comma operator to not make it a constant expression and avoid the (even more off-topic) discussion if an implementation may refuse to compile (and must diagnose it), because a constraint on constants is violated (Is "undefined behavior" _in the range of representable values for its type_?). – mafso Feb 03 '15 at 20:17
-
@mafso yes, eventually it dawned on me, having considered how that couldn't be an interval either :-) – abligh Feb 03 '15 at 20:19
This optimizes into 7 instructions including the return on my x86_64 machine with gcc -O3
.
int div4roundup(int x)
{
return (x & 3) ? (x >> 2) + 1 : (x>>2);
}
disassembly:
int div4roundup(int x)
{
return (x & 3) ? (x >> 2) + 1 : (x>>2);
0: 89 f8 mov %edi,%eax
2: 31 d2 xor %edx,%edx
4: c1 f8 02 sar $0x2,%eax
7: 83 e7 03 and $0x3,%edi
a: 0f 95 c2 setne %dl
d: 01 d0 add %edx,%eax
}
f: c3 retq
Compared with abligh's equivalent solution:
int
roundupdiv4 (int n)
{
return ((n>0)?(n+3):n)/4;
0: 85 ff test %edi,%edi
2: 8d 47 03 lea 0x3(%rdi),%eax
5: 7f 05 jg c <fc+0xc>
7: 85 ff test %edi,%edi
9: 0f 49 c7 cmovns %edi,%eax
c: c1 f8 02 sar $0x2,%eax
}
f: c3 retq
Compared with abligh's alternate solution:
0000000000000000 <roundawayfromzerodiv4>:
int
roundawayfromzerodiv4 (int n)
{
return ((n>0)?(n+3):(n-3))/4;
0: 85 ff test %edi,%edi
2: 7e 0c jle 10 <roundawayfromzerodiv4+0x10>
4: 8d 47 03 lea 0x3(%rdi),%eax
7: c1 f8 02 sar $0x2,%eax
a: c3 retq
b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
10: 89 f8 mov %edi,%eax
12: 83 e8 03 sub $0x3,%eax
15: 0f 48 c7 cmovs %edi,%eax
18: c1 f8 02 sar $0x2,%eax
}
1b: c3 retq
EDIT: Thought I came up with something faster than the other answer then realized I was comparing two slightly different computations. Our two "rounds strictly up" functions are equal in instruction count, but slightly different. Haven't analyzed disassembly enough to know which is faster.

- 9,052
- 6
- 38
- 56