3

I have an integer number n and I need to rounds n/4 upward. For performance reason I need to find out a fast way in C. The division by 4 can be done using a >> 2 shift operation but I don't know about the round. I could use ceil but I am worried about performance.

Marco
  • 783
  • 5
  • 21
  • 6
    Before worrying about performance, *measure!* See if what you think is a bottleneck actually is. Don't do manual optimizations before you know if it's really needed (manually optimized code is often hard to read, understand and maintain). – Some programmer dude Feb 03 '15 at 17:43
  • The general formula for divide-and-ceiling is `#define DIV_CEIL(n,d) (((n)-1)/(d)+1)`. – barak manos Feb 03 '15 at 17:47
  • @barakmanos: Wouldn't that break for `unsigned` where `n==0`? – Tim Čas Feb 03 '15 at 17:52
  • @TimČas it's broken for all negative numbers, as division (from C99 onwards) truncates towards zero. – abligh Feb 03 '15 at 17:53
  • @abligh: Quite possibly (depending on what one means by "upward"). I didn't bother analyzing it that much, I've just spotted (and pointed out) the obvious. Point is, it's broken for at least one important case. – Tim Čas Feb 03 '15 at 17:55

2 Answers2

8

If your operand is non-negative, how about:

unsigned int
roundupdiv4 (unsigned int n)
{
    return (n+3)>>2;
}

Note that any sensible compiler will compile /4 of an unsigned int as >>2 anyway.

I can confirm that by compiling the above with gcc -O3 -S:

    .file   "x.c"
    .text
    .p2align 4,,15
    .globl  roundupdiv4
    .type   roundupdiv4, @function
roundupdiv4:
.LFB0:
    .cfi_startproc
    leal    3(%rdi), %eax
    shrl    $2, %eax
    ret
    .cfi_endproc
.LFE0:
    .size   roundupdiv4, .-roundupdiv4
    .ident  "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
    .section    .note.GNU-stack,"",@progbits

and noting the output is exactly the same if I substitute >>2 by /4.

Also note that I've used unsigned int as >> is implementation defined for negative signed left operands (i.e. shifting negative values right). If you want a working one that rounds up (strictly up) for signed values:

int
roundupdiv4 (int n)
{
    return ((n>0)?(n+3):n)/4;
}

because integer division uses truncation rounding, i.e. rounds up for negative numbers (towards zero) anyway. (That's for C99 onwards; it's implementation defined in C89).

If by round up you meant 'round away from zero' then:

int
roundawayfromzerodiv4 (int n)
{
    return ((n>0)?(n+3):(n-3))/4;
}
Community
  • 1
  • 1
abligh
  • 24,573
  • 4
  • 47
  • 84
  • Minor correction: `>>` is implementation-defined for negative left operands (and usually sign-extends); `<<` is undefined (even `(0,-1) << 0` is). – mafso Feb 03 '15 at 19:10
  • @mafso: thanks - I think I've fixed that w.r.t. `>>`. `<<` is not used here but for my interest, ISO/IEC 9899:2011 s6.5.7 §4 says *"The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros ... [text if E1 is unsigned] ... If E1 has a signed type and nonnegative value, and E1 × 2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined."* - implies to me you can safely left-shift 0! – abligh Feb 03 '15 at 19:27
  • I'd be rather surprised if `/4` were converted to `>>2` by any mainstream compiler, at least for `int`: they're different for negative operands. C99 requires /4 to truncate the fractional part (so `(-1) / 4` should be `0`), while (as already noted), the behaviour of `>>` for negative operands is implementation defined, but I'd expect most implementations to define `>>2` so that it shifts the bits (either sign-extending or not). With or without sign-extension, `(-1) >> 2` is not going to give `0`. – Mark Dickinson Feb 03 '15 at 19:31
  • @MarkDickinson - re conversion, arrgh, I meant `>>2` of an `unsigned int` - fixed. Re `(-1)>>2 != 0` let's hope so, but in theory a C compiler could legally do this. – abligh Feb 03 '15 at 19:38
  • @MarkDickinson - Then be surprised. See disassembled functions in my answer. – Brian McFarland Feb 03 '15 at 19:38
  • @BrianMcFarland: Ah, nice. Thank you. Just tried this for myself, under Clang with -O3, and a simple `/4` on (signed) int turns into a combination of two arithmetic shifts, one bit shift, and an addition. – Mark Dickinson Feb 03 '15 at 19:41
  • @MarkDickinson be more surprised - I've just done it and it turns out with exactly the same code for `unsigned int`s. (`gcc-4.8.real (Ubuntu 4.8.2-19ubuntu1) 4.8.2`) – abligh Feb 03 '15 at 19:45
  • @MarkDickinson - Weird. I thought Clang was general on par with or better than gcc for performance? FYIY, it was gcc 4.8.2 from the ubuntu repo, so not exactly bleeding edge. – Brian McFarland Feb 03 '15 at 19:46
  • @abligh: wrt left-shifting by 0 [edit: left-shifting _negative_ values by 0]: I read that as _(If `E1` has a signed type **and** nonnegative value [...]); otherwise, the behavior is undefined._ -- so I think, it's UB. (I've read this somewhere here on SO, don't remember where, but I think, this interpretation is correct.) – mafso Feb 03 '15 at 19:58
  • @mafso, but 0 *is* non-negative and (by assumption) a signed type. – abligh Feb 03 '15 at 19:59
  • @abligh, `E1` is the left operand, not the right one, `0 << 0` is defined, of course. – mafso Feb 03 '15 at 20:00
  • @mafso yes indeed. There is no prohibition on E2 being zero, or on E2 at all, save that E1 x 2^E2 is representable in the result type, which it clearly always will be when left-shifting by zero bits. Am I missing something? – abligh Feb 03 '15 at 20:02
  • @abligh No, I don't think you miss anything, `E2` can always be 0 (if I don't miss anything). And I'm not sure if we're talking about the same thing :) My point was just that even `(0,-1) << 0` is UB (where the left operand is negative, not the right one). – mafso Feb 03 '15 at 20:06
  • @mafso ah apologies, it's notation confusion. I thought by `(0,-1)<<0` you meant `0<<0` *and* `-1<<0`. – abligh Feb 03 '15 at 20:11
  • @abligh: Sorry, I should've commented on that... I just used the comma operator to not make it a constant expression and avoid the (even more off-topic) discussion if an implementation may refuse to compile (and must diagnose it), because a constraint on constants is violated (Is "undefined behavior" _in the range of representable values for its type_?). – mafso Feb 03 '15 at 20:17
  • @mafso yes, eventually it dawned on me, having considered how that couldn't be an interval either :-) – abligh Feb 03 '15 at 20:19
3

This optimizes into 7 instructions including the return on my x86_64 machine with gcc -O3.

int div4roundup(int x)
{
   return (x & 3) ? (x >> 2) + 1 : (x>>2);
}

disassembly:

int div4roundup(int x)
{
  return (x & 3) ? (x >> 2) + 1 : (x>>2);
   0:   89 f8                   mov    %edi,%eax
   2:   31 d2                   xor    %edx,%edx
   4:   c1 f8 02                sar    $0x2,%eax
   7:   83 e7 03                and    $0x3,%edi
   a:   0f 95 c2                setne  %dl
   d:   01 d0                   add    %edx,%eax
}
   f:   c3                      retq   

Compared with abligh's equivalent solution:

int
roundupdiv4 (int n)
{
    return ((n>0)?(n+3):n)/4;
   0:   85 ff                   test   %edi,%edi
   2:   8d 47 03                lea    0x3(%rdi),%eax
   5:   7f 05                   jg     c <fc+0xc>
   7:   85 ff                   test   %edi,%edi
   9:   0f 49 c7                cmovns %edi,%eax
   c:   c1 f8 02                sar    $0x2,%eax
}
   f:   c3                      retq   

Compared with abligh's alternate solution:

0000000000000000 <roundawayfromzerodiv4>:
int
roundawayfromzerodiv4 (int n)
{
    return ((n>0)?(n+3):(n-3))/4;
   0:   85 ff                   test   %edi,%edi
   2:   7e 0c                   jle    10 <roundawayfromzerodiv4+0x10>
   4:   8d 47 03                lea    0x3(%rdi),%eax
   7:   c1 f8 02                sar    $0x2,%eax
   a:   c3                      retq   
   b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  10:   89 f8                   mov    %edi,%eax
  12:   83 e8 03                sub    $0x3,%eax
  15:   0f 48 c7                cmovs  %edi,%eax
  18:   c1 f8 02                sar    $0x2,%eax
}
  1b:   c3                      retq   

EDIT: Thought I came up with something faster than the other answer then realized I was comparing two slightly different computations. Our two "rounds strictly up" functions are equal in instruction count, but slightly different. Haven't analyzed disassembly enough to know which is faster.

Brian McFarland
  • 9,052
  • 6
  • 38
  • 56