tl;dr: ((n + 9) / 10) * 10
compiles to the nicest (fastest) asm code in more cases, and is easy to read and understand for people that know what integer division does in C. It's a fairly common idiom.
I haven't investigated what the best option is for something that needs to work with negative n
, since you might want to round away from zero, instead of still towards +Infinity, depending on the application.
Looking at the C operations used by the different suggestions, the most light-weight is Mark Dickinson's (in comments):
(n+9) - ((n+9)%10)
It looks more efficient than the straightforward divide / multiply suggested by a couple people (including @bta): ((n + 9) / 10) * 10
, because it just has an add instead of a multiply. (n+9
is a common subexpression that only has to be computed once.)
It turns out that both compile to literally identical code, using the compiler trick of converting division by a constant into a multiply and shift, see this Q&A for how it works. Unlike a hardware div
instruction that costs the same whether you use the quotient, remainder, or both results, the mul/shift method takes extra steps to get the remainder. So the compiler see that it can get the same result from a cheaper calculation, and ends up compiling both functions to the same code.
This is true on x86, ppc, and ARM, and all the other architectures I've looked at on the Godbolt compiler explorer. In the first version of this answer, I saw an sdiv
for the %10
on Godbolt's gcc4.8 for ARM64, but it's no longer installed (perhaps because it was misconfigured?) ARM64 gcc5.4 doesn't do that.
Godbolt has MSVC (CL) installed now, and some of these functions compile differently, but I haven't taken the time to see which compile better.
Note that in the gcc output for x86, multiply by 10 is done cheaply with lea eax, [rdx + rdx*4]
to do n*5, then add eax,eax
to double that. imul eax, edx, 10
would have 1 cycle higher latency on Intel Haswell, but be shorter (one less uop). gcc / clang don't use it even with -Os -mtune=haswell
:/
The accepted answer (n + 10 - n % 10
) is even cheaper to compute: n+10
can happen in parallel with n%10
, so the dependency chain is one step shorter. It compiles to one fewer instruction.
However, it gives the wrong answer for multiples of 10: e.g. 10 -> 20
. The suggested fix uses an if(n%10)
to decide whether to do anything. This compiles into a cmov
, so it's longer and worse than @Bta's code. If you're going to use a conditional, do it to get sane results for negative inputs.
Here's how all the suggested answers behave, including for negative inputs:
./a.out | awk -v fmt='\t%4s' '{ for(i=1;i<=NF;i++){ a[i]=a[i] sprintf(fmt, $i); } } END { for (i in a) print a[i]; }'
i -22 -21 -20 -19 -18 -12 -11 -10 -9 -8 -2 -1 0 1 2 8 9 10 11 12 18 19 20 21 22
mark -10 -10 -10 -10 0 0 0 0 0 0 0 0 0 10 10 10 10 10 20 20 20 20 20 30 30
igna -10 -10 -10 0 0 0 0 0 10 10 10 10 10 10 10 10 10 20 20 20 20 20 30 30 30
utaal -20 -20 -20 -10 -10 -10 -10 -10 0 0 0 0 0 10 10 10 10 10 20 20 20 20 20 30 30
bta -10 -10 -10 -10 0 0 0 0 0 10 10 10 10 10 10 10 10 10 20 20 20 20 20 30 30
klatchko -10 -10 -10 -10 0 0 0 0 0 0 0 0 0 10 10 10 10 10 20 20 20 20 20 30 30
branch -10 -10 -20 0 0 0 0 -10 10 10 10 10 0 10 10 10 10 10 20 20 20 20 20 30 30
(transpose awk program)
Ignacio's n + (((9 - (n % 10)) + 1) % 10)
works "correctly" for negative integers, rounding towards +Infinity, but is much more expensive to compute. It requires two modulo operations, so it's essentially twice as expensive. It compiles to about twice as many x86 instructions, doing about twice the work of the other expressions.
Result-printing program (same as the godbolt links above)
#include <stdio.h>
#include <stdlib.h>
int f_mark(int n) { return (n+9) - ((n+9)%10); } // good
int f_bta(int n) { return ((n + 9) / 10) * 10; } // compiles to literally identical code
int f_klatchko(int n) { return n + 10 - n % 10; } // wrong, needs a branch to avoid changing multiples of 10
int f_ignacio(int n) { return n + (((9 - (n % 10)) + 1) % 10); } // slow, but works for negative
int roundup10_utaal(int n) { return ((n - 1) / 10 + 1) * 10; }
int f_branch(int n) { if (n % 10) n += (10 - n % 10); return n; } // gcc uses cmov after f_accepted code
int main(int argc, char**argv)
{
puts("i\tmark\tigna\tutaal\tbta\tklatch\tbranch");
for (int i=-25 ; i<25 ; i++)
if (abs(i%10) <= 2 || 10 - abs(i%10) <= 2) // only sample near interesting points
printf("%d\t%d\t%d\t%d\t%d\t%d\t%d\n", i, f_mark(i), f_accepted(i),
f_ignacio(i), roundup10_utaal(i), f_bta(i), f_branch(i));
}