31

I'm in a need to optimize this really tiny, but pesky function.

unsigned umod(int a, unsigned b)
{
    while(a < 0)
        a += b;

    return a % b;
}

Before you cry out "You don't need to optimize it", please keep in mind that this function is called 50% of the entire program lifetime, as it's being called 21495808 times for the smallest test case benchmark.

The function is already being inlined by the compiler so please don't offer to add the inline keyword.

LiraNuna
  • 64,916
  • 15
  • 117
  • 140
  • Does simply `a % b` not work? – Anon. Feb 24 '10 at 03:37
  • @Anon: because the answer must be positive but if a is negative, a % b is negative too. – Jonathan Leffler Feb 24 '10 at 03:42
  • If your program calls the function for a lot of related values, you may consider optimizing the calculation so that no division needs to performed at all. – Tronic Feb 24 '10 at 03:43
  • In the future, this site might come in handy: http://refactormycode.com/ - it was made just for these kinds of problems. – Ponkadoodle Feb 24 '10 at 04:23
  • check whether this helps - http://stackoverflow.com/questions/48053/is-there-an-alternative-to-using-modulus-in-c-c – Narendra N Feb 24 '10 at 04:57
  • If you want an actual solution instead of lots of people guessing at one, you need to provide more information: what processor[s] are are you targeting? what is the distribution of inputs `a` and `b`? – Stephen Canon Feb 24 '10 at 05:14
  • "Portable" and "Any value" should suffice. – LiraNuna Feb 24 '10 at 05:35
  • If your answers are "portable" and "any value", then you don't have a defined standard to measure the performance of implementations, and you can't make a meaningful statement about which is faster. – Stephen Canon Feb 24 '10 at 06:24
  • @Stephen: I know, and that's what bugs me... – LiraNuna Feb 24 '10 at 06:53
  • 11 answers already, many smart but initially flawed a bit. I wonder, will the things go better if the answer was updated with some (well, kind of) unit tests for both edge, general cases and (already) known subtleties? – mlvljr Feb 24 '10 at 09:52
  • _Also_, what are constraints on `a` and `b`, what if `b==0`, for example? – mlvljr Feb 24 '10 at 10:00
  • 1
    I can guarantee `b` is always positive (non-zero), and it can (and does) reach up to UINT_MAX (so signed cast is unwanted). – LiraNuna Feb 24 '10 at 10:51

12 Answers12

14

This avoids looping:

int tmp = a % b;
if (tmp < 0) tmp += b;

Notice that both a and b need to be signed.

Tronic
  • 10,250
  • 2
  • 41
  • 53
  • 3
    `int tmp = a % b; return tmp + b * (tmp < 0);` Is faster. – LiraNuna Feb 24 '10 at 03:48
  • 1
    There should be no difference when you use an optimizing compiler (but sometimes compilers miss things). – Tronic Feb 24 '10 at 03:49
  • I'm testing this live with GCC 4.4.1 with -O3, there IS a difference. Don't give compilers too much credit, else I won't be here asking this question :) – LiraNuna Feb 24 '10 at 03:50
  • @LiraNura: What processor? Can you give an assembly dump? – Anon. Feb 24 '10 at 03:54
  • you might even try `return a % b + b * (a < 0);` since b is guaranteed to be unsigned (untested and unprofiled) – cobbal Feb 24 '10 at 03:59
  • 3
    This doesn't give the same result as the OP's original. Eg with `a = -10` and `b = 3`, the original gives 2 but this gives 0. – caf Feb 24 '10 at 04:04
  • (-10) % 3 gives -1, add 3 and you have 2. How did you get 0 instead? – Tronic Feb 24 '10 at 04:08
  • 1
    Using multiplication might be slower than a (single!) conditional statement. It depends on the processor. I've been developing for the arm7TDMI lately - conditional statements (Which can be done without branching) use 1 extra clock cycle each, where as multiplication uses 3. – Ponkadoodle Feb 24 '10 at 04:10
  • 2
    `(-10) % 3U` does not give -1. – caf Feb 24 '10 at 04:10
  • Good point, you obviously need to use signed modulus there to avoid wrapping around. Added a note about that. – Tronic Feb 24 '10 at 04:13
  • 2
    updated version `umod(-10, 3) == 2`: `return a % (int)b + b * (a < 0);` although this will lose half of the range of b. – cobbal Feb 24 '10 at 04:16
  • 2
    +1 for mentioning that this will lose half range of b, which will defy the purpose of declaring b _unsigned_. – legends2k Feb 24 '10 at 06:00
  • Some anonymous coward downvoting without comment :( – Tronic Feb 25 '10 at 07:38
10

This should do it:

unsigned umod(int a, unsigned b)
{
    if (a < 0)
    {
        unsigned r = (-a % b);
        if (r)
            return b - r;
        else
            return 0;
    }
    else
        return a % b;
}

Tested to match original. Limitation is that a > INT_MIN on 2s complement machines.

caf
  • 233,326
  • 40
  • 323
  • 462
  • 1
    @wallacoloo: He changed it from `>= INT_MIN` to `> INT_MIN`. Now it's correct. – Alok Singhal Feb 24 '10 at 04:22
  • Dammit, missed an edge case - updated to fix but it's nowhere near as clean anymore. – caf Feb 24 '10 at 04:29
  • That's pretty weird - since for the `a >= 0` case it should be executing exactly the same instructions at least. How are you getting gprof to profile the inline functions, btw? – caf Feb 24 '10 at 05:02
  • 1
    FWIW, in my testing (gcc 4.3.2, -O3) my version runs about 6 to 7 times faster than the loop version, with an even spread of `a` values between -1073741823 and 1073741824. With the `a` values all positive, it runs the same. – caf Feb 24 '10 at 05:08
  • In my case, it's okay for `0 == b`, so I have changed this to be `return b - (-a % b);` for one less branch. – LiraNuna Feb 26 '10 at 00:09
7

Using the ~ :)

unsigned umod(int a, unsigned b)
{
    if (a<0) return b-1-~a%b;
    return a%b;
}

The % has higher precedence than -

If it's ok to return b instead of 0 when -a is a multiple of b you can save some ops

unsigned umod(int a, unsigned b)
{
    if (a<0) return b - (-a % b);
    return a%b;
}

slightly golfed version :)

unsigned umod(int a, unsigned b)
{
return(a<0)?b-(-a%b):a%b;
}

Here is the resulting assembly

1    .globl umod3
2       .type   umod3, @function
3    umod3:
4    .LFB3:
5       .cfi_startproc
6       testl   %edi, %edi
7       js      .L18
8       movl    %edi, %eax
9       xorl    %edx, %edx
10      divl    %esi
11      movl    %edx, %eax
12      ret
13      .p2align 4,,10
14      .p2align 3
15   .L18:
16      movl    %edi, %eax
17      xorl    %edx, %edx
18      negl    %eax
19      divl    %esi
20      subl    %edx, %esi
21      movl    %esi, %edx
22      movl    %edx, %eax
23      ret
John La Rooy
  • 295,403
  • 53
  • 369
  • 502
  • Hm you seem very pegged on this golf think.. get it.. pegged.. ;) – Filip Ekberg Feb 24 '10 at 15:40
  • This has less ops than caf's answer, especially if the negataion of `a` is a simple -a ;) (resulting in `return b - (-a % b);`) – LiraNuna Feb 25 '10 at 23:19
  • Yeah that was my original answer too, before I added the extra conditional to make it precisely match the behaviour of the code in the question. By the way, it is interesting that `gcc` creates an apparently useless `movl` instruction at the end of the function - I was seeing that too (it could just do `movl %esi, %eax`). I wonder if there is some subtle architectural reason for that? – caf Feb 26 '10 at 00:48
  • @caf, apparently the divl puts the result of the mod into edx (unless it is zero). The calling convention requires it to be shifted to eax – John La Rooy Feb 26 '10 at 01:13
  • 2
    It does, but I am talking about the last two mnemonics before `ret` (lines `21` and `22`). Those lines move the final result from `%esi` to `%edx`, then from `%edx` to `%eax` - that could be done with just one move from `%esi` to `%eax` – caf Feb 26 '10 at 03:15
4

Since the looping version seems to be quite fast, lets try eliminating the division :)

unsigned umod(int a, unsigned b){
    while(a>0)a-=b;
    while(a<0)a+=b;
    return a;
}
John La Rooy
  • 295,403
  • 53
  • 369
  • 502
  • That's a nice optimization for the case when `a` and `b` don't differ much. For the case when, say, `a = MAX_INT`, `b = 2` the code would be terribly slow. Anyway, +1 for non-standard approach. – Vlad Feb 24 '10 at 08:50
  • @Vlad, Since LiraNuna says the original looping version does quite well in the real world tests, I wondered if this might be worth trying also, since we don't know the distribution of a and b – John La Rooy Feb 24 '10 at 08:57
  • simply because you don't understand division? – President James K. Polk Feb 25 '10 at 01:15
2

Portable edition, still with only one division, no branching, and no multiplication:

unsigned umod(int a, unsigned b) {
    int rem = a % (int) b;
    return rem + (-(rem < 0) & b);
}
C. K. Young
  • 219,335
  • 46
  • 382
  • 435
  • How is this better than doing a divide and a conditional in C code? Also, might this need clobber args? And maybe `__asm__ __volatile__`? – asveikau Feb 24 '10 at 05:36
  • @asveikau: No, no clobbering because of the constraints. Also, no volatile needed, as the compiler's ordering will guarantee the right results. You can do conditional in C code, if you can ensure no branching. (My portable version does that.) – C. K. Young Feb 24 '10 at 05:40
  • I was thinking you might need to clobber `cc` because of the `FLAGS` register. This has bit me sometimes while doing atomic ops with inline asm. At any rate I like the C version better. – asveikau Feb 24 '10 at 05:52
  • surely you can #ifdef the asm version though – John La Rooy Feb 24 '10 at 08:25
  • @gnibbler: You can, but why would you? The C version generates code (under gcc) that's even better than the assembly version I wrote. – C. K. Young Feb 24 '10 at 14:34
1

In your original function, you could have returned after the while loop finished for negative numbers, thus skipping the mod. This is in the same spirit, replacing the loop with a multiply - although it could be made to have fewer characters...

unsigned int umod2(int a, unsigned int b)
{
    return (a < 0) ? a + ((-a/b)+1)*b : a % b;
}

Here's the loop version:

unsigned int umod2_works(int a, unsigned int b)
{
    if (a < 0)
    {
        while (a < 0)
            a += b;
        return a;
    } else {
        return a % b;
    }
}

Both have been tested to match the OP's original function.

mtrw
  • 34,200
  • 7
  • 63
  • 71
  • Sorry, I meant to say 'replacing the loop with a *divide and* multiply.' I'm not sure which is faster, but this is at least correct. – mtrw Feb 24 '10 at 03:57
  • Doesn't make it go faster, you're adding another branch, thus slowing the routine down. I got even distribution of numbers as input. – LiraNuna Feb 24 '10 at 04:11
  • I'm surprised that another branch is slower than a mod. Anyway, Alok's answer looks like the nicest so far. – mtrw Feb 24 '10 at 04:17
  • It's not another branch, if it's written properly. – Stephen Canon Feb 24 '10 at 05:12
1

In a % b, if any of the operands is unsigned both are converted to unsigned. This means that if a is negative, you get a modulo UINT_MAX + 1 value instead of a. If UINT_MAX+1 is evenly divisible by b, then things are fine, and you can just return a % b. If not, you have do do the modulo in int type.

unsigned int umod(int a, unsigned int b)
{
    int ret;
    if (a >= 0) return a % b;
    if (b > INT_MAX) return a + b;
    ret = a % (int)b;
    if (ret < 0) ret += b;
    return ret;
}

Edit: Updated, but you should use caf's answer as that's simpler (or maybe not?!). This is here for the record.

Alok Singhal
  • 93,253
  • 21
  • 125
  • 158
1
int temp;

temp= (a > 0)? ( a % b ) :   b -( (-a) % b ) ;

code below:

int main()
{
int a;
unsigned b;
int temp;
printf("please enter an int and a unsigned number\n");
scanf("%d",&a);
scanf("%u",&b);
modulus(a,b);
temp= (a > 0)? ( a % b ) :   b -( (-a) % b ) ;
printf("\n temp is %d", temp);
return 0;
}
void modulus(int x,unsigned y)
{
int c;
if(x>0)
{
c=x%y;
printf("\n%d\n",c);}
else
{
while(x<0)
x+=y;
printf("\n%d\n",x);}
}


./a.out
please enter an int and a unsigned number
-8 3

1

 temp is 1
John La Rooy
  • 295,403
  • 53
  • 369
  • 502
Vijay
  • 65,327
  • 90
  • 227
  • 319
1

Here's one that works over the whole range of unsigned without branching, but it uses multiplications and 2 divisions

unsigned umod(int a, unsigned b)
{
    return (a>0)*a%b+(a<0)*(b-1-~a%b);
}
John La Rooy
  • 295,403
  • 53
  • 369
  • 502
0

If a and b are both much smaller than an int, then you can just add a sufficiently large multiple of b to every value before you mod.

unsigned umod(int a, unsigned b)
{
    return (unsigned)(a + (int)(b * 256)) % b;
}

Of course, this trick doesn't work if a + (b * 256) can overflow, but for a lot of the uses I can see for this code, you can be certain that it never will.

John Knoeller
  • 33,512
  • 4
  • 61
  • 92
0

Other than the while loop, Not sure whether the % operation can be optimized as a whole, but the optimization can happen on the pattern of the values for a & b.

If in those 21495808 times the operation is executed.

If the chances of passing a value for a which is less than b ( a < b ) is atleast half of the that. Adding the following statement will definetly improve the overall performance of the function.

if ( abs(a) < b ) // not specifically the abs function, can be your own implementation.
    return 0;
else
    return a%b;

If b is a power of 2 for atleast 80% of the cases, we can use bitwise operators as in

return ( abs(a) & (b-1) );

If the numbers are expected to be anything less than that, it would degrade the performance, as we need to verify whether b is power of 2 [ even after using bitwise operators for the same ] for everything.

Even the functionality to achieve abs(a) can be optimized using bitwise operators, with their own limitations, but is faster than verifying whether a is < 0.

n = (a ^ (a >> 31)) - (a >> 31); // instead of n = a < 0 ? -a : a;

There would be more such things, if you can explore.

Narendra N
  • 1,270
  • 1
  • 9
  • 14
  • % is cheap compared to the while loop which is non-deterministic – Clifford Feb 24 '10 at 05:49
  • @Clifford, you are right, Did I say anything converse of that? – Narendra N Feb 24 '10 at 06:22
  • Not really, but the first sentence makes it look like you are targetting the % for optimisation rather than the loop. – Clifford Feb 24 '10 at 12:35
  • By the time I saw this one, there were other answers handling that loop part and no answer was accepted, hence I thought I will take a look at the % part. I will change the answer mentioning it. – Narendra N Feb 24 '10 at 14:05
0

My preferred solution is to mod twice. I haven't tried this in C/C++ or with unsigned, but my test cases work in Java:

((a % b) + b) % b

The benefit is no branching and simplicity. The downside is the double mod. I haven't compared performance, but my understanding is that it is branching that hurts performance these days.

JodaStephen
  • 60,927
  • 15
  • 95
  • 117