0

So... the modulo operation doesn't seem to work on a 64-bit value of all ones.

Here is my C code to set up the edge case:

#include <stdio.h>

int main(int argc, char *argv[]) {
    long long max_ll =   0xFFFFFFFFFFFFFFFF;
    long long large_ll = 0x0FFFFFFFFFFFFFFF;
    long long mask_ll =  0x00000F0000000000;

    printf("\n64-bit numbers:\n");
    printf("0x%016llX\n", max_ll % mask_ll);
    printf("0x%016llX\n", large_ll % mask_ll);

    long max_l =   0xFFFFFFFF;
    long large_l = 0x0FFFFFFF;
    long mask_l =  0x00000F00;

    printf("\n32-bit numbers:\n");
    printf("0x%08lX\n", max_l % mask_l);
    printf("0x%08lX\n", large_l % mask_l);

    return 0;
}

The output shows this:

64-bit numbers:
0xFFFFFFFFFFFFFFFF
0x000000FFFFFFFFFF

32-bit numbers:
0xFFFFFFFF
0x000000FF

What is going on here?

Why doesn't modulo work on a 64-bit value of all ones, but it will on a 32-bit value of all ones?

It this a bug with the Intel CPU? Or with C somehow? Or is it something else?

More Info

I'm on a Windows 10 machine with an Intel i5-4570S CPU. I used the cl compiler from Visual Studio 2015.

I also verified this result using the Windows Calculator app (Version 10.1601.49020.0) by going into the Programmer mode. If you try to modulus 0xFFFF FFFF FFFF FFFF with anything, it just returns itself.

Specifying unsigned vs signed didn't seem to make any difference.

Please enlighten me :) I actually did have a use case for this operation... so it's not purely academic.

Hintron
  • 311
  • 2
  • 11
  • Reproducible on ideone: http://ideone.com/GzMAxK – Daniel Jour Jan 31 '16 at 01:22
  • 5
    You do realize that `0xFFFFFFFFFFFFFFFF` is just `-1` right? – Mysticial Jan 31 '16 at 01:24
  • 1
    what's wrong with this? `0xFFFFFFFFFFFFFFFF` is outside `long long`'s range so it'll wraparound to -1, and -1 mod with anything will return -1. `0x0FFFFFFFFFFFFFFF` is a proper `long long` value – phuclv Jan 31 '16 at 01:28
  • https://en.wikipedia.org/wiki/Two's_complement – Mikhail Jan 31 '16 at 01:28
  • 1
    @Mysticial: No, `0xFFFFFFFFFFFFFFFF` is `18446744073709551615`. Hex constants denote values, not representations. `0xFFFFFFFFFFFFFFFF` is likely of type `unsigned long long`. When converted to `long long`, it yields an implementation-defined value which is likely to be `-1LL`. – Keith Thompson Jan 31 '16 at 01:51
  • Of course I'm gonna get slapped by pedants when I try to keep something simple. – Mysticial Jan 31 '16 at 02:06
  • @Mysticial this is the main part of OP's misunderstanding so it's important to be clear. `0xFFFFFFFFFFFFFFFF` is a large positive number. The value `-1` only arises when that large positive number is attempted to be converted to `long long`. – M.M Jan 31 '16 at 02:31
  • @Mysticial Yeah, just like the hex representation better in this case :) – Hintron Jan 31 '16 at 02:40

3 Answers3

3

Your program causes undefined behaviour by using the wrong format specifier.

%llX may only be used for unsigned long long. If you use the right specifier, %lld then the apparent mystery will go away:

#include <stdio.h>

int main(int argc, char* argv[])
{
    long long max_ll =   0xFFFFFFFFFFFFFFFF;
    long long mask_ll =  0x00000F0000000000;

    printf("%lld %% %lld = %lld\n", max_ll, mask_ll, max_ll % mask_ll);
}

Output:

-1 % 16492674416640 = -1

In ISO C the definition of the % operator is such that (a/b)*b + a%b == a. Also, for negative numbers, / follows "truncation towards zero".

So -1 / 16492674416640 is 0, therefore -1 % 16492674416640 must be -1 to make the above formula work.


As discussed in comments, the following line:

long long max_ll =   0xFFFFFFFFFFFFFFFF;

causes implementation-defined behaviour (assuming that your system has long long as a 64-bit type). The constant 0xFFFFFFFFFFFFFFFF has type unsigned long long, and it is out of range for long long whose maximum permitted value is 0x7FFFFFFFFFFFFFFF.

When an out-of-range assignment is made to a signed type, the behaviour is implementation-defined, which means the compiler documentation must say what happens.

Typically, this will be defined as generating the value which is in range of long long and has the same representation as the unsigned long long constant has. In 2's complement , (long long)-1 has the same representation as the unsigned long long value 0xFFFFFFFFFFFFFFFF, which explains why you ended up with max_ll holding the value -1.

Community
  • 1
  • 1
M.M
  • 138,810
  • 21
  • 208
  • 365
  • I never realized that hex constants are inherently unsigned types, and that assigning it to a signed data type initiates an implementation-defined conversion. Interesting. So I guess the "pure" way of assigning it would have been to put "long long max_ll = -1;" – Hintron Jan 31 '16 at 03:00
  • @Hintron Hex constants aren't inherently unsigned, just this particular one. `0x7FFFFFFFFFFFFFFF` is signed (`long long int`) . Larger constants go to unsigned because they cannot fit in any signed type. – M.M Jan 31 '16 at 04:41
  • @M.M: the case of hex constants is a quite subtile: C11 6.4.4.1: *The type of an integer constant is the first of the corresponding list in which its value can be represented. For octal and hexadecimal constants, the list is `int`, `unsigned int`, `long int`, `unsigned long int`, `long long int`, `unsigned long long int`.* Thus if `int` is 32 bits, `0xFFFFFFFF` **is** inherently unsigned and `-0xFFFFFFFF != -4294967295` – chqrlie Jan 31 '16 at 12:37
2

Actually it does make a difference whether the values are defined as signed or unsigned:

#include <stdio.h>
#include <limits.h>

int main(void) {
#if ULLONG_MAX == 0xFFFFFFFFFFFFFFFF
    long long max_ll =   0xFFFFFFFFFFFFFFFF;  // converts to -1LL
    long long large_ll = 0x0FFFFFFFFFFFFFFF;
    long long mask_ll =  0x00000F0000000000;

    printf("\n" "signed 64-bit numbers:\n");
    printf("0x%016llX\n", max_ll % mask_ll);
    printf("0x%016llX\n", large_ll % mask_ll);

    unsigned long long max_ull =   0xFFFFFFFFFFFFFFFF;
    unsigned long long large_ull = 0x0FFFFFFFFFFFFFFF;
    unsigned long long mask_ull =  0x00000F0000000000;

    printf("\n" "unsigned 64-bit numbers:\n");
    printf("0x%016llX\n", max_ull % mask_ull);
    printf("0x%016llX\n", large_ull % mask_ull);
#endif

#if UINT_MAX == 0xFFFFFFFF
    int max_l =   0xFFFFFFFF;  // converts to -1;
    int large_l = 0x0FFFFFFF;
    int mask_l =  0x00000F00;

    printf("\n" "signed 32-bit numbers:\n");
    printf("0x%08X\n", max_l % mask_l);
    printf("0x%08X\n", large_l % mask_l);

    unsigned int max_ul =   0xFFFFFFFF;
    unsigned int large_ul = 0x0FFFFFFF;
    unsigned int mask_ul =  0x00000F00;

    printf("\n" "unsigned 32-bit numbers:\n");
    printf("0x%08X\n", max_ul % mask_ul);
    printf("0x%08X\n", large_ul % mask_ul);
#endif
    return 0;
}

Produces this output:

signed 64-bit numbers:
0xFFFFFFFFFFFFFFFF
0x000000FFFFFFFFFF

unsigned 64-bit numbers:
0x000000FFFFFFFFFF
0x000000FFFFFFFFFF

signed 32-bit numbers:
0xFFFFFFFF
0x000000FF

unsigned 32-bit numbers:
0x000000FF
0x000000FF

64 bit hex constant 0xFFFFFFFFFFFFFFFF has value -1 when stored into a long long. This is actually implementation defined because of out of range conversion into a signed type, but on Intel processors, with current compilers, the conversion just keeps the same bit pattern.

Note that you are not using the fixed size integers defined in <stdint.h>: int64_t, uint64_t, int32_t and uint32_t. long long types are specified in the standard as having at least 64 bits, and on Intel x86_64, they do, and long has at least 32 bits, but for the same processor, the size differs between environments: 32 bits in Windows 10 (even in 64 bit mode) and 64 bits on MaxOS/10 and linux64. This is the reason why you observe surprising behavior for the long case where unsigned and signed may produce the same result. They don't on Windows, but they do in linux and MacOS because the computation is done in 64 bits and these values are just positive numbers.

Also note that LLONG_MIN / -1 and LLONG_MIN % -1 both invoke undefined behavior because of signed arithmetic overflow, and this one is not ignored on Intel PCs, it usually fires an uncaught exception and exits the program, just like 1 / 0 and 1 % 0.

chqrlie
  • 131,814
  • 10
  • 121
  • 189
  • Out of range conversion is implementation-defined. Arithmetic overflow occurs when an arithmetic operator is used and the result of the arithmetic operation is out of range. (Assignment isn't an arithmetic operator; not that this is assignment anyway since it is initialization). – M.M Jan 31 '16 at 02:23
  • @M.M: good point, I updated the answer. More precisely: C11 6.3.1.3: *Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.* But for floating point, 6.3.1.4: *When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.*. UB including for unsigned target type. – chqrlie Jan 31 '16 at 10:03
1

Try putting unsigned before your long long. As a signed number, your 0xFF...FF is actually -1 on most platforms.

Also, in your code, your 32-bit numbers are still 64-bits (you have them declared as long long as well).

mihyar
  • 29
  • 7
  • OP wrote that `unsigned` seem to make no difference. I guess it'll only work if you make both the mask and the number unsigned: http://ideone.com/H9Oad8 – Daniel Jour Jan 31 '16 at 01:31
  • @mihyar You mean calling it max_l instead of max_ll doesn't change the fact that it is long long? ;) Thanks for catching that, I didn't even notice. I'll update the question with the fixes, since that wasn't really the main crux of the issue. – Hintron Jan 31 '16 at 02:44