Code with and without -std=c99 produces different results (UMAC AE implementation)

Question

Long (really long) story short - I use Ted Krovetz's implementation for calculating UMAC and for UMAC AE encryption (http://www.fastcrypto.org/).

When I compile my code (and/or the tests in umac.c) with -std=c99, the calculated UMAC is COMPLETELY different from the expected (and wrong). When I remove this option, everything works like a charm.

Any ideas what could cause this? And what I can do to check what happens and what produces the different results?

$ gcc --version
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2

$ uname -a
xxx 3.13.0-43-generic #72-Ubuntu SMP .. x86_64 x86_64 x86_64 GNU/Linux

I don't use any other options - just with and without -std=c99.

A few more words:

I'll try to contact Ted Krovetz and ask him for this (it's probably some bug or something), BUT that's not the point. The question is a bit more general and this specific problem could be seen as example.

I ran valgrind - nothing special. Added -Wall and -Wextra - nothing again. Sounds like UB, but valgrind doesn't complain for anything.

The situation is very interesting, took me a lot of days and headaches to understand, that the problem is not in my code (I use this implementation for implementing a complex protocol), but in the algorithm and especially in this option. So I decided to ask for opinions.

This Can code that is valid in both C and C++ produce different behavior when compiled in each language? is not related at all, as we're talking about the same language here.
This Massive fprintf speed difference without "-std=c99" is close, but not enough..

EDIT

Here are my test results and what I do (the sources/headers are just downloaded, I don't change anything):

$ ll
total 176K
-rw-r----- 1 kk kk  63K Jan 20 11:00 rijndael-alg-fst.c
-rw-r----- 1 kk kk 2.0K Jan 20 11:00 rijndael-alg-fst.h
-rw-r----- 1 kk kk 3.4K Jan 20 11:00 umac_ae.h
-rw-r----- 1 kk kk  76K Jan 20 11:00 umac.c
-rw-r----- 1 kk kk 4.2K Jan 20 11:00 umac.h

$ gcc -c *.c

$ gcc *.o

$ ./a.out 
AES Test :::
Digest is       : 3AD78E726C1EC02B7EBFE92B23D9EC34
Digest should be: 3AD78E726C1EC02B7EBFE92B23D9EC34

UMAC Test :::
Msg           Should be        Is
---           ---------        --
'a' *     0 : 4D61E4F5AAB959C8 4D61E4F5AAB959C8
'a' *     3 : 67C1700CA30B532D 67C1700CA30B532D
'a' *  1024 : 05CB9405EC38D9F0 05CB9405EC38D9F0
'a' * 32768 : 048C543CB72443A4 048C543CB72443A4

Verifying consistancy of single- and multiple-call interfaces.
Done.

Authenticating       44 byte messages:  6.45 cpb.
Authenticating       64 byte messages:  4.18 cpb.
Authenticating      256 byte messages:  1.63 cpb.
Authenticating      512 byte messages:  1.20 cpb.
Authenticating      552 byte messages:  1.22 cpb.
Authenticating     1024 byte messages:  1.00 cpb.
Authenticating     1500 byte messages:  1.04 cpb.
Authenticating     8192 byte messages:  0.90 cpb.
Authenticating   262144 byte messages:  0.89 cpb.

UMAC-AE Tests :::
0 bytes ('abc' * 0):
Encrypt/decrypt match, tags match
Should be: 0000000000000000
Is       : E8D1DAC3EA21E56D
3 bytes ('abc' * 1):
Encrypt/decrypt match, tags match
Should be: 0000000000000000
Is       : 6BEDBA31E074E2A4
48 bytes ('abc' * 16):
Encrypt/decrypt match, tags match
Should be: 0000000000000000
Is       : A3F6069B913969DA
300 bytes ('abc' * 100):
Encrypt/decrypt match, tags match
Should be: 0000000000000000
Is       : C5B7F3822179FC36
3000000 bytes ('abc' * 1000000):
Encrypt/decrypt match, tags match
Should be: 0000000000000000
Is       : EE7F50FDDA60AA04
  16 bytes, 38.12 cpb
  32 bytes, 25.04 cpb
  64 bytes, 19.39 cpb
 128 bytes, 16.41 cpb
 256 bytes, 14.79 cpb
 512 bytes, 13.96 cpb
1024 bytes, 13.79 cpb
2048 bytes, 13.46 cpb
4096 bytes, 13.47 cpb










$ ll
total 176K
-rw-r----- 1 kk kk  63K Jan 20 11:00 rijndael-alg-fst.c
-rw-r----- 1 kk kk 2.0K Jan 20 11:00 rijndael-alg-fst.h
-rw-r----- 1 kk kk 3.4K Jan 20 11:00 umac_ae.h
-rw-r----- 1 kk kk  76K Jan 20 11:00 umac.c
-rw-r----- 1 kk kk 4.2K Jan 20 11:00 umac.h

$ gcc -std=c99 -c *.c 

$ gcc -std=c99 *.o

$ ./a.out 
AES Test :::
Digest is       : 3AD78E726C1EC02B7EBFE92B23D9EC34
Digest should be: 3AD78E726C1EC02B7EBFE92B23D9EC34

UMAC Test :::
Msg           Should be        Is
---           ---------        --
'a' *     0 : 4D61E4F5AAB959C8 9492DE86794C9F2B
'a' *     3 : 67C1700CA30B532D CF9505F52928360E
'a' *  1024 : 05CB9405EC38D9F0 9C48C0D4EFAFAA37
'a' * 32768 : 048C543CB72443A4 7F63C29BB54BB141

Verifying consistancy of single- and multiple-call interfaces.
Done.

Authenticating       44 byte messages:  7.91 cpb.
Authenticating       64 byte messages:  5.20 cpb.
Authenticating      256 byte messages:  3.03 cpb.
Authenticating      512 byte messages:  2.60 cpb.
Authenticating      552 byte messages:  2.71 cpb.
Authenticating     1024 byte messages:  2.41 cpb.
Authenticating     1500 byte messages:  2.43 cpb.
Authenticating     8192 byte messages:  2.27 cpb.
Authenticating   262144 byte messages:  2.23 cpb.

UMAC-AE Tests :::
0 bytes ('abc' * 0):
Encrypt/decrypt match, tags match
Should be: 0000000000000000
Is       : 899C50FD244BBA83
3 bytes ('abc' * 1):
Encrypt/decrypt match, tags match
Should be: 0000000000000000
Is       : 892D14F581A3A4DD
48 bytes ('abc' * 16):
Encrypt/decrypt match, tags match
Should be: 0000000000000000
Is       : 621AB4A63383F3C5
300 bytes ('abc' * 100):
Encrypt/decrypt match, tags match
Should be: 0000000000000000
Is       : 324BEF6489F57787
3000000 bytes ('abc' * 1000000):
Encrypt/decrypt match, tags match
Should be: 0000000000000000
Is       : 1A25FE3714C9345A
  16 bytes, 40.80 cpb
  32 bytes, 25.87 cpb
  64 bytes, 20.50 cpb
 128 bytes, 17.72 cpb
 256 bytes, 15.93 cpb
 512 bytes, 15.33 cpb
1024 bytes, 14.88 cpb
2048 bytes, 14.71 cpb
4096 bytes, 14.48 cpb

I just tested on another machine and it's the same as on mine.

You might want to try running the clang static analyzer over it... — Deduplicator, Jan 19 '15 at 17:43
I suspect its a floating point issue. Try compiling adding `-Wfloat-equal` and `-Wtraditional` to see if `gcc` gives you any additional warnings. — user590028, Jan 19 '15 at 17:50
@Deduplicator - that's an idea, I'll try it tomorrow. Is that static analyzer free? — Kiril Kirov, Jan 19 '15 at 17:58
@user590028 - `-Wfloat-equal` does not say anything, but `-Wtraditional` dumps tons of warnings. I'll examine them later, as most of them (at first sight) are like for things like `int lengths[] = {0,3,1024,32768}`. But that's great idea, thanks! I thought `-Wall` and `-Wextra` include really _all_ and _extra_ warnings.. — Kiril Kirov, Jan 19 '15 at 18:00
@KirilKirov: Yes, it's free, just look at the llvm site. Still, for anything but Apple, you have to compile it yourself, or find it compiled somewhere else. That's what any Linux distro has packages for... — Deduplicator, Jan 19 '15 at 18:03
Not sure if this is useful and if I understood correctly the problem, but running the tests in XCode on Apple ( primitive_verify() and umac_verify() ) AES and UMAC are equal even using c99. Tried both on 32bit and 64 bit archs — LombaX, Jan 19 '15 at 18:49
So I'm compiling the code as: `/opt/gcc/4.8.2/bin/gcc -O3 umac.c rijndael-alg-fst.c -DRUN_TESTS` and then with `-std=c99`, `-std=c11`, and `-std=gnu11`, and I haven't been able to get the output to be different. Could you point out where exactly in the output of `RUN_TESTS` you are finding different output? This is what I'm currently seeing for all of my attempts: https://gist.github.com/sharth/423d77fa315b2fd65adc — Bill Lynch, Jan 19 '15 at 19:27
@BillLynch, LombaX: I updated the question and posted my results. — Kiril Kirov, Jan 20 '15 at 09:07
The gcc default is `-std=gnu89`. The code may be relying on GCC extensions which are not present in `c99` mode. — M.M, Jan 20 '15 at 12:08
@MattMcNabb - yes, it looks like this. Please see my answer below, if you're interested in the situation. — Kiril Kirov, Jan 20 '15 at 12:10

Kiril Kirov · Accepted Answer · 2015-01-20T15:35:58.670

Well, I figured it out. Still not sure how to fix it to be perfectly fine/portable, but I'll continue digging.

Long story short - it appeared to be platform specific, that's why most of you don't have this problem.
The issue is with determining the endianness.

Details:

After comparing the assembly outputs, there were some significant differences, which (almost automatically) excluded some problems with big constant interpretations and minor things like this.

Then I tried at higher level - the preprocessor output.

Finally, everything led to this piece of code in umac.c:

/* Message "words" are read from memory in an endian-specific manner.     */
/* For this implementation to behave correctly, __LITTLE_ENDIAN__ must    */
/* be set true if the host computer is little-endian.                     */

#ifndef __LITTLE_ENDIAN__
#if __i386__ || __alpha__ || _M_IX86 || __LITTLE_ENDIAN
#define __LITTLE_ENDIAN__ 1
#else
#define __LITTLE_ENDIAN__ 0
#endif
#endif

On my platform, __i386__, __alpha__ and _M_IX86 are not defined. The key is in __LITTLE_ENDIAN.

When compiled:

with -std=c99: __LITTLE_ENDIAN is not defined => #define __LITTLE_ENDIAN__ 0.
without -std=c99: __LITTLE_ENDIAN is defined => #define __LITTLE_ENDIAN__ 1.

Hardcoding #define __LITTLE_ENDIAN__ 1, everything starts working perfectly with and without -std=c99.

Conclusion: __LITTLE_ENDIAN is a gcc specific macro, which is used here for determining the endianness; it appeared, that -std=c99 affects this macro (it's not defined if the option is used), which leads to different(wrong) results.

EDIT
My current ("temporary") solution would be to update the problematic preprocessor if-statement. I know this is far from the best way to solve this, but detecting endianness appeared to be not that easy and far from trivial.

Runtime checks seem to be more reliable, BUT this would lead to more changes in the code, something I want to avoid. It looks like, the most "harmless" "solution" is to update and "fix" the current solution.

So, as I only need it (for now) to work with GCC, I did the following modification:

#ifndef __LITTLE_ENDIAN__
    #if __GNUC__
        #include <endian.h>
        #if __BYTE_ORDER == __LITTLE_ENDIAN
            #define __LITTLE_ENDIAN__ 1
        #elif __BYTE_ORDER == __BIG_ENDIAN
            #define __LITTLE_ENDIAN__ 0
        #else
            #error "Cannot determine endianness! Please update this macro!"
        #endif
    #elif __i386__ || __alpha__ || _M_IX86
        #define __LITTLE_ENDIAN__ 1
    #else
        #warning "Endianness cannot be determined for this platform; using big endian by default! Please be aware and update this macro!"
        #define __LITTLE_ENDIAN__ 0
    #endif
#endif

Not sure if related, but I have had some trouble with undefined macros silently defaulting to 0. I always use a double-check such as `#if defined(MACRO) && (MACRO == 1)` instead of just `#if MACRO`. I also use `-Wundef` to get warned about undefined macros being used. My point here is that if __LITTLE_ENDIAN__ is not defined in C99 mode, it will likely default to 0. — Morten Jensen, Jan 20 '15 at 12:11
@MortenJensen if you attempt to preprocessor arithmetic etc. with a token that has not previously been `#define`'d then it is replaced with `0` — M.M, Jan 20 '15 at 12:17
GCC's default dialect of C is not `c89`, is it `gnu89`, that is, c89 plus gnu extensions. By using `-std=c99` you do not just enable support for the C99 standard, you also disable the GNU extensions. To keep the GNU extensions, try `-std=gnu99`. — Pascal Cuoq, Jan 25 '15 at 12:48

Pascal Cuoq · Answer 2 · 2015-01-19T18:06:35.827

4

Sounds like UB,

It doesn't have to be. There are a couple of known differences that can cause the same program to be interpreted differently as C99 and as C90.

but valgrind doesn't complain for anything.

Valgrind does not even come close to warning for all undefined behaviors in either standard.

The first difference that comes to mind (but a good compiler would have emitted a warning) is the type of the integer constant 3000000000. With 32-bit int and 64-bit long, a C90 compiler types 3000000000 as an unsigned long. In C99, unsigned long is not in the list of types that integer constants without suffixes can have, so 3000000000 is typed as long long (signed).

Without looking, cryptographic code is likely to have lots of large integer constants, so this is one possibility.

Of course, there might be undefined behavior in the code interpreted as C90 or as C99, and then it would be excusable for a compiler to produce differing results in C90 and C99 modes. All I am saying is that there doesn't need to be.

edited Jan 19 '15 at 18:06

answered Jan 19 '15 at 17:41

Pascal Cuoq

79,187
7
161
281

Sounds logical. I'll try digging a bit more about what differences with and without this flag, it's interesting. Especially when it *does change* the output.. If you're interested in this case, you could find the sources in the link above. I took a look at them - a lot of constants, assembly codes, preprocessor directives, etc.. Thanks for sharing your experience. – Kiril Kirov Jan 19 '15 at 18:17
@KirilKirov I am looking at the code right now. Nothing has attracted my attention yet. There remains the possibility of adding a lot of `printf` calls, if all else fails. – Pascal Cuoq Jan 19 '15 at 18:19
I'll try digging today, it's getting more and more interesting. – Kiril Kirov Jan 20 '15 at 09:08

score 2 · Answer 3 · edited May 23 '17 at 12:28

This answer builds on Kiril Kirov's existing answer

Kiril has identified the problem that the system of preprocessor checks is failing to identify the existing platform:

#if __i386__ || __alpha__ || _M_IX86 || __LITTLE_ENDIAN
#define __LITTLE_ENDIAN__ 1
#else
#define __LITTLE_ENDIAN__ 0
#endif

Although the platform is little-endian, none of those identifiers are defined in gcc -std=c99 mode. So this is a problem with the quoted code here; the code needs to be updated to better identify whether it's a little-endian platform or not.

The first thing I would do is stop having a default case, so that if the platform is unrecognized then an error is generated instead of silently running wrongly:

#if __i386__ || __alpha__ || _M_IX86 || __LITTLE_ENDIAN
#define __LITTLE_ENDIAN__ 1
#elif __arm__  // etc.
#define __LITTLE_ENDIAN__ 0
#else
#error Unrecognized platform, please update this macro
#endif

The next step would be to actually detect correctly for what system you are on. Here is another thread on this topic.

One thing you can do is to issue gcc -std=c99 -dM -E - <<<'' which will cause gcc to output a list of all predefined macros in that mode; and then you can look for something useful. In my case it has:

#define __i386 1
#define __i686 1

so either of those could be used.

An alternative approach specifically for __LITTLE_ENDIAN__ is to detect it via preprocessor arithmetic, described here - although the first code sample on that page doesn't actually produce a preprocessing constant, so it wouldn't be usable as a condition for future preprocessor checks.

Having solved this specific problem, you should still try to find other instance of architecture problems in the same codebase. It looks like the author has cobbled together a few architecture macros he/she knew about. One conspicuous absence is _M_IX64 for example. It'd be wise to perhaps grep through the codebase for any other instances of _M_ or __i386__ and see if it is relying on those. If it is; then again perhaps try to abstract out that test into a macro that you have better control over.

Ideally you'd have all such macros defined in a single header for the whole codebase; and then the rest of the codebase just uses macros which have been defined in that header.

+1, I completely agree about the so-called default case! I write an email to the author of this code, we'll see what will be the solution. Meanwhile, I'll think of something. Thanks for the great ideas and for the time. — Kiril Kirov, Jan 20 '15 at 14:01

Code with and without -std=c99 produces different results (UMAC AE implementation)

3 Answers3