25

I need to do modulo 256 arithmetic in C. So can I simply do

unsigned char i;
i++;

instead of

int i;
i=(i+1)%256;
LihO
  • 41,190
  • 11
  • 99
  • 167
avmohan
  • 1,820
  • 3
  • 20
  • 39
  • 1
    well, since C99 there is `stdint.h` and you can simply use `uintXX_t` http://www.cplusplus.com/reference/cstdint/ – user2485710 Feb 06 '14 at 18:40
  • 7
    While this works, it falls into the category of overclever tricks that are likely to get removed by another developer that thinks your trick is a bug. Being explicit about your intent with modulo prevents such. – Mgetz Feb 06 '14 at 18:41
  • If you don't have to rely on anyone else maintaining your code, then yes. I just don't see why you would. The extra work in maintaining confusing code is not worth it. – Peter Abolins Feb 06 '14 at 18:54
  • 3
    Remember `%` is remainder not modulo! although modulo == remainder for positive numbers. Both code pieces are not equivalent because first give you modulo whereas second gives remainder (your `i` is `int` not `unsigned int`). – Grijesh Chauhan Feb 06 '14 at 18:58
  • @Mgetz There are no other developers since it's for a lab assignment where I've to implement RC4. I've decided not to do it anyway since it's better to be explicit rather than implicit, and I better not do esoteric tricks esp when someone is grading it.. – avmohan Feb 06 '14 at 19:07
  • 2
    @v3ga - it's good that you asked; it led to an interesting discussion, and the insight that "not every byte is 8 bits" - which was news to me. You are right that you are better off with the "explicit" code that makes it obvious what you are doing - and a good compiler knows fast tricks for doing `%256` so you don't have to micro-optimize. There is a LOT to be said for writing clear code - expecially if it's a lab and gets graded! Thanks for starting an interesting discussion. – Floris Feb 06 '14 at 19:19
  • It was news to me too (actually I get a lot of such stuff out of most questions on C). – avmohan Feb 06 '14 at 19:23
  • @v3ga I will suggest you if you use first version (or either suggested by Alexandre C.) You can always add a sort comment in code that explain that you evaluating mod 256 that will be fine. Just add `//trick to eval mod 256`. I good program is one who knows more than one ways. code like iitians :) – Grijesh Chauhan Feb 07 '14 at 06:50

7 Answers7

25

No. There is nothing that guarantees that unsigned char has eight bits. Use uint8_t from <stdint.h>, and you'll be perfectly fine. This requires an implementation which supports stdint.h: any C99 compliant compiler does, but older compilers may not provide it.

Note: unsigned arithmetic never overflows, and behaves as "modulo 2^n". Signed arithmetic overflows with undefined behavior.

Alexandre C.
  • 55,948
  • 11
  • 128
  • 197
  • @user2485710: Even MSVC has `stdint.h` now that they try to implement C++11. – Alexandre C. Feb 06 '14 at 18:45
  • Please see http://stackoverflow.com/a/2215454/1967396 for a reference that says "char is always one byte" in C99 – Floris Feb 06 '14 at 18:46
  • 10
    @Floris: Yes, but one C byte is not necessarily 8 bits. It is *defined* as the size of one `char`. – Alexandre C. Feb 06 '14 at 18:46
  • yes, but it's always a requirements and more often than not I see any kind o requirements from people using `C`, really really old implementations are, for example, quite popular in the embedded world. – user2485710 Feb 06 '14 at 18:47
  • @AlexandreC. I see - you are right. That makes your answer better than mine. – Floris Feb 06 '14 at 18:48
  • @user2485710: You are right about the embedded world. On the desktop, this is no longer a problem though. – Alexandre C. Feb 06 '14 at 18:48
  • 2
    And the OP it's not even mentioning kind of machine that he is targeting, so you just don't know and you should specify this in your answer; also other people may need this piece of information. – user2485710 Feb 06 '14 at 18:50
  • 2
    @user2485710: Done. Since byte tricks are quite common in such environments, a word of caution is indeed necessary. – Alexandre C. Feb 06 '14 at 18:52
  • 1
    Even with a C99 compliant compiler, all the `intN_t` and `uintN_t` types are optional, and are not guaranteed to be available. On any system that doesn't have 8-bit bytes (admittedly, probably unusual) they likely won't be available. – Crowman Feb 06 '14 at 19:35
  • 2
    @PaulGriffiths: One example that I know of is a particular 16-bit Texas Instruments DSP embedded platform that only supports word addressing, so everything is at least 16 bits. `sizeof(char) == sizeof(short) == sizef(int) == 1 == 16 bits`. The assumption suggested in this answer wouldn't work there. – Jason R Feb 07 '14 at 02:39
6

Yes, the behavior of both of your examples is the same. See C99 6.2.5 §9 :

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

LihO
  • 41,190
  • 11
  • 99
  • 167
5

Very probably yes, but the reasons for it in this case are actually fairly complicated.

unsigned char i = 255;
i++;

The i++ is equivalent to i = i + 1.

(Well, almost. i++ yields the value of i before it was incremented, so it's really equivalent to (tmp=i; i = i + 1; tmp). But since the result is discarded in this case, that doesn't raise any additional issues.)

Since unsigned char is a narrow type, an unsigned char operand to the + operator is promoted to int (assuming int can hold all possible values in the range of unsigned char). So if i == 255, and UCHAR_MAX == 255, then the result of the addition is 256, and is of type (signed) int.

The assignment implicitly converts the value 256 from int back to unsigned char. Conversion to an unsigned type is well defined; the result is reduced modulo MAX+1, where MAX is the maximum value of the target unsigned type.

If i were declared as an unsigned int:

unsigned int i = UINT_MAX;
i++;

there would be no type conversion, but the semantics of the + operator for unsigned types also specify reduction module MAX+1.

Keep in mind that the value assigned to i is mathematically equivalent to (i+1) % UCHAR_MAX. UCHAR_MAX is usually 255, and is guaranteed to be at least 255, but it can legally be bigger.

There could be an exotic system on which UCHAR_MAX is too be to be stored in a signed int object. This would require UCHAR_MAX > INT_MAX, which means the system would have to have at least 16-bit bytes. On such a system, the promotion would be from unsigned char to unsigned int. The final result would be the same. You're not likely to encounter such a system. I think there are C implementations for some DSPs that have bytes bigger than 8 bits. The number of bits in a byte is specified by CHAR_BIT, defined in <limits.h>.

CHAR_BIT > 8 does not necessarily imply UCHAR_MAX > INT_MAX. For example, you could have CHAR_BIT == 16 and sizeof (int) == 2 i.e., 16-bit bytes and 32 bit ints).

Keith Thompson
  • 254,901
  • 44
  • 429
  • 631
  • 1
    Technically `i++` is equivalent to making a copy of `i` before incrementing it and using the old copy in the expression where the post-increment occurred. `++i` would be the true equivalent to `i = i + 1`/`i += 1`. – JAB Feb 06 '14 at 19:46
5
unsigned char c = UCHAR_MAX;
c++;

Basically yes, there is no overflow, but not because c is of an unsigned type. There is a hidden promotion of c to int here and an integer conversion from int to unsigned char and it is perfectly defined.

For example,

 signed char c = SCHAR_MAX;
 c++;

is also not undefined behavior, because it is actually equivalent to:

c = (int) c + 1;

and the conversion from int to signed char is implementation-defined here (see c99, 6.3.1.3p3 on integer conversions). To simplify CHAR_BIT == 8 is assumed.

For more information on the example above, I suggest to read this post:

"The Little C Function From Hell"

http://blog.regehr.org/archives/482

ouah
  • 142,963
  • 15
  • 272
  • 331
  • Thanks for the link to "The little C function from hell". I had seen it before, but I had forgotten to bookmark it at the time. As I couldn't recall the exact title I found it impossible to Google for it. – Tonny Feb 06 '14 at 20:06
  • The sum of two `unsigned char` values won't overflow, but on machines where `sizeof (int)` is two, the product of two `unsigned char` values could overflow. Historically, the fact that the standard would allow a compiler on a 16-bit machine to do anything it liked with `unsigned char x=255; x*=255;` wasn't seen as a defect because in practice even compilers which used 16-bit `int` would behave sensibly. The fact that such code yielded Undefined Behavior was seen as a theoretical issue, rather than an opportunity for compilers to negate laws of time and causality. – supercat Jul 25 '15 at 20:18
3

There's another alternative that hasn't been mentioned, if you don't want to use another data type.

unsigned int i;
// ...
i = (i+1) & 0xFF; // 0xFF == 255

This works because the modulo element == 2^n, meaning the range will be [0, 2^n-1] and thus a bitmask will easily keep the value within your desired range. It's possible this method would not be much or any less efficient than the unsigned char/uint8_t version, either, depending on what magic your compiler does behind the scenes and how the targeted system handles non-word loads (for example, some RISC architectures require additional operations to load non-word-size values). This also assumes that your compiler won't detect the usage of power-of-two modulo arithmetic on unsigned values and substitute a bitmask for you, of course, as in cases like that the modulo usage would have greater semantic value (though using that as the basis for your decision is not exactly portable, of course).

An advantage of this method is that you can use it for powers of two that are not also the size of a data type, e.g.

i = (i+1) & 0x1FF; // i %= 512
i = (i+1) & 0x3FF; // i %= 1024
// etc.
JAB
  • 20,783
  • 6
  • 71
  • 80
2

This should work fine because it should just overflow back to 0. As was pointed out in a comment on a different answer, you should only do this when the value is unsigned, as you may get undefined behavior with a signed value.

It is probably best to leave this using modulo, however, because the code will be better understood by other people maintaining the code, and a smart compiler may be doing this optimization anyway, which may make it pointless in the first place. Besides, the performance difference will probably be so small that it wouldn't matter in the first place.

Gavin
  • 8,204
  • 3
  • 32
  • 42
1

It will work if the number of bits that you are using to represent the number is equal to number of bits in binary (unsigned) representation (100000000) of the divisor -1 which in this case is : 9-1= 8 (char)

Wajahat
  • 1,593
  • 3
  • 20
  • 47