
In a recent homework assignment I was told to use a long variable to store a result, since it may be a big number.

I decided to check whether it would really matter on my system (Intel Core i5 / 64-bit Windows 7 / GNU GCC compiler) and found that the following code:

printf("sizeof(char) => %d\n", sizeof(char));
printf("sizeof(short) => %d\n", sizeof(short));
printf("sizeof(short int) => %d\n", sizeof(short int));
printf("sizeof(int) => %d\n", sizeof(int));
printf("sizeof(long) => %d\n", sizeof(long));
printf("sizeof(long int) => %d\n", sizeof(long int));
printf("sizeof(long long) => %d\n", sizeof(long long));
printf("sizeof(long long int) => %d\n", sizeof(long long int));

produces the following output:

sizeof(char) => 1
sizeof(short) => 2
sizeof(short int) => 2
sizeof(int) => 4
sizeof(long) => 4
sizeof(long int) => 4
sizeof(long long) => 8
sizeof(long long int) => 8

In other words, on my system int and long are the same, and whatever is too big for an int to hold will be too big for a long to hold as well.

The homework assignment itself is not the issue here. What I wonder is: on a system where int < long, how should I assign a long value to an int?

I'm aware that there are numerous closely related questions on this subject, but I feel that the answers to them do not give me a complete understanding of what will or may happen in the process.

Basically I'm trying to figure out the following:

  1. Should I cast the long to int before the assignment, or, since long is not a different data type but merely a modifier, is it harmless to assign directly?
  2. What happens on systems where long > int? Will the result be undefined (or unpredictable), or will the extra parts of the value simply be discarded?
  3. How does casting from long to int work in C?
  4. How does assignment from long to int work in C when I don't use a cast?
Khaloymes
    C has modifiers (`volatile`, `const`), but `short`, `long`, `signed`, and `unsigned` are _NOT_ modifiers. They specify unique types. – Mooing Duck Nov 30 '12 at 20:24
  • Careful there. If you were to look at, say, a Linux AMD64 system, long is 8 bytes whereas int is 4. See https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models for some common instances. –  Nov 30 '12 at 20:24
  • @Rajesh: I don't think it has anything to do with the OS _or_ hardware. How big types are is a (mostly) arbitrary decision by the _compiler_. – Mooing Duck Nov 30 '12 at 20:27
  • @MooingDuck and when all compilers for a platform do the same thing, it's more or less, settled :) –  Nov 30 '12 at 20:31
  • @MooingDuck: The decision made by the compiler (more accurately, by its authors) is strongly influenced by the OS and the hardware. Many systems have an ABI that specifies the sizes of integer types; following it makes it possible to mix code compiled by different compilers. In some cases, backward compatibility is a strong influence, sometimes resulting in things like 32-bit `long` on a 64-bit system. – Keith Thompson Nov 30 '12 at 20:36
  • @MooingDuck Thanks for your correction regarding the modifiers/types issue! – Khaloymes Dec 01 '12 at 09:37

2 Answers


The language guarantees that int is at least 16 bits, long is at least 32 bits, and long can represent at least all the values that int can represent.
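If you want to see the actual bounds your implementation provides, the macros in <limits.h> report them directly. A minimal sketch (the printed values are, of course, implementation-specific):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* The standard only guarantees INT_MAX >= 32767 and LONG_MAX >= 2147483647 */
    printf("INT_MIN  = %d,  INT_MAX  = %d\n",  INT_MIN,  INT_MAX);
    printf("LONG_MIN = %ld, LONG_MAX = %ld\n", LONG_MIN, LONG_MAX);
    return 0;
}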

If you assign a long value to an int object, it will be implicitly converted. There's no need for an explicit cast; it would merely specify the same conversion that's going to happen anyway.

On your system, where int and long happen to have the same size and range, the conversion is trivial; it simply copies the value.

On a system where long is wider than int, if the value won't fit in an int, then the result of the conversion is implementation-defined. (Or, starting in C99, it can raise an implementation-defined signal, but I don't know of any compilers that actually do that.) What typically happens is that the high-order bits are discarded, but you shouldn't depend on that. (The rules are different for unsigned types; the result of converting a signed or unsigned integer to an unsigned type is well defined.)
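As a small illustration of the difference, here is a sketch that assumes a platform with 32-bit int and 64-bit long (for example, a typical 64-bit Linux system); on other platforms the type sizes, and therefore the printed values, may differ:

#include <stdio.h>

int main(void)
{
    long big = 4294967296L + 42;  /* 2^32 + 42, too big for a 32-bit int           */
    int i = big;                  /* implicit conversion, implementation-defined   */
    unsigned int u = big;         /* well defined: value reduced modulo 2^32       */

    printf("big = %ld\n", big);   /* 4294967338                                    */
    printf("i   = %d\n", i);      /* typically 42 (high-order bits discarded)      */
    printf("u   = %u\n", u);      /* guaranteed 42 if unsigned int is 32 bits      */
    return 0;
}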

If you need to safely assign a long value to an int object, you can check that it will fit before doing the assignment:

#include <limits.h> /* for INT_MIN, INT_MAX */

/* ... */

int i;
long li = /* whatever */;

if (li >= INT_MIN && li <= INT_MAX) {
    i = li;
}
else {
    /* do something else? */
}

The details of "something else" are going to depend on what you want to do.

One correction: int and long are always distinct types, even if they happen to have the same size and representation. Arithmetic types are freely convertible, so this often doesn't make any difference, but for example int* and long* are distinct and incompatible types; you can't assign a long* to an int*, or vice versa, without an explicit (and potentially dangerous) cast.
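For example (an illustrative sketch; a conforming compiler is required to diagnose the two pointer assignments that lack casts):

int  n = 0;
long m = 0;

int  *ip = &m;     /* constraint violation: long * is not compatible with int *  */
long *lp = &n;     /* same problem in the other direction                        */
ip = (int *)&m;    /* the cast silences the diagnostic, but using *ip is risky   */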

And if you find yourself needing to convert a long value to int, the first thing you should do is reconsider your code's design. Sometimes such conversions are necessary, but more often they're a sign that the int to which you're assigning should have been defined as a long in the first place.

Keith Thompson
  • First of all, thanks for your answer! I assume (from googling around a bit) that in order to use INT_MIN and INT_MAX I must include limits.h. Is there a way to figure out their sizes without including this header? Somehow just from sizeof, maybe? – Khaloymes Dec 01 '12 at 09:35
  • @Khaloymes: Yes, you need `#include <limits.h>`; I should have mentioned that. Note that these are bounds, not sizes. There are probably ways to compute them without using `<limits.h>`, but why bother? That's what `<limits.h>` is for, after all. – Keith Thompson Dec 01 '12 at 09:57
  • That implementation-defined signal is interesting. In particular I could find nothing in the standard that says it only applies to *implicit* conversions; if an implementation does go this route, the usual use of a cast as "don't warn me, I know what I'm doing" won't prevent the behaviour, as one might intuitively expect it to. – Alex Celeste Jun 10 '15 at 18:08
  • @Leushenko: Right, it applies to conversions, either explicit (casts) or implicit. – Keith Thompson Jun 10 '15 at 18:11
  • Is it proper to refer to discarding high-order bits as "truncating"? Or is that a term that only applies to float-to-integer conversions? – Lakey Mar 19 '19 at 16:28
  • @Lakey: I would have said no, but the C standard does refer to this as "truncating". (And as it happens, your question led me to what is probably an error in the standard. [N1570](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) 5.1.2.3p11, Example 2, says that `char c1, c2; /* ... */ c1 = c1 + c2;` will "truncate the sum", but if `char` is signed the result is implementation-defined. I suppose you could still call it a "truncation". In any case, examples are non-normative.) – Keith Thompson Mar 19 '19 at 19:37

A long can always represent all values of int. If the value at hand can be represented by the type of the variable you assign to, then the value is preserved.

If it can't be represented, then for a signed destination type the result is implementation-defined, while for an unsigned destination type it is specified as the original value modulo 2^n, where n is the number of bits in the value representation (which is not necessarily all the bits in the destination).

In practice, on modern machines you get wrapping also for signed types.

That's because modern machines use two's complement form to represent signed integers, without any bits used to denote "invalid value" or the like – i.e., all bits are used for the value representation.

With an n-bit value representation, any integer value x is mapped to x + K*2^n, with the integer constant K chosen such that the result lands in the destination's range, in which half of the possible values are negative.

Thus, for example, with a 32-bit int the value -7 is represented as the bit pattern for -7+2^32 = 2^32-7, so that if you display the number that the bit pattern stands for as an unsigned integer, you get a pretty large number.
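A quick way to see that large number, assuming int and unsigned int are 32 bits (other widths give other numbers):

#include <stdio.h>

int main(void)
{
    int x = -7;
    unsigned int u = x;      /* well defined: -7 + 2^32 = 4294967289       */
    printf("u = %u\n", u);   /* prints 4294967289 with 32-bit unsigned int */
    return 0;
}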

The reason that this is called two's complement is that it makes sense for the binary numeral system, the base-two numeral system. For the binary numeral system there's also a ones' (note the placement of the apostrophe) complement. Similarly, for the decimal numeral system there's ten's complement and nines' complement. With a 4-digit ten's complement representation you would represent -7 as 10000-7 = 9993. That's all, really.

Cheers and hth. - Alf
  • C, starting with C99, requires signed integers to be represented as two's-complement, ones'-complement, or sign-and-magnitude. Almost all modern systems use two's-complement. – Keith Thompson Nov 30 '12 at 20:38
  • @KeithThompson How does C99 decide whether to require signed integers to be represented as two's-complement or as ones'-complement? I mean, is there a way I can know in advance how my signed integer will be represented? Is there a way to change the representation after a value was assigned? Also, please let me know if this comment is too broad and should be turned into a new question. – Khaloymes Dec 01 '12 at 09:45
  • @Khaloymes: Not sure what you mean. It's an explicit requirement in the standard; [see section 6.2.6.2](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf). You can't change the way your system operates on signed integers; that's generally determined by the hardware. You can do whatever bit manipulations you like of course, but if your system uses two's-complement, there's very little reason to use some other representation (unless you're dealing with some externally imposed data requirement). – Keith Thompson Dec 01 '12 at 10:05
  • @KeithThompson This is what I wanted to know: whether I have a hand in deciding how a signed integer will be represented or not. I conclude from your comment that it's based on the hardware. – Khaloymes Dec 01 '12 at 10:18
  • @Khaloymes: It's up to the compiler implementer, who must document the choice. That decision is almost certain to be based on what the hardware supports (which, these days, is almost certain to be two's-complement). – Keith Thompson Dec 01 '12 at 10:22