
It is very common, even in code where the developer has guarantees that a variable will never exceed one byte (or sometimes two bytes), that people decide to use int for every possible variable used to represent numbers, even ones in the range of 0-1.

Why does it hurt so much to use char or short instead?

I think I heard someone saying int is a "more standard" type of.. type. What does this mean? My question is: does the data type int have any defined advantages over short (or other smaller data types), advantages because of which people used to almost always resort to int?

Edenia
  • Using `short` makes sense when a large array of the object is possible; otherwise, there is little advantage over `int`. – chux - Reinstate Monica May 28 '18 at 03:18
  • @chux True, but why choose `int`? I would choose `short` because it has a small space advantage over `int`. People choose `int`, it appears, because it looks better, and it requires typing two fewer keys. (Edit: not to forget that ints can be of different sizes) – Edenia May 28 '18 at 03:19
  • `int` usually represents the natural integer data type of the CPU, and as such operations involving `int` tend to be more optimized by the CPU and the compiler. – Remy Lebeau May 28 '18 at 03:22
  • The native size of int is typically chosen to be a match for the best fit into a processor's registers. A `short` does not occupy a full register, so it's less efficient. Use the appropriate size for what you're doing. If you have a specific requirement to use something other than an integer, do so. Otherwise, trust the compiler to do its job. – Ken White May 28 '18 at 03:22
  • So `int` has performance advantage over `short`, I assume on most processors and certain compilation mode. – Edenia May 28 '18 at 03:23
  • It isn't bad, and it doesn't hurt, much; it's just pointless unless you have large numbers of them adjacent, e.g. in an array, or some crazy object with hundreds of adjacent `short` members. Or if you are specifically doing 16-bit arithmetic for some reason, e.g. a radix conversion. There are certainly situations where it's the only correct choice. – user207421 May 28 '18 at 03:26
  • @Edenia: What do you mean by "defined advantages?" – Nicol Bolas May 28 '18 at 03:41
  • @NicolBolas Advantages that some reliable source (compiler manual, language creator, John Skeet, I don't know) ever described or mentioned.. – Edenia May 28 '18 at 03:43
  • dupe of [When to use `short` over `int`?](https://stackoverflow.com/questions/24371077/when-to-use-short-over-int) and probably more – underscore_d May 28 '18 at 06:49
  • Don't use either. Use `int_fast8_t` and `int_fast16_t` if you only need to work with smaller numbers. However, these (like char and short) are subject to the stupidly dangerous implicit integer promotion rules of C and C++, which is the actual main reason to avoid small integer types if possible. – Lundin May 28 '18 at 08:11
  • See the question [Implicit type promotion rules](https://stackoverflow.com/questions/46073295/implicit-type-promotion-rules) for some examples when these types shoot you in the foot. – Lundin May 28 '18 at 08:12
  • @Lundin What makes integer promotions dangerous? – Petr Skocik May 28 '18 at 12:25
  • @PSkocik The link I posted above gave some examples to illustrate. – Lundin May 28 '18 at 13:14
  • @Lundin Thanks. – Petr Skocik May 28 '18 at 13:33

3 Answers


As a general rule, most arithmetic in C is performed using type int (that is, plain int, not short or long). This is because (a) the definition of C says so, which is related to the fact that (b) that's the way many processors (at least, the ones C's designers had in mind) prefer to work.

So if you try to "save space" by using short ints instead, and you write something like

short a = 1, b = 2;
short c = a + b;

the compiler may have to emit code to, in effect, convert a from short to int, convert b from short to int, do the addition, and convert the sum back to short. You may have saved a little bit of space on the storage for a, b, and c, but your code may end up being bigger (and slower).

If you instead write

int a = 1, b = 2;
int c = a + b;

you might spend a little more storage space on a, b, and c, but the code might be smaller and quicker.

This is somewhat of an oversimplified argument, but it's behind your observation that usage of type short is rare, and plain int is generally recommended. Basically, since it's the machine's "natural" size, it's presumed to be the most straightforward type to do arithmetic in, without extra conversions to and from less-natural types. It's sort of a "When in Rome, do as the Romans do" argument, but it generally does make using plain int advantageous.

If you have lots of not-so-large integers to store, on the other hand (a large array of them, or a large array of structures containing not-so-large integers), the storage savings for the data might be large, and worth it as traded off against the (relatively smaller) increase in the code size, and the potential speed increase.

See also this previous SO question and this C FAQ list entry.


Addendum: like any optimization problem, if you really care about data space usage, code space usage, and code speed, you'll want to perform careful measurements using your exact machine and processor. Your processor might not end up requiring any "extra conversion instructions" to convert to/from the smaller types, after all, so using them might not be so much of a disadvantage. But at the same time you can probably confirm that, for isolated variables, using them might not yield any measurable advantage, either.


Addendum 2. Here's a data point. I experimented with the code

extern short a, b, c;

void f()
{
    c = a + b;
}

I compiled with two compilers, gcc and clang (compiling for an Intel processor on a Mac). I then changed short to int and compiled again. The int-using code was 7 bytes smaller under gcc, and 10 bytes smaller under clang. Inspection of the assembly language output suggests that the difference was in truncating the result so as to store it in c; fetching short as opposed to int doesn't seem to change the instruction count.

However, I then tried calling the two different versions, and discovered that it made virtually no difference in the run time, even after 10000000000 calls. So the "using short might make the code bigger" part of the answer is confirmed, but maybe not "and also make it slower".

Steve Summit
  • It makes sense, I knew the correlation between space:performance. Caching is the most generic way to increase performance, and parsing to decrease space consumption. – Edenia May 28 '18 at 03:34
  • @Edenia Type `int`'s defined advantage is that it's presumed to be the most "natural" and efficient type to do arithmetic in. Any other, less "natural" type might require extra conversions or less-efficient arithmetic. I think you knew that, so if you're asking whether there are any *other*, less obvious advantages, I guess the answer is "No, just the 'naturalness' one." – Steve Summit May 28 '18 at 03:46
  • Yes, well, I am surprised that not many people wonder why everyone uses ints in books, tutorials, and open source projects while there are other data types meant for smaller values. For higher-abstraction languages like Object Pascal, this is even more obvious. – Edenia May 28 '18 at 03:49
  • @Edenia because the range of values that can be held in the integer types is not enforced at runtime. So there is no advantage in the abstraction. – Richard Critten May 28 '18 at 08:35
  • Most of the so-called extra instructions mentioned here are native to processors, and no different in size or number from the code that would be emitted for `int`. For example, loading a 32-bit register from a 16-bit location with sign extension, or the reverse with high-order truncation. – user207421 May 28 '18 at 08:37
  • @EJP Right. That's why I said "This is somewhat of an oversimplified argument", and why I inserted "presumed to be" in "it's presumed to be the most straightforward type to do arithmetic in", and why I inserted "generally" in "it generally does make using plain `int` advantageous". But I should add a few more words along those lines; thanks for the reminder. – Steve Summit May 28 '18 at 11:15
  • Eh? You stated, without qualification, that 'the compiler has to emit code' and 'your code is bigger and slower'. I haven't dealt with this seriously for thirty years, but it wasn't true then, and it isn't true now. – user207421 May 28 '18 at 11:28
  • @EJP Well, I've been doing this for at least 30 years, too, although I haven't followed my own advice, in that I've never performed any careful measurements to confirm the effect. (I've basically just been parroting the party line, which is one reason I said "When in Rome, do as the Romans do.") But I performed a few measurements just now, and indeed, using `short` *does* make the code bigger. I'll post the results shortly. – Steve Summit May 28 '18 at 11:58
  • I think something is wrong with my version of the C standard .. `If both operands have the same type, then no further conversion is needed.` Using shorts might allow a CPU to use SIMD instructions, like doing two 16-bit ops within a 32-bit register in one instruction. But the compiler can only decide on this if you are using the properly sized data type. It cannot deduce this if you use int everywhere because YOU think it's faster on a certain architecture. – kesselhaus Aug 24 '21 at 04:20

There are several issues here.

  • First of all, the char type is entirely unsuitable for holding integer values. It should only be used for holding characters. This is because it has implementation-defined signedness; char is actually a distinct type, separate from signed char and unsigned char. See Is char signed or unsigned by default?.

  • The main reason why the small integer types such as char and short should be avoided if possible is, however, implicit type promotion. These types are subject to integer promotion, which in turn can lead to dangerous things like a silent change of signedness. See Implicit type promotion rules for details.

    For this reason, some coding standards actually outright ban the use of smaller integer types. Though for such a rule to be feasible, you need a 32 bit CPU or larger. So it is not really a good universal solution if various microcontrollers are to be taken into account.

    Also note that micro-managing memory in this manner is mostly just relevant in embedded systems programming. If you are writing PC programs, using smaller types to save memory is likely a "premature optimization".

  • The default "primitive data types" of C, including char, short and int, are quite non-portable overall. They may change in size when the code is ported, which in turn gives them non-deterministic behavior. In addition, C allows all manner of obscure and exotic signedness formats for these types, such as one's complement, sign & magnitude, padding bits, etc.

    Rugged, portable, quality code doesn't use these types at all, but instead the types of stdint.h. As a bonus, that header only allows sane, industry-standard two's complement.

  • Using the smaller integer types to save space is not a good idea, for all the above mentioned reasons. Again, stdint.h is preferable. If you need a universal type which portably saves memory, unless saving memory means reducing execution speed, use int_fast8_t and similar. These will be 8 bits unless using a larger type means faster execution.

Lundin

I was skeptical about the claim that short-based code should be slower and bigger in any significant way (assuming local variables here; no disputes about large arrays, where shorts definitely do pay off if appropriate), so I tried to benchmark it on my Intel(R) Core(TM) i5 CPU M 430 @ 2.27GHz.

I used (long.c):

long long_f(long A, long B)
{
    //made up func w/ a couple of integer ops 
    //to offset func-call overhead
    long r=0;
    for(long i=0;i<10;i++){
        A=3*A*A;
        B=4*B*B*B;
        r=A+B;
    }
    return r;
}

in a long, int, and short-based version (%s/long/TYPE/g), built the program with gcc and clang at -O3 and -Os, and measured sizes and runtimes for 100 million invocations of each of these functions.

f.h:

#pragma once
int int_f(int A, int B);
short short_f(short A, short B);
long long_f(long A, long B);

main.c:

#include "f.h"
#include <stdlib.h>
#include <stdio.h>
#define CNT 100000000
int main(int C, char **V)
{
    int choose = atoi(V[1]?:"0");
    switch(choose){
    case 0:
        puts("short");
        for(int i=0; i<CNT;i++)
            short_f(1,2);
        break;
    case 1:
        puts("int");
        for(int i=0; i<CNT;i++)
            int_f(1,2);
        break;
    default:
        puts("long");
        for(int i=0; i<CNT;i++)
            long_f(1,2);
    }
}

build:

#!/bin/sh -eu
time(){ command time -o /dev/stdout "$@"; }
for cc in gcc clang; do
    $cc -Os short.c -c
    $cc -Os int.c -c
    $cc -Os long.c -c
    size short.o int.o long.o
    $cc main.c short.o int.o long.o

    echo $cc -Os
    time ./a.out 2
    time ./a.out 1
    time ./a.out 0

    $cc -O3 short.c -c
    $cc -O3 int.c -c
    $cc -O3 long.c -c
    size short.o int.o long.o
    $cc main.c short.o int.o long.o
    echo $cc -O3
    time ./a.out 2
    time ./a.out 1
    time ./a.out 0
done

I ran it twice, and the results appear to be stable.

   text    data     bss     dec     hex filename
     79       0       0      79      4f short.o
     80       0       0      80      50 int.o
     87       0       0      87      57 long.o
gcc -Os
long
3.85user 0.00system 0:03.85elapsed 99%CPU (0avgtext+0avgdata 1272maxresident)k
0inputs+0outputs (0major+73minor)pagefaults 0swaps
int
4.78user 0.00system 0:04.78elapsed 99%CPU (0avgtext+0avgdata 1220maxresident)k
0inputs+0outputs (0major+74minor)pagefaults 0swaps
short
3.36user 0.00system 0:03.36elapsed 99%CPU (0avgtext+0avgdata 1328maxresident)k
0inputs+0outputs (0major+74minor)pagefaults 0swaps
   text    data     bss     dec     hex filename
    137       0       0     137      89 short.o
    109       0       0     109      6d int.o
    292       0       0     292     124 long.o
gcc -O3
long
3.90user 0.00system 0:03.90elapsed 99%CPU (0avgtext+0avgdata 1220maxresident)k
0inputs+0outputs (0major+74minor)pagefaults 0swaps
int
1.22user 0.00system 0:01.22elapsed 99%CPU (0avgtext+0avgdata 1260maxresident)k
0inputs+0outputs (0major+73minor)pagefaults 0swaps
short
1.62user 0.00system 0:01.62elapsed 99%CPU (0avgtext+0avgdata 1228maxresident)k
0inputs+0outputs (0major+73minor)pagefaults 0swaps
   text    data     bss     dec     hex filename
     83       0       0      83      53 short.o
     79       0       0      79      4f int.o
     88       0       0      88      58 long.o
clang -Os
long
3.33user 0.00system 0:03.33elapsed 99%CPU (0avgtext+0avgdata 1316maxresident)k
0inputs+0outputs (0major+71minor)pagefaults 0swaps
int
3.02user 0.00system 0:03.03elapsed 99%CPU (0avgtext+0avgdata 1316maxresident)k
0inputs+0outputs (0major+71minor)pagefaults 0swaps
short
5.27user 0.00system 0:05.28elapsed 99%CPU (0avgtext+0avgdata 1236maxresident)k
0inputs+0outputs (0major+69minor)pagefaults 0swaps
   text    data     bss     dec     hex filename
    110       0       0     110      6e short.o
    219       0       0     219      db int.o
    279       0       0     279     117 long.o
clang -O3
long
3.57user 0.00system 0:03.57elapsed 99%CPU (0avgtext+0avgdata 1228maxresident)k
0inputs+0outputs (0major+69minor)pagefaults 0swaps
int
2.86user 0.00system 0:02.87elapsed 99%CPU (0avgtext+0avgdata 1228maxresident)k
0inputs+0outputs (0major+68minor)pagefaults 0swaps
short
1.38user 0.00system 0:01.38elapsed 99%CPU (0avgtext+0avgdata 1204maxresident)k
0inputs+0outputs (0major+70minor)pagefaults 0swaps

The results are fairly close, and yet they vary quite widely with different compilers and compiler settings.

My conclusion is that choosing between int and shorts in a function body or signature (arrays are a different issue) because one should perform better than the other or generate denser code is mostly futile (at least in code that isn't fixed to a specific compiler with specific settings). Either is fast, so I'd choose whichever type fits the semantics of my program better or communicates my API better (If I'm expecting a short positive value, might as well use a uchar or ushort in the signature.)

C programmers are predisposed to use ints because C has favored them historically (integer literals tend to be ints, promotions tend to make ints, there used to be implicit int rules for declarations and undeclared functions, etc.) and ints are supposed to be a good fit for the architecture, but at the end of the day, dense, performant machine code with a readable, maintainable source is what matters and if your theory for doing something in the source code doesn't demonstrably contribute towards at least one of these goals, I think it's a bad theory.

Petr Skocik
  • If you compare these types on a microcontroller system, the difference will be very obvious. On the average 8-bitter, `unsigned char` might give a single instruction, while `unsigned short` gives somewhere around 5 to 10 instructions, and `unsigned long` some 100 instructions. – Lundin May 28 '18 at 13:38
  • @Lundin sounds like all the more reason to use the best-fit type instead of `int` – Petr Skocik May 28 '18 at 13:41
  • Also `enum`s are implicitly promoted to `int`s – Edenia May 28 '18 at 14:25