43

I just started learning C and am rather confused over declaring characters using int and char.

I am well aware that characters are represented by integers, in the sense that each character corresponds to its ASCII decimal value.

That said, I learned that it's perfectly possible to declare a character using int without writing out the ASCII decimal value. E.g., declaring the variable test to hold the character 'X' can be written as:

char test = 'X';

and

int test = 'X';

And for both declarations, the conversion specifier used to print the character is %c (even though test is defined as int).
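
For example, here is the kind of program I am describing (just a minimal sketch; the variable names are only for illustration):

#include <stdio.h>

int main(void)
{
    char c_test = 'X';   /* character stored in a char */
    int  i_test = 'X';   /* the same character literal stored in an int */

    /* both print "X" when passed to printf with %c */
    printf("%c\n", c_test);
    printf("%c\n", i_test);
    return 0;
}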

Therefore, my question is: what are the differences between declaring a character variable using char and using int, and when should int be used to declare a character variable?

Rakete1111
xhxh96
  • 2
    Note that most of the library functions that take a byte-sized argument, or return one, use type `int`, for example `getchar()` and `toupper()` and `memset()`. Note too that `'X'` defines an `int`, as revealed by printing `sizeof 'X'`. – Weather Vane May 15 '16 at 17:27
  • 1
    Similar question: [Practical difference between int and char](http://stackoverflow.com/questions/15869931/practical-difference-between-int-and-char) – MicroVirus May 15 '16 at 17:41

5 Answers

60

The difference is the size in bytes of the variable, and from that the range of values the variable can hold.

A char is required to accept all values between 0 and 127 (inclusive). So in common environments it occupies exactly one byte (8 bits). It is unspecified by the standard whether it is signed (-128 to 127) or unsigned (0 to 255).

An int is required to be at least a 16-bit signed integer and to accept all values between -32767 and 32767. That means that an int can accept all values of a char, whether the latter is signed or unsigned.
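
As a quick check, you can print the actual limits of your implementation (a minimal sketch; the exact values are platform-dependent):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* these macros report the implementation-defined limits */
    printf("char: %d bits, range %d to %d\n", CHAR_BIT, CHAR_MIN, CHAR_MAX);
    printf("int:  %zu bytes, range %d to %d\n", sizeof(int), INT_MIN, INT_MAX);
    return 0;
}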

If you want to store only characters in a variable, you should declare it as char. Using an int would just waste memory and could mislead a future reader. One common exception to that rule is when you need a wider type to carry special, out-of-band values. For example, the function fgetc from the standard library is declared as returning int:

int fgetc(FILE *stream);

because the special value EOF (for End Of File) is defined as a negative int value, commonly -1 (all bits set to one in a two's-complement system), which cannot fit in a char's worth of character values. That way no character returned by fgetc (which yields an unsigned char value converted to int, so 0 to 255 on a common system) can be equal to the EOF constant. If the function were declared to return a plain char, nothing could distinguish the EOF value from the (valid) character 0xFF.

That's the reason why the following code is bad and should never be used:

char c;    // a terrible memory saving...
...
while ((c = fgetc(stdin)) != EOF) {   // NEVER WRITE THAT!!!
    ...
}

Inside the loop a char would be enough to hold the character, but so that reading the character 0xFF is not mistaken for EOF and does not end the loop early, the variable needs to be an int.
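
For completeness, the usual correct pattern declares the variable as int, compares it with EOF, and only then treats the value as a character (a minimal sketch that just echoes standard input):

#include <stdio.h>

int main(void)
{
    int c;   /* int, so a valid 0xFF character stays distinguishable from EOF */

    while ((c = fgetc(stdin)) != EOF) {
        putchar(c);   /* inside the loop, the value fits in an unsigned char */
    }
    return 0;
}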

bunyaCloven
Serge Ballesta
  • I'm not sure the memory argument makes much sense for a single variable. In my opinion, expressing intent is more important. – svick May 15 '16 at 21:43
  • @svick: as a now old dinosaur, I once used a system with 2 KB of ROM and 1 KB of RAM... In that case any single byte that could be saved had to be! But on today's common systems you are right, expressing intent is probably more important. Hope it's clearer now – Serge Ballesta May 15 '16 at 22:01
  • 1
    @SergeBallesta, one need not be a dinosaur to work with systems like that ... the smallest chip targeted by the compiler I work on has less than 1 KB of ROM and less than 50 bytes of RAM. Embedded is still a large market for small things. – mlp May 16 '16 at 00:11
  • The name of the `char` type has become dangerously misleading. "If you want to store only characters in a variable, you should declare it as `char`." is the wrong advice unless you specifically state "only ASCII characters". All modern OSes now use variable length characters consisting of 1 to 5 bytes or 1 to 2 words. There are also many legacy situations using single byte encodings/codepages and several popular Asian encodings with 16 bit or multibyte characters. – hippietrail May 16 '16 at 02:56
  • 2
    @hippietrail: None of the three popular UTF encodings use five bytes to encode any kind of character (although UTF-8 could be so extended, the standard does not provide for it), and most of the legacy encodings that are or were popular at various times are either fixed-width or at most 32-bit variable-width. I hesitate to say that *nobody* uses 5 bytes per character, but it's certainly not a common way of doing things. – Kevin May 16 '16 at 05:17
  • 1
    Minor niggle: As we are talking C here, a char occupies exactly one byte *by definition* (of the term *byte*). A C byte is not necessarily 8 bits - on a DSP it's quite likely to be 32 or 64 bits. – Martin Bonner supports Monica May 16 '16 at 10:10
  • @Kevin: My apologies, you are [correct about the maximum number of bytes for a Unicode codepoint in UTF-8](http://stackoverflow.com/a/9533324/527702). Originally the maximum was actually 6 bytes but they reduced it to 4 bytes years ago and nothing was ever put in that >4 byte space. – hippietrail May 16 '16 at 16:22
10

The char type has multiple roles.

The first is that it is simply part of the chain of integer types, char, short, int, long, etc., so it's just another container for numbers.

The second is that its underlying storage is the smallest unit: all other objects have a size that is a multiple of the size of char (sizeof reports sizes in units of char, so sizeof(char) == 1).

The third is that it plays the role of a character in a string, certainly historically. When seen like this, the value of a char maps to a specified character, for instance via the ASCII encoding, but it can also be used with multi-byte encodings (one or more chars together map to one character).
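
A small sketch of those three roles (the non-ASCII string literal is only an illustrative assumption; it relies on the source and execution character sets being UTF-8):

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* role 1: just another (small) integer type */
    char n = 65;
    printf("%d\n", n);                              /* prints 65 */

    /* role 2: the unit in which sizeof counts */
    printf("%zu %zu\n", sizeof(char), sizeof(int)); /* 1 and, typically, 4 */

    /* role 3: the element of a string; one character may span several chars */
    const char *s = "é";                            /* two chars in UTF-8 */
    printf("%zu\n", strlen(s));                     /* prints 2 */
    return 0;
}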

MicroVirus
  • 4
    Note that it is `signed char` that fits in the first list of signed types. Plain `char` may be an unsigned type, in which case it does not fit in that list (and machines using EBCDIC are constrained to use either an unsigned plain `char` or `CHAR_BIT > 8` because of the requirement for all characters in the basic execution character set to be positive, and the EBCDIC code for `0` is 240, for example). The standard actually says (§6.2.5): _There are five_ standard signed integer types, _designated as `signed char`, `short int`, `int`, `long int`, and `long long int`._ – Jonathan Leffler May 15 '16 at 18:22
4

The size of an int is 4 bytes on most architectures, while the size of a char is 1 byte.

Viktor Simkó
  • 2
    Note that `sizeof(char)` is always 1 — even when `CHAR_BIT == 16` or more . The standard mandates this: _§6.5.3.4 The `sizeof` and `_Alignof` operators: … ¶4 When `sizeof` is applied to an operand that has type `char`, `unsigned char`, or `signed char`, (or a qualified version thereof) the result is 1._ – Jonathan Leffler May 15 '16 at 17:48
  • @JonathanLeffler Thank you! – Viktor Simkó May 15 '16 at 17:50
  • 1
    The `sizeof` a `char` is always `1`, but the size of a `char` needn't be one byte. – MicroVirus May 15 '16 at 17:50
  • 2
    @MicroVirus: Definition 3.6 of the standard says: _**byte** addressable unit of data storage large enough to hold any member of the basic character set of the execution environment. NOTE 1 It is possible to express the address of each individual byte of an object uniquely. NOTE 2 A byte is composed of a contiguous sequence of bits, the number of which is implementation defined. The least significant bit is called the _low-order bit_; the most significant bit is called the _high-order bit_. […continued…] – Jonathan Leffler May 15 '16 at 17:59
  • 1
    _[…continuation…]_ §6.2.5 Types says: _An object declared as type `char` is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a `char` object, its value is guaranteed to be nonnegative._ I've not found words that specify 'a `char` must be the smallest addressable unit' (though it usually is), but if `CHAR_BIT == 16`, the definition of 'byte' in that implementation is a 16-bit unit of memory. You'd need another name — octet is used in communications parlance — for an 8-bit quantity. – Jonathan Leffler May 15 '16 at 18:01
  • @JonathanLeffler The byte has become so synonymous with 8 bits that that's the terminology I used. – MicroVirus May 15 '16 at 18:06
  • @JonathanLeffler I don't see how it follows from these definitions that if `char` is 16 bits that then the underlying system uses `16 bits`; as far as I can tell, these definitions allow the freedom to use `char` 16 bits and the underlying byte 8 bits ... it would be a bit weird, but legal? EDIT: From what I can tell, it does seem to be intended for `char` to be synonymous with byte, – MicroVirus May 15 '16 at 18:11
  • 2
    There is quite a lot of historical baggage surrounding all this. In the days when the same code had to run on hardware where the smallest addressable unit of memory was 60 bits, containing ten characters with 6-bit codes (upper-case only!), and also on hardware with 64-bit-addressable memory containing eight 8-bit character codes, the need for some of these subtleties was more obvious than it is now when (almost) all hardware is byte-addressable and (almost) all character handling is (or should be!) Unicode-compatible. – alephzero May 15 '16 at 20:35
  • 1
    @MicroVirus Important to keep in mind is that if the underlying hardware addresses memory in 8-bit "bytes", but the C implementation only provides access through `char` with `CHAR_BIT == 16`, then this is indistinguishable (within a valid C program) from an implementation where the hardware cannot address memory in 8-bit "bytes". From the perspective of the C standard, what you call "underlying byte" is irrelevant. It is allowed to be smaller or larger than a C `char`, although it will typically be the same size for obvious reasons. –  May 15 '16 at 20:42
3

Usually you should declare characters as char and use int for integers that need to hold larger values. On most systems a char occupies a byte, which is 8 bits. Depending on your system, char might be signed or unsigned by default, so it will be able to hold values in the range 0 to 255 or -128 to 127.

An int might be 32 bits long, but if you really want exactly 32 bits for your integer you should declare it as int32_t or uint32_t instead.
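
A short sketch contrasting the two (assuming a C99-or-later compiler so that <inttypes.h> is available):

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    char     letter = 'X';         /* a character: char is the natural choice */
    int      count  = 100000;      /* a general-purpose integer */
    int32_t  exact  = 100000;      /* exactly 32 bits, signed, everywhere */
    uint32_t mask   = 0xFFFFFFFFu; /* exactly 32 bits, unsigned */

    printf("%c %d %" PRId32 " %" PRIu32 "\n", letter, count, exact, mask);
    return 0;
}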

Henrik Carlqvist
2

I think there's no functional difference, but you're allocating extra memory you're not going to use. You can also write `const long a = 1;`, but it is more suitable to use `const char a = 1;` instead.

cdonts