
Every modern programming language I've encountered has well-defined data types. Java, for example, has an int that is exactly 32 bits and a long that is exactly 64 bits. This is not implementation-specific but built into the spec itself. C, on the other hand, has a short that is at least 16 bits, an int that is at least 16 bits, and a long that is at least 32 bits. Because these are defined as minimums, the types can be much larger on a given implementation, which results in serious portability issues.

I have never understood this. I have always chalked it up to differing hardware standards in the 1970s, but that doesn't actually make sense to me. Even if many computers couldn't handle 64-bit data types, that still doesn't excuse not defining a short as exactly 16 bits and an int as exactly 32 bits. After all, the standard already requires a long of at least 32 bits, so 32-bit arithmetic was evidently assumed to be achievable on any hardware.

This proliferation of standards has resulted in advice like the following, taken from the article How to C (as of 2016):

If you find yourself typing char or int or short or long or unsigned into new code, you’re doing it wrong.

For modern programs, you should #include <stdint.h> then use standard types.

For more details, see the stdint.h specification.

The common standard types are:

  • int8_t, int16_t, int32_t, int64_t — signed integers
  • uint8_t, uint16_t, uint32_t, uint64_t — unsigned integers
  • float — standard 32-bit floating point
  • double — standard 64-bit floating point

Notice we don’t have char anymore. char is actually misnamed and misused in C.

Developers routinely abuse char to mean “byte” even when they are doing unsigned byte manipulations. It’s much cleaner to use uint8_t to mean a single unsigned-byte/octet value and uint8_t * to mean a sequence of unsigned-byte/octet values.

I've also noticed quite a bit of advice recommending size_t as the go-to int replacement; for example, Modern C by Jens Gustedt uses a size_t as the control variable of the for loop in the very first example of the book.

My question is two fold:

1. Why did K&R not define the data types more definitively back in 1978?

2. What, in 2018, are we supposed to do with our data type choices, or is it all just style and convention and highly opinionated?

asked by mas
    One of the original C computers had 36 bit ints with 9 bit chars. Where would **you** use C now if it was defined as such back then? – Antti Haapala -- Слава Україні Oct 26 '18 at 17:22
    Look at Mr. Fancy Computer Owner with 32-bit `int` in the 1970s. La de da. No, 32-bit was not a hardware requirement for implementing C back then. – Eric Postpischil Oct 26 '18 at 17:30
  • In other words the answer _is_ hardware limitations of 1978. I couldn't find the dupe original. Thank you for the pointer. – mas Oct 26 '18 at 17:33
    I didn't downvote so here's a counter. Jens is a regular answerer to C questions on Stack Overflow. Matt's writing contains some questionable choices on the other hand. You should use `size_t` as the loop variable for loops that index arrays (most do!) – Antti Haapala -- Слава Україні Oct 26 '18 at 17:35
  • @AnttiHaapala Thanks; I've always wondered why SO culture penalizes duplicates that are uniquely worded. An increase in SEO optimization would help people find their answers better. Question: that dupe original just answers the first part of my question, and you only give limited advice; do you know of a resource that answers #2 more thoroughly? – mas Oct 26 '18 at 17:38
  • @malan there are *bad* duplicates. Like most of the C duplicates now are bad. They're not real duplicates in that there is no [mcve]; there are 1000 lines of code with error on one; that error being answered in another. – Antti Haapala -- Слава Україні Oct 26 '18 at 17:40
  • @malan as for "what should we do", that is off-topic for Stack Overflow as primarily opinionated, but I remember some... – Antti Haapala -- Слава Україні Oct 26 '18 at 17:41
  • Using `size_t` for a loop variable is correct if the loop is over container indices, since `size_t` is an unsigned type wide enough to contain the size of any object. IOW, it must be OK for an index. That does not make it a "goto int replacement". `int`s are used for many things other than container indices. – rici Oct 26 '18 at 18:05
  • but again, *plurality, if not majority, of `for` loops* with a counter is iterating indices... – Antti Haapala -- Слава Україні Oct 26 '18 at 18:18
    The early '70s really were the Wild West where computer hardware was concerned. Not only did different architectures have different word sizes, they may also have had padding bits within integral types, and they may have used something other than 2's complement for signed integer representation. That's why the C standard guarantees that a type can represent a minimum *range of values*, rather than contain a minimum number of bits or bytes. As long as you respect those value limits and don't make any assumptions about how values are represented, your code will be *highly* portable. – John Bode Oct 26 '18 at 18:52
  • As for what to use - if you care about *values* rather than representation, and you want your code to be portable to anything with a C compiler, use the traditional types (`int`, `short`, `long`, etc.). If you care about *representation* (i.e., for bit twiddling) and you're not so concerned about portability (i.e., you only care about modern desktop and server systems), then use the `stdint` types (`int16_t`, `uint8_t`, etc.). – John Bode Oct 26 '18 at 18:59

0 Answers