10

When dynamically allocating chars, I've always done it like this:

char *pCh = malloc(NUM_CHARS * sizeof(char));

I've recently been told, however, that using sizeof(char) is redundant and unnecessary because, "by definition, the size of a char is one byte," so I should/could write the above line like this:

char *pCh = malloc(NUM_CHARS);

My understanding is that the size of a char depends on the native character set used on the target computer. For example, if the native character set is ASCII, a char is one byte (8 bits), and if the native character set is Unicode, a char will necessarily require more than one byte (> 8 bits).

To provide maximum portability, wouldn't it be necessary to use sizeof(char), as malloc simply allocates 8-bit bytes? Am I misunderstanding malloc and sizeof(char)?

Fiddling Bits
  • +1 for leaving out the unnecessary `(char*)` cast on the right hand side – Bathsheba Dec 19 '13 at 14:23
  • 3
    I'd do `char * pCh = malloc(NUM_CHARS * sizeof(*pCh));` and turn to other issues. – alk Dec 19 '13 at 14:24
  • s/right/left/, right @Bathsheba? – unwind Dec 19 '13 at 14:36
  • 1
    "malloc simply allocates 8-bit bytes" **No**. While it's true that malloc allocates bytes, C defines a byte to be however big a char is. So malloc always allocates in units of sizeof(char) which is always 1, however many bits that is. malloc(N) will allocate N*CHAR_BIT bits. – nos Dec 19 '13 at 14:54
  • @nos Good comment… should be an answer. :-D – Fiddling Bits Dec 19 '13 at 15:18
  • @nos it is interesting the standard does not explicitly say this unless I am missing it, although the C99 rationale does. – Shafik Yaghmour Dec 19 '13 at 15:28
  • @ShafikYaghmour Well, it defines what malloc does, and it defines what a char is and what a byte is; there would be no need to state explicitly that malloc allocates N*CHAR_BIT bits – nos Dec 19 '13 at 17:09

6 Answers

14

Yes, it is redundant, since the language standard specifies that sizeof (char) is 1. char is the unit in which sizes are measured, so of course the size of the unit itself must be 1.

Life would become strange with a unit defined in terms of itself; that simply wouldn't make any sense. Many people seem to want to assume that there are 8-bit bytes and that sizeof tells them how many such bytes make up a particular value. That is wrong; that's simply not how it works. It's true that there can be platforms with characters larger than 8 bits, which is why we have CHAR_BIT.

Typically you always "know" when you're allocating characters anyway, but if you really want to include sizeof, you should consider making it use the pointer instead:

char *pCh = malloc(NUM_CHARS * sizeof *pCh);

This "locks" the unit size of the thing being allocated the pointer that is used to store the result of the allocation. These two types should match, if you ever see code like this:

int *numbers = malloc(42 * sizeof (float));

that is a huge warning signal; by using the pointer from the left-hand side in the sizeof, you make that type of error impossible, which I consider a big win:

int *numbers = malloc(42 * sizeof *numbers);

Also, if you rename the pointer, the malloc() call will likely stop compiling until you update it, whereas it would continue to compile (silently wrong) if the sizeof named a basic type. There is a slight risk that if you forget the asterisk (and write sizeof numbers instead of sizeof *numbers) you won't get what you want. In practice this never seems to happen (for me), since the asterisk is pretty well established as part of this pattern.

Also, this usage relies on (and emphasizes) the fact that sizeof is not a function, since no parentheses are needed around the pointer-dereferencing expression. This is a nice bonus, since many people seem to want to deny this. :)
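
A quick sketch of that last point (the variable name here is mine, purely for illustration):

#include <stdio.h>

int main(void)
{
    double d = 0.0;

    /* sizeof is an operator, not a function: parentheses are needed
       only around type names, never around expressions. */
    printf("%zu\n", sizeof d);        /* expression operand, no parens */
    printf("%zu\n", sizeof (double)); /* type name, parens required */
    return 0;
}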

I find this pattern highly satisfying and recommend it to everyone.

unwind
  • You should have answered earlier; I would have given you the correct answer. – Fiddling Bits Dec 19 '13 at 14:44
  • @BitFiddlingCodeMonkey Aawww. Thanks. :) I do believe you can move the accepted-status, if you like. [See this meta question](http://meta.stackexchange.com/questions/62252/is-it-poor-form-to-switch-accepted-answers). – unwind Dec 19 '13 at 14:46
5

The C99 draft standard section 6.5.3.4 The sizeof operator paragraph 3 states:

When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1. [...]

In the C11 draft standard it is paragraph 4, but the wording is the same. So NUM_CHARS * sizeof(char) should be equivalent to NUM_CHARS.
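
If you want the compiler to confirm that equivalence, here is a minimal C11 sketch (the value 100 is arbitrary, chosen just for illustration):

#define NUM_CHARS 100

/* Both operands are integer constant expressions, and they are equal
   precisely because sizeof(char) is 1 by definition. */
_Static_assert(NUM_CHARS * sizeof(char) == NUM_CHARS,
               "sizeof(char) is always 1");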

We can see from the definition of byte in section 3.6 that it is an:

addressable unit of data storage large enough to hold any member of the basic character set of the execution environment

and Note 2 says:

A byte is composed of a contiguous sequence of bits, the number of which is implementation defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit.
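
Both facts are easy to inspect on a given implementation; a tiny sketch:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* sizeof(char) is always 1; CHAR_BIT gives the implementation-
       defined number of bits in that one byte (at least 8). */
    printf("sizeof(char) = %zu\n", sizeof(char));
    printf("CHAR_BIT     = %d\n", CHAR_BIT);
    return 0;
}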

Shafik Yaghmour
4

The C specification states that sizeof(char) is 1, so as long as you are dealing with conforming implementations of C, it is redundant.

The size unit used by malloc is the same: malloc(120) allocates space for 120 chars.

A char must be at least 8 bits, but may be larger.
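
For example (a sketch; the printed bit count depends on the implementation):

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *buf = malloc(120);   /* space for 120 chars, i.e. 120 bytes */

    if (buf != NULL)
        printf("120 chars = %d bits here\n", 120 * CHAR_BIT);
    free(buf);
    return 0;
}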

Klas Lindbäck
  • So on systems that have 16-bit `char`s, allocating memory in multiples of `8-bits` is not possible? – Fiddling Bits Dec 19 '13 at 14:26
  • @BitFiddlingCodeMonkey: exactly. The point is that a `char` (= byte) is defined as the smallest addressable data type (as long as it's at least 8 bits), so having finer granularity doesn't make sense. – Matteo Italia Dec 19 '13 at 14:31
  • @BitFiddlingCodeMonkey: You cannot ask for 24 bits on such a system. `malloc(1)` will usually allocate 4 bytes anyway due to memory alignment, so I don't see the problem. – Klas Lindbäck Dec 19 '13 at 14:31
  • @KlasLindbäck My only concern is on systems that have limited memory. It seems wasteful to use 16-bits when only 8 is required. Perhaps, in that case, you should try to create a `struct` with bitfields (`:8`) to "conserve" memory. – Fiddling Bits Dec 19 '13 at 14:37
  • 2
    @CodeMonkey On systems with limited memory I wouldn't expect 16 bit chars. – Klas Lindbäck Dec 19 '13 at 14:53
  • 1
    Also, optimizing your code for the unlikely chance of a port to hypothetical platforms with little memory but big byte sizes is something that takes premature optimization to new peaks of madness =). Write correct, portable code and worry about platform-specific optimizations just for platforms where it is actually likely to run. – Matteo Italia Dec 19 '13 at 15:26
3

sizeof(char) will always return 1, so it doesn't matter whether you use it or not; it will not change anything. You may be confusing this with Unicode wide characters, which may occupy more than one byte, but those have a different type, wchar_t, so you should use sizeof in that case.

If you are working on a system where a byte is defined to have 16 bits, then sizeof(char) would still return 1, as that is the smallest unit the underlying architecture can allocate: one byte of 16 bits.
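
To illustrate the wide-character case (a sketch; the count of 100 is arbitrary):

#include <stdlib.h>
#include <wchar.h>

int main(void)
{
    /* wchar_t is a distinct type whose size varies between platforms
       (commonly 2 or 4 bytes), so sizeof is genuinely needed here. */
    wchar_t *ws = malloc(100 * sizeof *ws);

    free(ws);
    return 0;
}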

Devolus
  • So if 1 byte is 16-bits on a system, does `malloc` always return multiples of `16-bits`, that is, you cannot dynamically allocate a multiple of `8-bits`? – Fiddling Bits Dec 19 '13 at 14:25
  • 1
    Yes, if that is the specification of the machine, it is so. The compiler just reflects that design. On such a machine you can not address less then 16 bits. So if you do `malloc(2)` you would get a pointer pointing to two bytes, but consisting of 32 bits. – Devolus Dec 19 '13 at 14:26
3

Allocation sizes are always measured in units of char, which has size 1 by definition. If you are on a 9-bit machine, malloc understands its argument as a number of 9-bit bytes.

Matteo Italia
  • Are you using `9-bits` as a hypothetical example? I've never heard of such a thing. – Fiddling Bits Dec 19 '13 at 14:31
  • 2
    @BitFiddlingCodeMonkey: IIRC some mainframes used 9-bit bytes - probably due to the 36 bit words. Nowadays the bizarre bit sizes are found normally in DSPs, which tend to have 12 to 16 bits per byte. See [here](http://stackoverflow.com/questions/5516044/system-where-1-byte-8-bit) for some real world examples. – Matteo Italia Dec 19 '13 at 14:38
2

sizeof(char) is always 1, but not because char is always an 8-bit byte (it needn't be); rather, it's because the sizeof operator returns the object/type size in units of char.
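
A one-line demonstration of that point (a sketch):

#include <stdio.h>

int main(void)
{
    /* sizeof reports sizes in units of char: this prints how many
       chars an int occupies, not how many 8-bit octets. */
    printf("an int occupies %zu chars\n", sizeof(int));
    return 0;
}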

Simon Richter
  • `char` typically *is* the "platform byte" (= the smallest addressable data type); the point is that bytes are not octets on all platforms. – Matteo Italia Dec 19 '13 at 14:26