64

I've always assumed:

  1. that a char is represented by a byte,
  2. that a byte can always be counted upon to have 8 bits,
  3. that sizeof (char) is always 1,
  4. and that the maximum theoretical amount of memory I can allocate (counted in chars) is the number of bytes of RAM (+ swap space).

But now that I've read the Wikipedia entry on the byte I'm not so sure anymore.

Which one(s) of my assumptions is wrong? Which one(s) is dangerous?

Carl Norum
lindelof
  • Related: http://stackoverflow.com/questions/4266771/char-size-confusion and http://stackoverflow.com/questions/881894/is-char-guaranteed-to-be-exactly-8-bit-long-in-c – Josh Lee Mar 15 '12 at 20:16
  • 2
    @MrLister: What do other languages have to do with it? – Ed S. Mar 15 '12 at 20:50
  • Those have `char` types, and the question was so desperate about always-always-always that I felt the need to remark about situations where `sizeof char` is not 1 (even if it's outside of C. Note that neither the question title nor the question text mentions C). – Mr Lister Mar 15 '12 at 21:44
  • 4
    @MrLister: That's why we have tags. –  Mar 16 '12 at 12:48
  • [Is CHAR_BIT ever > 8?](https://stackoverflow.com/q/32091992/995714), [System where 1 byte != 8 bit?](https://stackoverflow.com/q/5516044/995714), [What platforms have something other than 8-bit char?](https://stackoverflow.com/q/2098149/995714) – phuclv Jan 29 '20 at 05:22
  • Does this answer your question? [Are there machines, where sizeof(char) != 1, or at least CHAR\_BIT > 8?](https://stackoverflow.com/questions/2215445/are-there-machines-where-sizeofchar-1-or-at-least-char-bit-8) – phuclv Jan 29 '20 at 05:22

7 Answers

58
  1. Yes, char and byte are pretty much the same. A byte is the smallest addressable amount of memory, and so is a char in C. char always has size 1.

    From the spec, section 3.6 byte:

    byte

    addressable unit of data storage large enough to hold any member of the basic character set of the execution environment

    And section 3.7.1 character:

    character

    single-byte character
    <C> bit representation that fits in a byte

  2. A char has CHAR_BIT bits. It could be any number (well, 8 or greater according to the spec), but is definitely most often 8. There are real machines with 16- and 32-bit char types, though. CHAR_BIT is defined in limits.h.

    From the spec, section 5.2.4.2.1 Sizes of integer types <limits.h>:

    The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

    — number of bits for smallest object that is not a bit-field (byte)
        CHAR_BIT                               8

  3. sizeof(char) == 1. Always.

    From the spec, section 6.5.3.4 The sizeof operator, paragraph 3:

    When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

  4. You can allocate as much memory as your system will let you allocate - there's nothing in the standard that defines how much that might be. You could imagine, for example, a computer with a cloud-storage backed memory allocation system - your allocatable memory might be practically infinite.

    Here's the complete spec section 7.20.3.3 The malloc function:

    Synopsis

    1 #include <stdlib.h>
       void *malloc(size_t size);

    Description

    2 The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate.

    Returns

    3 The malloc function returns either a null pointer or a pointer to the allocated space.

    That's the entirety of the specification, so there's not really any limit you can rely on.

Carl Norum
  • 3
    Concretely, with memory overcommit on Linux, it's entirely possible to allocate 2TB of memory on a box with 8G mem+swap. – Dave Mar 16 '12 at 21:55
  • *"A char has `CHAR_BIT` bits"* -- where do you get it? The C standard says that `CHAR_BIT` is *"number of bits for smallest object that is not a bit-field (byte)"* -- note: byte, not `char`. Related question: [Is the number of bits in a byte equal to the number of bits in a type char?](http://stackoverflow.com/q/36289134/4279) – jfs Mar 29 '16 at 15:53
  • 1
    @J.F.Sebastian, that's exactly what it says in part 1 of my answer. – Carl Norum Mar 29 '16 at 16:09
  • @CarlNorum: I don't see `CHAR_BIT` being mentioned in part 1 of your answer at all. Are you claiming that from `sizeof(char) == 1` (true) follows that the number of bits in a type `char` is `CHAR_BIT` i.e., there are no padding bits? Regardless of the answer, please, [reopen my question because your answer doesn't answer my question at least **for me** -- the questions are related but I don't see the connection in the answer](http://stackoverflow.com/q/36289134/4279) – jfs Mar 29 '16 at 16:18
  • @J.F.Sebastian - part 1: "byte == char". Part 2 "char has CHAR_BIT bits". – Carl Norum Mar 29 '16 at 16:44
  • 1
    @CarlNorum: `byte == char` is not exact i.e., it is **wrong**. (unrelated: I don't see the word "smallest" in the quotes in your answer). They could be used interchangeably in many cases (I agree with "pretty much the same") but they are not (exactly) the same (as the quotes from C standard in your own answer say explicitly). `char` is a type. byte is a storage unit. char fits in a byte. It happens that [all `CHAR_BIT` bits are used by `char` type (no padding bits)](http://stackoverflow.com/a/36289135/4279) and therefore "A char has `CHAR_BIT` bits." is true but it does not follow from part 1. – jfs Mar 29 '16 at 17:59
  • ? There are plenty of spec quotes in my answer that address all of your concerns. And *yes*, in C, "byte" and "char" are interchangeable words. – Carl Norum Mar 29 '16 at 18:04
17

A char is always exactly one byte (sizeof(char) == 1). A byte is not always one octet, however: the Texas Instruments TI C55x, for example, is a DSP with a 16-bit byte.

einpoklum
Michael Foukarakis
  • 5
    There are plenty of real machines with non-8-bit bytes. – Carl Norum Mar 15 '12 at 20:34
  • 1
    The answer to his question is simply NO. That's exactly why the CHAR_BIT constant exists in POSIX libraries. – Tomas Pruzina Mar 16 '12 at 04:40
  • @TomasPruzina Mentioning POSIX (and not ISO C) is probably a bit misleading here, because standards as early as POSIX-2001 / SUSv3 required CHAR_BIT = 8 (though POSIX.1-1988, SUSv1, and SUSv2 only repeated the ISO C requirement that CHAR_BIT ≥ 8). AFAIU most systems with non-8-bit char are decidedly non-POSIX. – Alex Shpilkin May 29 '21 at 00:24
11

sizeof(char) is defined to always be 1. From C99:

When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

It is not however guaranteed to be 8 bits. In practice, on the vast majority of platforms out there, it will be, but no, you cannot technically count on that to always be the case (nor should it matter as you should be using sizeof anyway).

Ed S.
  • Can you explain what that means? A) You say "sizeof(char) is defined to always be 1" – one what? B) You say "It is not however guaranteed to be 8 bits" – what is "it"? A byte? A char? C) And you say that you should use sizeof(char), as if to suggest that maybe it won't be 1, so it's safer to always use sizeof. But you say that "sizeof(char) is defined to always be 1". So do you mean in case the platform doesn't conform to C99? – barlop Oct 19 '15 at 20:54
  • I suppose you mean A) 1 byte, B) a byte, and thus a char (as a char is one byte), is not guaranteed to be 8 bits. But what of C? If you use sizeof(char), how is that useful if you know it will always be 1? And that 1 won't tell you how many bits it is anyway. – barlop Oct 19 '15 at 21:14
  • 1
    On any given platform, a "byte", a "char" and the unit for referring to an address or size in memory are the same. sizeof byte is always 1 even on systems where a byte isn't 8 bits, since the result of sizeof is measured *in bytes*. This is useful because on every platform that's how memory is measured and addressed. The number of bits in a byte is defined by the platform so is known at compile time and you can use a #define – thomasrutter Dec 12 '20 at 08:48
6

Concretely, some architectures, especially in the DSP field, have chars larger than 8 bits. In practice, they sacrifice memory space for speed.

Lindydancer
  • 3
    Given that I work for a company providing such compilers, I find the downvote rather puzzling... Please explain! – Lindydancer Mar 16 '12 at 06:51
4

In C, a char is always one byte, so your first and third assumptions are correct.

A byte is not always 8 bits, though, so your second assumption doesn't always hold. That said, >= 99.99% of all systems in existence today have 8-bit characters, so lots of code implicitly assumes 8-bit characters and runs just fine on all the target platforms. Certainly Windows and Mac machines always use 8-bit characters, and AFAIK Linux does as well (Linux has been ported to so many platforms that I'm not 100% sure that somebody hasn't ported Linux to a platform where 9-bit characters make sense).

The maximum amount of memory that can be allocated is the size of virtual memory, minus space reserved for the operating system.

Adam Mihalcin
  • Wrong. `sizeof(char)` is always 1, that does not mean that a char is always 8 bits. – Ed S. Mar 15 '12 at 20:18
  • 1st assumption: "a char is represented by a byte", 3rd assumption: " sizeof (char) is always 1." Both are true, and even before the edit I didn't claim otherwise. – Adam Mihalcin Mar 15 '12 at 20:21
  • @nos: That is definitely not what he said. He has since edited the response to be correct, but it was not initially, which is why it had 3 downvotes. – Ed S. Mar 15 '12 at 20:22
  • @EdS. Check the edit history. The first paragraph hasn't changed, so don't claim that "he has since edited the response" to fix some mistake. – Adam Mihalcin Mar 15 '12 at 20:24
  • @AdamMihalcin: No, it's not. You essentially said "Yes, it will always be 8-bits" because you said "Yes" to the OP's question. This is why you got the downvotes. I am not a huge fan of posting incorrect answers quickly, only to later fill in the relevant info, but I have removed my downvote as it is now correct. – Ed S. Mar 15 '12 at 20:24
  • @AdamMihalcin: I can't check the edit history because edits by the OP within a short amount of time are not made visible. It was wrong, and two other people agreed. It is not wrong now though, so I think we can move on. The first paragraph is definitely not the same though. – Ed S. Mar 15 '12 at 20:25
  • I saw a different answer too... it does not show that you edited it within 5 mins so things were added. I don't remember exactly what this answer had, but it is different now. –  Mar 15 '12 at 20:25
  • @EdS. I'd agree with you if I had posted an incorrect answer in the first place, but "your first and third assumptions" (out of 4 assumptions the OP listed) does not mean all assumptions. – Adam Mihalcin Mar 15 '12 at 20:26
  • @EdS. "I think we can move on" Amen to that! – Adam Mihalcin Mar 15 '12 at 20:27
4

Traditionally, a byte is not necessarily 8 bits, but merely a smallish region of memory, usually suitable for storing one character. The C Standard follows this usage, so the bytes used by malloc and sizeof can be more than 8 bits. [footnote] (The Standard does not allow them to be less.)

But sizeof(char) is always 1.

Memorizing the C FAQ is a career-enhancing move.

Mike Sherrill 'Cat Recall'
3

The unfortunate thing (or maybe fortunate, depending on how you view things) is that the common idea of what a byte is (8 bits) is not synonymous with what the C programming language considers a byte to be. As some of the previous answers show, a byte has an exact definition in the C standard, and nowhere in that definition is a byte said to be 8 bits. It simply says that a byte is

"an addressable unit of data storage large enough to hold any member of the basic character set of the execution environment."

So to answer your question, “Will a char always-always-always have 8 bits?”: not always, but most often it will. If you are interested in finding out exactly how many bits of space your data types consume on your system, you can use the following expression:

sizeof(type) * CHAR_BIT

where type is your data type. For example, to find out how many bits a char takes up on your system, you can use the following:

printf("The number of bits a 'char' has on my system: %zu\n", sizeof(char) * CHAR_BIT);

This is taken from the GNU C Library Reference Manual, which contains the following illuminating explanation on this topic:

There is no operator in the C language that can give you the number of bits in an integer data type. But you can compute it from the macro CHAR_BIT, defined in the header file limits.h. CHAR_BIT — This is the number of bits in a char—eight, on most systems. The value has type int. You can compute the number of bits in any data type type like this:

    `sizeof (type) * CHAR_BIT` 

That expression includes padding bits as well as value and sign bits.

Adam Bak