char size confusion

Question

As far as I know, 1 char = 1 byte = 8 bits (on a 32-bit system).

char c=0xffff0000;  //wrong

So why does char hold only 8 bits, and why is every character in a file also 8 bits long?

thanks.

@Dave18: a byte is not always 8 bits - it just happens to be the most common value these days. — Paul R, Nov 24 '10 at 22:11
Since the tag is `c++`, please refer to this link: [Would you please go over the rules about bytes, chars, and characters one more time?](https://isocpp.org/wiki/faq/intrinsic-types#bytes-review). — Nan Xiao, Dec 31 '15 at 02:33

Chubsdad · Accepted Answer · 2010-11-24T12:40:02.293

23

No. sizeof(char) is 1 by definition, but that does not mean a char always occupies 8 bits (or 32 bits).

$3.9.1/1- "Objects declared as characters (char) shall be large enough to store any member of the implementation’s basic character set."

There appears to be some confusion that a byte is always 8 bits. The C++ Standard does not mandate this, however.

Here's how a byte is defined in the Standard, $1.7/1:

The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set and is composed of a contiguous sequence of bits, the number of which is implementation-defined.

As is clear, a byte need not always be 8 bits.
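As a minimal sketch of the points above, both guarantees can be checked at compile time, and the actual byte width read from `CHAR_BIT` in `<climits>`. The helper name `bits_per_byte` is made up for illustration and is not part of any standard API.

```cpp
#include <cassert>
#include <climits>  // CHAR_BIT

// sizeof counts in units of char, so this holds on every conforming implementation.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

// A byte has at least 8 bits, but an implementation may use more.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: number of bits in one byte (one char) on this target.
constexpr int bits_per_byte() { return CHAR_BIT; }
```

On mainstream desktop compilers `bits_per_byte()` returns 8; on some DSP toolchains it may be larger.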

edited Nov 24 '10 at 12:40

answered Nov 24 '10 at 12:34

Chubsdad


3

Do you know of any system where a byte is not 8 bits? Curious. – Steve Townsend Nov 24 '10 at 12:41
@Steve Townsend: Nope. My programming world starts in late 90s :) – Chubsdad Nov 24 '10 at 12:46
2

I've heard that some DSPs have 1 byte of 12 bits (hope I'm not wrong). – botismarius Nov 24 '10 at 12:49
@Steve: "The use of 8-bit codes for digital telephony also caused 8-bit data 'octets' to be adopted as the basic data unit of the early Internet." and more at http://en.wikipedia.org/wiki/Byte#Size – Fred Nurk Nov 24 '10 at 12:55
@botismarius: Yes, it appears so with the TMS320C44 and others. Unfortunately, I haven't had a chance to work with DSPs :( – Chubsdad Nov 24 '10 at 12:56

unwind · Answer 2 · 2014-06-28T13:16:29.270

Just because a system is classified as "32 bit" doesn't mean it uses 32-bit bytes.

A byte is often defined (in a system-dependent way) as the smallest addressable piece of memory, and for many architectures that is still 8 bits, even though the architectures (like x86 or x86-64) are capable of working with larger amounts of data in registers (32 and 64 bits, respectively). If you think of it this way, you often use the word "octet" to talk about 8-bit quantities, since the meaning of "byte" changes with the architecture being discussed.

In contrast, for some people "a byte" is defined as always being 8 bits, but then the confusion in the question would probably never happen since they wouldn't expect char on e.g. a 32-bit system to be 32 bits.

Of course, the entire idea of classifying a system as "n-bit" is oversimplifying things quite a lot.

In C, you can always #include <limits.h> and then use the CHAR_BIT macro to get the number of bits in the compiler target's char data type.
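To connect this back to the assignment in the question, here is a small sketch of what happens when a wider value is stored in a character type. It uses `unsigned char`, for which the conversion is well defined (the value is reduced modulo 2^CHAR_BIT). The function name `keep_low_byte` is an assumption made for this example.

```cpp
#include <cassert>
#include <climits>  // CHAR_BIT, UCHAR_MAX

// Hypothetical helper: store a wider value in an unsigned char.
// Conversion to an unsigned type keeps the value modulo 2^CHAR_BIT,
// so only the low CHAR_BIT bits survive.
unsigned char keep_low_byte(unsigned long value) {
    return static_cast<unsigned char>(value);
}
```

With `CHAR_BIT == 8`, `keep_low_byte(0xffff0041UL)` yields `0x41`; the upper bits are discarded. For plain `char` the signedness is implementation-defined, which is why the sketch sticks to `unsigned char`.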

T33C · Answer 3 · 2010-11-24T13:28:05.167

3

char has CHAR_BIT bits [from #include <climits>]

On 80x86 machines I have always seen this as 8 bits.
On TMS320C54x and TMS320C55x DSPs I have seen it as 16 bits. This was a pain because, to save memory, strings had to be packed with two ASCII characters held in each char!
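The packing described above might look roughly like the following sketch. It is not taken from any DSP toolchain: `Cell`, `pack_ascii`, `first_char` and `second_char` are hypothetical names, and `std::uint_least16_t` merely stands in for a 16-bit char.

```cpp
#include <cassert>
#include <cstdint>  // std::uint_least16_t

// Hypothetical stand-in for a 16-bit char on such a DSP.
using Cell = std::uint_least16_t;

// Pack two ASCII codes into one cell: first in the high 8 bits, second in the low 8 bits.
constexpr Cell pack_ascii(char first, char second) {
    return static_cast<Cell>(((static_cast<unsigned>(first) & 0xFFu) << 8) |
                             (static_cast<unsigned>(second) & 0xFFu));
}

// Recover the two characters from a packed cell.
constexpr char first_char(Cell cell)  { return static_cast<char>((cell >> 8) & 0xFFu); }
constexpr char second_char(Cell cell) { return static_cast<char>(cell & 0xFFu); }
```

Assuming an ASCII execution character set, `pack_ascii('H', 'i')` gives `0x4869`, and the two accessors return `'H'` and `'i'` again.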

Always, sizeof(char) == 1

edited Nov 24 '10 at 13:28

answered Nov 24 '10 at 12:45

T33C


score 2 · Answer 4 · answered Nov 24 '10 at 12:40

A char is always a byte and always has size 1.

A byte always has at least 8 bits but can have more on some systems.

A 32-bit system refers to the size of the address bus; in C or C++ you can think of this as the size of a pointer, not the size of a byte.
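One way to see the difference is to compare the storage size of a pointer with that of a char. This is only a sketch, and `storage_bits` is a made-up helper name.

```cpp
#include <cassert>
#include <climits>  // CHAR_BIT
#include <cstddef>  // std::size_t

// Hypothetical helper: storage size of T in bits.
// sizeof counts bytes, CHAR_BIT counts bits per byte.
template <typename T>
constexpr std::size_t storage_bits() {
    return sizeof(T) * CHAR_BIT;
}
```

On a typical 32-bit x86 build, `storage_bits<void*>()` is 32 while `storage_bits<char>()` is still 8.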

score 1 · Answer 5 · answered Nov 24 '10 at 12:38


The number of bits in a char is generally 8 (one byte/octet). The exact number is defined in the header <climits> as CHAR_BIT.

answered Nov 24 '10 at 12:38

Fred Foo


score 0 · Answer 6 · answered Nov 24 '10 at 12:33


1 byte = 8 bits

answered Nov 24 '10 at 12:33

small_ticket


1

Not necessarily, though almost universal now. The C++ standard caters for computers with 6-bit bytes (as per the quote posted by @Chubsdad). – Fred Foo Nov 24 '10 at 12:40
3

@larsmans: Actually, it doesn't. CHAR_BIT must be at least 8. – Fred Nurk Nov 24 '10 at 12:59
@Fred Nurk: you're right, and I'd miscounted the required character set size in C++. 6.5 bits is the minimum `char` size that will store them :) – Fred Foo Nov 24 '10 at 13:24

score 0 · Answer 7 · answered Nov 24 '10 at 12:37


One byte is most certainly NOT 32 bits. A byte is always 8 bits, no matter what system you're on.

A system that is "32-bit" means that the "word" size is 32 bits. In other words, data is transferred around the system in 32-bit chunks.

answered Nov 24 '10 at 12:37

Kricket


5

No. A byte, as defined in the C++ Standard, is not always 8 bits. Refer to my answer. – Chubsdad Nov 24 '10 at 12:37
2

@Chubsdad is right. ISO tends to use the word `octet` for an 8-bit value (at least for the comms-type standards), `byte` and `char` are the same (implementation-defined) size in C++ (and C). – paxdiablo Nov 24 '10 at 12:58

Steve Townsend · Answer 8 · 2010-11-24T15:12:24.517

0

In addition to points made already - note that sizeof(char) and the size of a character are not always the same.

Multibyte character sets can take > 1 byte per character - for example, a Unicode character always takes up more than one byte (sizeof(wchar_t)).

Microsoft docs on this topic are here. To add to the confusion, some character sets don't even use a fixed number of bytes for each character.
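UTF-8 is one example of an encoding with a variable number of bytes per character. The sketch below counts characters (code points) rather than bytes. It assumes the input is valid UTF-8 stored in 8-bit chars, and the function name `utf8_code_points` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a string assumed to hold valid UTF-8.
// Continuation bytes have the bit pattern 10xxxxxx; every other byte starts a new code point.
std::size_t utf8_code_points(const std::string& text) {
    std::size_t count = 0;
    for (unsigned char byte : text) {
        if ((byte & 0xC0u) != 0x80u) {
            ++count;
        }
    }
    return count;
}
```

For the string `"h\xC3\xA9llo"` ("héllo" in UTF-8), `size()` reports 6 bytes, while `utf8_code_points` reports 5 characters.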

edited Nov 24 '10 at 15:12

answered Nov 24 '10 at 12:39

Steve Townsend


"a Unicode character always takes up two bytes (sizeof(wchar_t))" would suggest that `sizeof(wchar_t)` is always 2. A common value is in fact 4, which makes sense as there are ~100,000 Unicode characters. – MSalters Nov 24 '10 at 15:10
1

At least it's now in line with normal compilers, but from a standards perspective there's still a few comments to be made. "Multibyte characters" are well-defined in C and C++, but they're not `wchar_t`. Instead, a multi-byte character (MBC) is a sequence of more than one `char`==byte. Shift-JIS or UTF-8 use such characters. And since `wchar_t` is unrelated to multi-byte characters, you can't deduce that `sizeof(wchar_t)` is always >1. – MSalters Nov 25 '10 at 10:38
