4

As far as I know, 1 char = 1 byte = 8 bits (on a 32-bit system).

char c=0xffff0000;  //wrong

Then why does char allow just 8 bits, and why is every character in a file also 8 bits long?

thanks.

Choxx
YAHOOOOO
  • 3
    good title to this question, though. – Alex Brown Nov 24 '10 at 12:36
  • 3
    @Dave18: a byte is not always 8 bits - it just happens to be the most common value these days, – Paul R Nov 24 '10 at 22:11
  • Since the tag is `c++`, please refer to this link: [Would you please go over the rules about bytes, chars, and characters one more time?](https://isocpp.org/wiki/faq/intrinsic-types#bytes-review). – Nan Xiao Dec 31 '15 at 02:33

8 Answers

23

No. sizeof(char) is 1 by definition, but that does not mean a char always occupies exactly 8 bits (or 32 bits).

§3.9.1/1: "Objects declared as characters (char) shall be large enough to store any member of the implementation’s basic character set."

There appears to be some confusion here: the assumption that a byte is always 8 bits. The C++ Standard does not mandate this, however.

Here's how a byte is defined in the Standard, §1.7/1:

The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set and is composed of a contiguous sequence of bits, the number of which is implementation-defined.

As is clear, a byte need not always be 8 bits.

Chubsdad
4

Just because a system is classified as "32 bit" doesn't mean it uses 32-bit bytes.

A byte is often defined (in a system-dependent way) as the smallest addressable piece of memory, and for many architectures that is still 8 bits, even though the architectures (like x86 or x86-64) are capable of working with larger amounts of data in registers (32 and 64 bits, respectively). If you think of bytes this way, you would often use the word "octet" to talk about 8-bit quantities, since the meaning of "byte" changes with the architecture being discussed.

In contrast, for some people "a byte" is defined as always being 8 bits, but then the confusion in the question would probably never happen since they wouldn't expect char on e.g. a 32-bit system to be 32 bits.

Of course, the entire idea of classifying a system as "n-bit" is oversimplifying things quite a lot.

In C, you can always #include <limits.h> and then use the CHAR_BIT macro to get the number of bits in the compiler target's char data type.
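A minimal C++ sketch of that check, using `<climits>` (the C++ counterpart of `<limits.h>`). The helper name `bits_in` is made up for illustration and is not part of any library; the values it yields depend on the compiler target.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// Hypothetical helper: storage size of T in bits on the current target.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}

// These hold on every conforming implementation.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(bits_in<char>() == CHAR_BIT, "char is exactly one byte wide");
```

Because the checks are `static_assert`s, a target that violated them would fail to compile rather than misbehave at run time.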

unwind
3

char has CHAR_BIT bits [from #include <climits>]

On 80x86 machines I have always seen this as 8-bits.
On TMS320C54x and TMS320C55x DSPs I have seen it as 16 bits. This was a pain because, to save memory, strings had to be packed with two ASCII characters held in each char!
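The packing described above might look roughly like the following sketch. It is simulated on an ordinary host, with `std::uint16_t` standing in for the DSP's 16-bit `char`. The function names, the choice of putting the first character in the high 8 bits, and the zero padding for odd lengths are all assumptions made for illustration; the answer does not say how the packing was actually laid out.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Pack an ASCII string two characters per 16-bit cell.
// Assumption: first character in the high 8 bits, second in the low 8 bits;
// an odd trailing character is padded with 0.
std::vector<std::uint16_t> pack_ascii(const std::string& text) {
    std::vector<std::uint16_t> cells;
    cells.reserve((text.size() + 1) / 2);
    for (std::size_t i = 0; i < text.size(); i += 2) {
        const unsigned high = static_cast<unsigned char>(text[i]) & 0xFFu;
        const unsigned low = (i + 1 < text.size())
            ? (static_cast<unsigned char>(text[i + 1]) & 0xFFu)
            : 0u;
        cells.push_back(static_cast<std::uint16_t>((high << 8) | low));
    }
    return cells;
}

// Inverse of pack_ascii; a zero low half is treated as padding.
std::string unpack_ascii(const std::vector<std::uint16_t>& cells) {
    std::string text;
    for (std::uint16_t cell : cells) {
        text.push_back(static_cast<char>(cell >> 8));
        const char low = static_cast<char>(cell & 0xFFu);
        if (low != '\0') {
            text.push_back(low);
        }
    }
    return text;
}
```

With this layout, a three-character string such as "Hi!" occupies two 16-bit cells instead of three.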

Always, sizeof(char) == 1

T33C
2

A char is always a byte and always has size 1.

A byte always has at least 8 bits but can have more on some systems.

A "32-bit system" generally refers to the size of the address bus; in C or C++ you can think of this as the size of a pointer, not the size of a byte.
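To see the two sizes side by side, here is a small sketch that reports the byte width and the pointer width of whatever target it is compiled for. The function name `describe_target` is hypothetical, and treating `sizeof(void*)` as the "n-bit" label of the system is a simplification.

```cpp
#include <climits>   // CHAR_BIT
#include <string>

// Hypothetical helper: summarize byte width and data-pointer width in bits.
std::string describe_target() {
    const auto byte_bits = static_cast<unsigned long>(CHAR_BIT);
    const auto pointer_bits =
        static_cast<unsigned long>(sizeof(void*) * CHAR_BIT);
    return std::to_string(byte_bits) + "-bit bytes, " +
           std::to_string(pointer_bits) + "-bit pointers";
}
```

The result depends on the target, so no expected output is shown; the point is only that the two numbers are independent of each other.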

CashCow
1

The number of bits in a char is generally 8 (one byte/octet). The exact number is defined in the header <climits> as CHAR_BIT.

Fred Foo
0

1 byte = 8 bits

small_ticket
  • 1
    Not necessarily, though almost universal now. The C++ standard caters for computers with 6-bit bytes (as per the quote posted by @Chubsdad). – Fred Foo Nov 24 '10 at 12:40
  • 3
    @larsmans: Actually, it doesn't. CHAR_BIT must be at least 8. – Fred Nurk Nov 24 '10 at 12:59
  • @Fred Nurk: you're right, and I'd miscounted the required character set size in C++. 6,5 bits is the minimum `char` size that will store them :) – Fred Foo Nov 24 '10 at 13:24
0

One byte is most certainly NOT 32 bits. A byte is always 8 bits, no matter what system you're on.

A system that is "32-bit" means that the "word" size is 32 bits. In other words, data is transferred around the system in 32-bit chunks.

Kricket
  • 5
    no. a byte as in C++ Standard is not 8-bit always. Refer my answer – Chubsdad Nov 24 '10 at 12:37
  • 2
    @Chubsdad is right. ISO tends to use the word `octet` for an 8-bit value (at least for the comms-type standards), `byte` and `char` are the same (implementation-defined) size in C++ (and C). – paxdiablo Nov 24 '10 at 12:58
0

In addition to points made already - note that sizeof(char) and the size of a character are not always the same.

Multibyte character sets can take > 1 byte per character - for example, a Unicode character always takes up more than one byte (sizeof(wchar_t)).

Microsoft docs on this topic are here. To add to the confusion, some character sets don't even use a fixed number of bytes for each character.
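As a rough illustration of a variable-width encoding, the sketch below uses UTF-8, where one character can span several `char` objects. The names `e_acute_utf8` and `count_code_points` are invented for this example, and the code assumes valid UTF-8 input and 8-bit `char`.

```cpp
#include <cstddef>
#include <cstring>

// "é" (U+00E9) as its two UTF-8 code units, written with hex escapes so the
// result does not depend on the source file's encoding.
const char e_acute_utf8[] = "\xC3\xA9";

// Count code points by skipping UTF-8 continuation bytes (bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8.
std::size_t count_code_points(const char* s) {
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0u) != 0x80u) {
            ++count;
        }
    }
    return count;
}
```

Here `std::strlen(e_acute_utf8)` is 2 while `count_code_points(e_acute_utf8)` is 1: one character, two bytes, and `sizeof(char)` is still 1.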

Steve Townsend
  • "a Unicode character always takes up two bytes (sizeof(wchar_t)" would suggest that `sizeof(wchar_t)` is always 2. A common value is in fact 4, which makes sense as there are ~100.000 Unicode characters. – MSalters Nov 24 '10 at 15:10
  • 1
    At least it's now in line with normal compilers, but from a standards perspective there's still a few comments to be made. "Multibyte characters" are well-defined in C and C++, but they're not `wchar_t`. Instead, a multi-byte character (MBC) is a sequence of more than one `char`==byte. Shift-JIS or UTF-8 use such characters. And since `wchar_t` is unrelated to multi-byte characters, you can't deduce that `sizeof(wchar_t)` is always >1. – MSalters Nov 25 '10 at 10:38