35

Is there a standards-compliant method to represent a byte in ANSI (C89/90) C? I know that, most often, a char happens to be a byte, but my understanding is that this is not guaranteed to be the case. Also, there is stdint.h in the C99 standard, but what was used before C99?

I'm curious about both 8 bits specifically, and a "byte" (sizeof(x) == 1).

Sydius
  • 8
    Make sure you distinguish byte from octet. sizeof(char) = 1 always, which means a char is always a byte. However, a byte is not always an octet (DEC Alpha bytes were 10 bits, IIRC; octets are defined to be 8 bits). – Tom Jan 13 '09 at 04:47

6 Answers

69

char is always a byte, but it's not always an octet. A byte is the smallest addressable unit of memory (in most definitions); an octet is an 8-bit unit of memory.

That is, sizeof(char) is always 1 in all implementations, but the CHAR_BIT macro in limits.h defines the size of a byte for a platform, and it is not always 8 bits. There are platforms with 16-bit and 32-bit bytes; on those, char takes up more bits but is still one byte. Since the required range for char is at least -127 to 127 (or 0 to 255), it will be at least 8 bits wide on all platforms.
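
To see both facts on a given platform, here is a minimal, self-contained check (nothing below is platform-specific):

#include <stdio.h>
#include <limits.h>

int main(void) {
    /* sizeof(char) is 1 by definition; CHAR_BIT is the width of a byte */
    printf("sizeof(char) = %u, CHAR_BIT = %d\n",
           (unsigned)sizeof(char), CHAR_BIT);
    return 0;
}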

ISO/IEC 9899:TC3

6.5.3.4 The sizeof operator

  1. ...
  2. The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. [...]
  3. When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1. [...]

Emphasis mine.

Alex B
  • Just for clarification, is sizeof(char) always 1 per the spec, or does it just happen to be in all implementations? – Sydius Jan 13 '09 at 02:28
  • Assuming you are using an odd architecture with a <8-bit byte, couldn't char not be a byte (since CHAR_BIT >= 8)? If not, could you precisely define what you mean by "byte" above? – Chris Conway Jan 13 '09 at 16:34
  • 4
    The required range for char is actually either -127 to 127 (don't forget that some architectures used to use signed magnitude or one's complement integer representations) or 0 to 255, depending on whether char is signed or unsigned. 8-bit two's complement supports -128 to 127, not -127 to 128. – bk1e Jan 14 '09 at 07:48
  • 3
    @Chris: byte = smallest addressable unit of memory. I am not sure what you mean by your question. A less-than-8-bit byte means a platform can't be C compliant. – Alex B Jan 14 '09 at 08:37
  • 1
    Didn't realize C required >=8-bit bytes (indeed, the standard says a byte must hold a char, and a char must be at least 8 bits). We've reached the frontier of C's portability... – Chris Conway Jan 14 '09 at 16:36
  • What platforms have bytes bigger than 8 bits? – theduke Oct 02 '11 at 00:22
  • 2
    @theduke, mainly DSPs, for example: http://leo.sprossenwanne.at/dsp/Entwicklungsprogramme/Entpack/CC56/DSP/INCLUDE/LIMITS.H – Alex B Oct 02 '11 at 03:10
  • 14
    A physical hardware byte smaller than 8 bits is no problem with regard to C conformance as long as the **logical byte** presented by the C implementation is at least 8 bits. This means a machine with 7-bit hardware bytes could provide a 14-bit logical byte for `char` and be conformant, but then all larger types would have to occupy an integral (and aligned) number of such logical bytes (i.e. you could not have a 21-bit integer made up of 3 hardware bytes unless you included an additional 7 bits of padding, the rest of the second `char`, along with it). – R.. GitHub STOP HELPING ICE Oct 21 '11 at 04:54
11

You can always represent a byte (if you mean 8 bits) in an unsigned char. It's always at least 8 bits in size, with all bits taking part in the value, so an 8-bit value will always fit into it.

If you want exactly 8 bits, I also think you'll have to use platform-dependent ways. POSIX systems are required to support int8_t. That means that on POSIX systems, char (and thus a byte) is always 8 bits.
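
A minimal sketch of the first point: since CHAR_BIT >= 8, any 8-bit value can be stored in an unsigned char without loss:

#include <stdio.h>

int main(void) {
    unsigned char byte = 0xAB;      /* fits on every conforming platform */
    printf("%u\n", (unsigned)byte); /* prints 171 */
    return 0;
}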

Johannes Schaub - litb
  • POSIX support for stdint.h post-dates C99. – Chris Conway Jan 13 '09 at 00:47
  • Ah yeah, looks like it's from 2001. But I think even if he hasn't got a C99 compiler shipping it, if he's on a POSIX machine he can take advantage of its requirements for stdint.h. If he's on MS Windows, all my bets are off :) Maybe he can grab stuff out of Boost's cstdint.hpp and C-ify it? – Johannes Schaub - litb Jan 13 '09 at 00:55
  • I mean a byte, not necessarily 8 bits, but thanks. As an aside, does the spec say it must be at least 8 bits, or does it just happen to be the case? – Sydius Jan 13 '09 at 02:30
  • 2
    Yes, the C standard, in documenting limits.h, requires UCHAR_MAX to be at least 255, to have no padding bits, and to use a pure binary system. char is required to have the same range and representation as either unsigned char or signed char, but it must still be a distinct type. – Johannes Schaub - litb Jan 13 '09 at 12:15
3

In ANSI C89/ISO C90, sizeof(char) == 1. However, it is not always the case that 1 byte is 8 bits. If you wish to count the number of bits in 1 byte (and you don't have access to limits.h), I suggest the following:

unsigned int bitnum(void) {
    unsigned char c = ~0u; /* all bits set after conversion; thank you Jonathan */
    unsigned int v;

    /* Kernighan's method: each pass clears the lowest set bit of c */
    for(v = 0u; c; ++v)
        c &= c - 1u;
    return v;
}

Here we use Kernighan's method to count the number of bits set in c. To better understand the code above (or see others like it), I refer you to "Bit Twiddling Hacks".
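
For illustration, here is a minimal driver for the function above (limits.h is pulled in only to cross-check the result against CHAR_BIT; on a conforming implementation the two should agree):

#include <stdio.h>
#include <limits.h>

unsigned int bitnum(void); /* as defined above */

int main(void) {
    printf("bitnum() = %u, CHAR_BIT = %d\n", bitnum(), CHAR_BIT);
    return 0;
}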

  • 3
    Better to use ~0 than -1; on a one's complement or sign-magnitude machine, -1 might not be all-bits-set. ~0 is guaranteed to be all bits set. – Jonathan Leffler Jan 13 '09 at 04:31
  • @Jonathan: That makes sense. Thank you for the suggestion. I am editing the post now. (I'm sorry that I edited this comment so many times!) –  Jan 13 '09 at 04:52
  • -1 is always all bits one. The conversion of -1 to unsigned char is not necessarily bit-preserving (it truncates). – Johannes Schaub - litb Jan 13 '09 at 12:17
  • 2
    It's defined mathematically: -N is (2^CHAR_BIT - (N mod (2^CHAR_BIT))). That means -1 is always the highest unsigned char, having all bits 1. The difference in sign representation is that with two's complement the conversion is purely conceptual: the bit pattern won't change. – Johannes Schaub - litb Jan 13 '09 at 12:21
  • While -1 is all bits 1 before, it's so too after conversion to unsigned char. Nitpicking (I really don't like this, but just to be correct :)), ~0u could (after conversion) instead result in a value other than all-bits-1: converting a value to unsigned char will wrap it around: N => N mod 2^CHAR_BIT – Johannes Schaub - litb Jan 13 '09 at 12:36
  • ... which means that if N is not a multiple of UCHAR_MAX (which can happen, because an unsigned int does not need to use all its bits to store its value), you can be left with a value that is not necessarily all bits 1. So I think your first version, converting -1 to unsigned char, was alright. Please tell me if I'm wrong. – Johannes Schaub - litb Jan 13 '09 at 12:36
  • to quote it directly: "Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type." – Johannes Schaub - litb Jan 13 '09 at 12:41
  • (By saying "-1 is always all bits one" I mean -1 converted to unsigned char, as you had it in your answer. -1 by itself, of course, is only all bits one in two's complement.) For two's complement, the conversion doesn't change the bits. Comments are too short to really tell the truth :) – Johannes Schaub - litb Jan 13 '09 at 17:39
  • I'm fairly certain that (unsigned char)-1 will not set all bits on a machine which uses either a ones' complement or a sign-magnitude representation of signed numbers. –  Jan 13 '09 at 22:10
  • @anon: You may be fairly certain, but you're also wrong. `(unsigned_type)-1` is **always** all-ones-bits in the destination type. – R.. GitHub STOP HELPING ICE Oct 21 '11 at 04:56
  • 1
    @R: How can this be? One's complement means that, for 16 bit integers, -1 is %11111111-11111110 because to make a negative number, bits are simply flipped ([see here](http://en.wikipedia.org/wiki/One%27s_complement)). Only for Two's complement -1 would be %11111111-11111111, i.e. 0x7FFFF + 1 (which is when many CPUs kindly set the overflow flag). – Andreas Spindler Oct 23 '12 at 21:23
  • 1
    @AndreasSpindler: see JohannesSchaub-litb's comment: a conversion from signed to unsigned is not just a bit-pattern reinterpretation; conceptually, you add Uxxx_MAX + 1 until you are in range. – ninjalj Feb 23 '15 at 18:47
1

Before C99? Platform-dependent code.

But why do you care? Just use stdint.h.

In every implementation of C I have used (from old UNIX to embedded compilers written by hardware engineers to big-vendor compilers), char has always been 8-bit.
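
For reference, a minimal sketch of the stdint.h route (C99 and later; note that uint8_t is an optional typedef that exists only on platforms with a padding-free 8-bit type):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t octet = 0xFF;            /* exactly 8 bits where the typedef exists */
    printf("%u\n", (unsigned)octet); /* prints 255 */
    return 0;
}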

Frank Krueger
-3

You can find pretty reliable macros and typedefs in Boost.
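
For pre-C99 C, a minimal sketch of the kind of detection such headers perform (the typedef name uint8 here is hypothetical, for illustration only, not Boost's actual spelling):

#include <limits.h>

#if UCHAR_MAX == 0xff
typedef unsigned char uint8; /* hypothetical exact 8-bit type */
#else
#error "no exact 8-bit type available on this platform"
#endif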

PolyThinker
  • 1
    Well, you could just copy/paste what you need from there. There's nothing special about it if you only need a reliable integer type of a certain length. – PolyThinker Jan 13 '09 at 01:04
-5

I notice that some answers have re-defined the word byte to mean something other than 8 bits. A byte is 8 bits; however, in some C implementations char is 16 bits (2 bytes) or 8 bits (1 byte). The people who call a byte 'the smallest addressable unit of memory' or some such garbage have lost grasp of the meaning of byte (8 bits). The reason that some implementations of C have 16-bit chars (2 bytes) and some have 8-bit chars (1 byte), and that there is no standard type called 'byte', is due to laziness.

So, we should use int_8.

ty5
  • 7
    The **language standard** has defined **its** meaning of the word 'byte' to mean the smallest addressable unit. That doesn't have to be 8 bits. It can be larger on some systems. Those systems are also unlikely to even have an int_8 (or int8_t). – Bo Persson Jun 12 '11 at 17:55
  • Not just unlikely. `int8_t` is required, if it exists, to have no padding bits (and two's complement representation), so the only way it can exist is if `char` is exactly 8 bits. – R.. GitHub STOP HELPING ICE Oct 21 '11 at 04:58
  • 1
    Byte has traditionally _not_ meant 8 bits. E.g., the main reason FTP uses separate control and data connections is to be able to select an appropriate byte size for the data connection, e.g., for 36-bit computers. Note that the RFCs use the term **octet** (and avoid the ambiguous term byte) to mean an 8-bit data unit. – ninjalj Feb 23 '15 at 19:10