1

How can one portably perform pointer arithmetic with single byte precision?

Keep in mind that:

  • char is not 1 byte on all platforms
  • sizeof(void) == 1 is only available as an extension in GCC
  • While some platforms may have alignment restrictions on pointer dereference, arithmetic may still require a finer granularity than the size of the smallest fundamental POD type
Matt Joiner
  • 1
    I'm curious - which obscure platform has a char that is not a byte? Sounds like the whole premise of the question is premature portability ;) – Will Dec 08 '09 at 06:25
  • 1
    Many DSP chips have > 8-bit `char` types. But by definition, `sizeof(char) == 1` everywhere. I think I have heard of 9-bit `char` systems too. – Alok Singhal Dec 08 '09 at 06:29
  • There is at least one embedded platform on which `sizeof(char)` is the same as `sizeof(int)` (both are 1), meaning that both types are 32-bit integers of identical size. I'm not sure about the size of *machine* byte on that platform though. – AnT stands with Russia Dec 08 '09 at 06:43
  • i've heard of 11 bit, and 32 bit characters. – Matt Joiner Dec 08 '09 at 06:50
  • Anacrolix, irrelevant. Bits in a char is given by CHAR_BIT. Doesn't change sizeof(char). – Alex Budovski Dec 08 '09 at 07:48
  • I think Cray used to have 64-bit chars in their early C-compilers since that was the smallest addressable datum. They were not the best machines for compiling on :) – Per Knytt Dec 09 '09 at 09:24

7 Answers

21

Your assumption is flawed - sizeof(char) is defined to be 1 everywhere.

From the C99 standard (TC3), in section 6.5.3.4 ("The sizeof operator"):

(paragraph 2)

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type.

(paragraph 3)

When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

When these are taken together, it becomes clear that in C, whatever size a char is, that size is a "byte" (even if that's more than 8 bits, on some given platform).

A char is therefore the smallest addressable type. If you need to address in units smaller than a char, your only choice is to read a char at a time and use bitwise operators to mask out the parts of the char that you want.
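The masking approach described above can be sketched as follows. This is only an illustration, not code from the answer: the helper name get_nibble, the macro NIBBLES_PER_CHAR, the choice of 4-bit units, and the low-bits-first ordering are all assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper macro: number of 4-bit units ("nibbles") in one char.
   This is 2 when CHAR_BIT == 8 and larger on platforms with wider chars. */
#define NIBBLES_PER_CHAR (CHAR_BIT / 4)

/* Read the nibble at position `index`, counting from the low-order bits
   of buf[0] upward. The bit ordering is a choice made for this sketch. */
static unsigned get_nibble(const unsigned char *buf, size_t index)
{
    /* char-granular part of the address */
    size_t which_char = index / NIBBLES_PER_CHAR;
    /* bit position of the nibble inside that char */
    unsigned shift = (unsigned)(index % NIBBLES_PER_CHAR) * 4u;

    assert(buf != NULL);
    return ((unsigned)buf[which_char] >> shift) & 0xFu;
}
```

The pointer arithmetic itself never goes below char granularity; the finer position is carried as a separate shift count and applied with a mask.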

caf
  • The OP didn't say that 'sizeof(char)' can be different from 1. I believe the OP specifically avoided that wording to make people understand that "not 1 byte on all platforms" means "not 1 *machine* byte". This is perfectly possible, even though at the language level `sizeof(char)` would still always be 1. – AnT stands with Russia Dec 08 '09 at 06:41
  • @caf i'll accept this as the answer if you can provide links to those sections of the C99 standard, thanks – Matt Joiner Dec 08 '09 at 07:14
  • 1
    AndreyT: Look at the question title. – caf Dec 08 '09 at 07:39
  • @Anacrolix, he provided a "link": Section 6.5.3.4. So you're trying to write portable programs in C without having the current ANSI C Standard available, not even a free draft? – Secure Dec 08 '09 at 08:24
  • no the link is for others, and for reference, it will make it a better answer. – Matt Joiner Dec 09 '09 at 03:07
  • Anacrolix, I'd be inclined to accept caf's answer with its excerpts from the standard - the C99 standard trumps all other documents on this sort of question. – Tim Robinson Dec 09 '09 at 12:58
  • Actually @AndreyT, a `char` is specified to be 1 byte on all platforms. The size of this byte, in bits, is implementation-defined. However, it will always be at least 8 bits. – Joe D Jul 24 '10 at 11:34
  • @Secure: Who said anything about portability? – Matt Joiner Aug 18 '10 at 04:32
  • @Matt Joiner: Erm... You in your own original question? You've even tagged it with "portability". – Secure Aug 18 '10 at 05:36
6

sizeof(char) always returns 1, in both C and C++. A char is always one byte long.

Tim Robinson
3

sizeof(char) is guaranteed to be 1 by the C standard. Even if char uses 9 bits or more.

So you can do:

type *pt;
unsigned char *pc = (unsigned char *)pt;

And use pc for arithmetic. Assigning pc to pt by using the cast above is undefined behavior by the C standard though.

If char is more than 8-bits wide, you can't do byte-precision pointer arithmetic in portable (ANSI/ISO) C. Here, by byte, I mean 8 bits. This is because the fundamental type itself is bigger than 8 bits.
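As a rough sketch of arithmetic through an unsigned char pointer, the following steps a byte offset into an object and reads a member from there. The struct sample type and the helpers byte_offset and read_weight are hypothetical names invented for this illustration; memcpy is used for the read so the sketch does not depend on the alignment of the computed address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type used only for this illustration. */
struct sample {
    int    id;
    double weight;
};

/* Return the address `offset` C-bytes past `base`. The addition is done
   on unsigned char *, whose stride is sizeof(unsigned char) == 1. */
static void *byte_offset(void *base, size_t offset)
{
    assert(base != NULL);
    return (unsigned char *)base + offset;
}

/* Read the `weight` member by stepping offsetof(...) bytes into the object. */
static double read_weight(struct sample *s)
{
    double w;
    memcpy(&w, byte_offset(s, offsetof(struct sample, weight)), sizeof w);
    return w;
}
```

For example, byte_offset(&s, offsetof(struct sample, weight)) gives the same address as &s.weight for any struct sample s.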

Alok Singhal
  • It's actually perfectly permissible under the standard to examine any object as if it were an array of `char`, `unsigned char` or `signed char`. There's several guarantees made in order to ensure this is allowed - like the fact that `char` may not have padding bits. – caf Dec 08 '09 at 06:31
  • 1
    @caf: Only `unsigned char` type is guaranteed to have no padding bits. `signed char` type can have padding bits (and trap representations). Language specification allows reinterpreting objects as arrays of `signed char`, but it is your responsibility to ensure somehow that you won't hit a trap representation for `signed char`. If you really want to be sure of safe reinterpretation, always use an array of `unsigned char`. – AnT stands with Russia Dec 08 '09 at 06:35
  • You're right of course. I should have been more careful before claiming undefined behavior. Thanks for correcting me. – Alok Singhal Dec 08 '09 at 06:37
  • @AndreyT: I think caf was talking about *pointers*: My copy of the standard says (section 3.2.2.2): *A pointer to a non-qualified type may be converted to a pointer to the qualified version of the type; the values stored in the original and converted pointers shall compare equal.* – Alok Singhal Dec 08 '09 at 06:39
  • @Alok: No, it is perfectly clear that caf is talking about reinterpreting any object as an array of [signed/unsigned] char objects. – AnT stands with Russia Dec 08 '09 at 06:45
  • @AndreyT: I think I need some sleep. If you're actually looking at the values (which is the point I think), then you should use unsigned char. Thanks for correcting me. – Alok Singhal Dec 08 '09 at 06:50
  • Yes, you're right that `signed char` and `char` can possibly have padding bits - I was wrong about that. I'm *not* so sure about the trap representations, though - the relevant text (in 6.2.6.1 p5) says that a trap representation accessed through an lvalue "that does not have character type" causes undefined behaviour, implying that *if* `char` and `signed char` can have trap representations, then accessing them is not undefined behaviour, which seems a little odd. – caf Dec 08 '09 at 07:47
  • Good grief, signed char can have padding bits? So for instance `unsigned char` might be a 9 bit unsigned integer 0 - 511, and `char` an 8 bit signed integer -128 - 127. C++ forbids this: yet another arbitrary difference between the two, but I can see why... – Steve Jessop Dec 08 '09 at 11:17
  • I could be wrong about `signed char` having trap representations. I see that it can have padding bits, since the standard is quite specific about only `unsigned char` not having padding bits. As for trap representations - I'm not sure. Note, BTW, that even while C++ says that `signed char` has no padding bits, at the same time it doesn't guarantee that all combinations of bits "represent numbers". Isn't this supposed to mean that `signed char` can have trap representations even in C++? – AnT stands with Russia Dec 08 '09 at 14:51
3

According to the standard char is the smallest addressable chunk of data. You just can't address with greater precision - you would need to do packing/unpacking manually.

sharptooth
1

Cast the pointer to a uintptr_t (declared in <stdint.h>; the type is optional in C99). This is an unsigned integer type wide enough to hold a pointer value. Now do your arithmetic on it, then cast the result back to a pointer of the type you want to dereference.

(Note that intptr_t is signed, which is usually NOT what you want! It's safer to stick to uintptr_t unless you have a good reason not to!)
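A minimal sketch of the integer round trip might look like this. The helper names advance_via_integer and is_aligned are made up for the example. It also rests on an assumption the C standard does not guarantee: that the implementation converts pointers to integers linearly, so adding n to the integer moves the address by n bytes. That holds on common flat-address-space platforms, but the conversion is implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Advance `p` by `nbytes` by doing the addition on an integer.
   Assumption: pointer <-> integer conversion is linear on this platform. */
static void *advance_via_integer(void *p, size_t nbytes)
{
    uintptr_t addr = (uintptr_t)p;   /* pointer -> integer */
    addr += nbytes;                  /* plain unsigned integer arithmetic */
    return (void *)addr;             /* integer -> pointer */
}

/* Hypothetical helper: is the address of `p` a multiple of `alignment`?
   `alignment` must be a nonzero power of two. Same linearity assumption. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

Checks like is_aligned are one of the more common reasons to convert a pointer to an integer at all; for simply stepping through memory, arithmetic on unsigned char * stays within what the standard defines.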

Vincent Gable
1

I don't understand what you are trying to say with sizeof(void) being 1 in GCC. While type char might theoretically consist of more than 1 underlying machine byte, in C language sizeof(char) is 1 and always exactly 1. In other words, from the point of view of C language, char is always 1 "byte" (C-byte, not machine byte). Once you understand that, you'd also understand that sizeof(void) being 1 in GCC does not help you in any way. In GCC the pointer arithmetic on void * pointers works in exactly the same way as pointer arithmetic on char * pointers, which means that if on some platform char * doesn't work for you, then void * won't work for you either.

If on some platform char objects consist of multiple machine bytes, the only way to access smaller units of memory than a full char object would be to use bitwise operations to "extract" and "modify" the required portions of a complete char object. C language offers no way to directly address anything smaller than char. Once again char is always a C-byte.
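The distinction between a C-byte and a count of bits can be made concrete with CHAR_BIT from <limits.h>. The helper bits_in below is a hypothetical name used only for this sketch; it converts a sizeof result (measured in chars) into storage bits.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of storage bits in an object that occupies `size_in_chars`
   C-bytes, where size_in_chars is a value produced by sizeof. */
static size_t bits_in(size_t size_in_chars)
{
    /* guard against overflow of the multiplication */
    assert(size_in_chars <= (size_t)-1 / CHAR_BIT);
    return size_in_chars * CHAR_BIT;
}
```

On any conforming implementation sizeof(char) is 1 and CHAR_BIT is at least 8, so bits_in(sizeof(char)) equals CHAR_BIT whether that is 8, 9, 16 or 32.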

AnT stands with Russia
0

The C99 standard defines the uint8_t that is one byte long. If the compiler doesn't support this type, you could define it using a typedef. Of course you would need a different definition, depending on the platform and/or compiler. Bundle everything in a header file and use it everywhere.
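One way such a header could be sketched for a C99 compiler is shown below. The names octet_t, OCTET_IS_EXACT, to_octet and octet_add are hypothetical. The sketch relies on C99 defining the UINT8_MAX macro whenever <stdint.h> provides uint8_t, and on uint_least8_t being always available in C99. Where no exact 8-bit type exists, the fallback type is wider than 8 bits, so values are masked explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* C99 defines UINT8_MAX when (and only when) it provides uint8_t. */
#ifdef UINT8_MAX
typedef uint8_t octet_t;          /* exactly 8 bits */
#define OCTET_IS_EXACT 1
#else
typedef uint_least8_t octet_t;    /* at least 8 bits; required by C99 */
#define OCTET_IS_EXACT 0
#endif

/* Keep only the low 8 bits of `value`, so results agree on platforms
   where octet_t is wider than 8 bits. */
static octet_t to_octet(unsigned value)
{
    return (octet_t)(value & 0xFFu);
}

/* Add two octets with wraparound modulo 256. */
static octet_t octet_add(octet_t a, octet_t b)
{
    assert(a <= 0xFFu && b <= 0xFFu);  /* inputs are expected to be masked */
    return to_octet((unsigned)a + (unsigned)b);
}
```

Note that this gives 8-bit values, not 8-bit addressing: on a platform whose char is wider than 8 bits, sizeof(octet_t) is still at least 1 char.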

kgiannakakis
  • "If the compiler doesn't support this type, you could define it using a typedef". Actually you can't. If the compiler has a type that provides the behaviour of uint8_t, then it must define uint8_t in stdint.h. So if it doesn't define it, it follows that there's nothing you could typedef it to yourself that would have the correct semantics. You might be able to get close, though, for example if the implementation had an 8 bit type with padding bits. Assuming a C99 compiler, that is. – Steve Jessop Dec 08 '09 at 11:19
  • What about non C99 compilers? Usually 'typedef unsigned char uint8_t;' will give you a byte wide type. Is there something more to uint8_t semantics than being an 8-bit data type? – kgiannakakis Dec 08 '09 at 12:06
  • C89 compilers won't necessarily have stdint.h at all, so you can't assume that if they can implement uint8_t, then they will. Actually, I think I was wrong, there aren't any additional requirements for uint8_t, so you won't ever be "close but not quite there". It's int8_t that has the extra requirements: must be 2's complement and have no padding bits. So bad example there on my part, I'll try again: if for instance your compiler has a 16 bit char, then there may not be any 8 bit types at all, and hence nothing you can use as uint8_t. Code that relies on it is not completely portable. – Steve Jessop Dec 08 '09 at 14:23