Bytewise reading of memory: "signed char *" vs "unsigned char *"

Question

One often needs to read from memory one byte at a time, like in this naive memcpy() implementation:

#include <stddef.h>

void *memcpy(void *dest, const void *src, size_t n)
{
    const char *from = (const char *)src;  /* keep the source const */
    char *to         = (char *)dest;

    while (n--) *to++ = *from++;

    return dest;
}

However, I sometimes see people explicitly use unsigned char * instead of just char *.

Of course, plain char may be signed, so char and unsigned char are not necessarily equivalent. But does it make a difference whether I use char *, signed char *, or unsigned char * when reading or writing memory bytewise?

UPDATE: Actually, I'm fully aware that c=200 may have different values depending on the type of c. What I am asking here is why people sometimes use unsigned char * instead of just char * when reading memory, e.g. in order to store a uint32_t in a char[4].

unsigned char expresses more clearly that one is dealing with raw bytes, not characters, even if it doesn't matter because the binary values are the same. – nos, Dec 05 '11 at 13:34

u0b34a0f6ae · Accepted Answer · 2011-12-05T14:09:35.653

You should use unsigned char. The C99 standard says that unsigned char is the only type guaranteed to be dense (no padding bits), and it also states that any object (except bit-fields) can be copied exactly into an unsigned char array, which then holds the object representation as bytes.

To me, the sensible interpretation of this is that if you use a pointer to access an object as bytes, you should use unsigned char.

Reference: C99, section 6.2.6.1, http://blackshell.com/~msmud/cstd.html#6.2.6.1

score 14 · Answer 2 · edited Jul 15 '15 at 04:14

This is one point where C++ differs from C. Generally speaking, C only guarantees that raw memory access works for unsigned char; char may be signed, and on a 1's complement or signed magnitude machine, a -0 might be converted to +0 automatically, changing the bit pattern. For some reason (unknown to me), the C++ committee extends the guarantees supporting transparent copy (no change in bit patterns) to char, as well as unsigned char; on a 1's complement or signed magnitude machine, the implementors have no choice but to make plain char unsigned, in order to avoid such side effects. (And of course, most programmers today aren't concerned by such machines anyway.)

Anyway, the end result is that older programmers who come from a C background (and may actually have worked on a 1's complement or signed magnitude machine) will automatically use unsigned char. It's also a frequent convention to reserve plain char exclusively for character data, with signed char for very small integral values and unsigned char for raw memory or when bit manipulation is intended. Such a rule lets the reader distinguish between the different uses (provided it is followed religiously).

answered Dec 05 '11 at 14:15 by James Kanze · edited by M.M

I think every time you say "2's complement" above, you mean "1s' complement". But it's also implementation-defined in C, for 2's complement signed types, whether the value consisting of sign bit 1 and all other bits 0 is a trap value or not (if not, then of course it's the minimum value of the type). So there may even be some 2's complement hardware out there somewhere on which copying by `char` would fail if `char` were signed. – Steve Jessop Dec 05 '11 at 14:37
I second Steve Jessop. We all work with 2's complement machines now. – Ulterior Dec 05 '11 at 15:13
@SteveJessop Yes. It started as a typo, which got duplicated. (But I've never seen a 2's complement machine which trapped on the maximum negative value. Although that would have made life a lot easier: the fact that `-INT_MIN` is not a legal value for `int` means you have to pay a lot of attention in conversion routines.) – James Kanze Dec 05 '11 at 15:49
I wonder whether the motivation for C permitting that trap value is that someone presented an example, or that someone threatened to build one, or just wishful thinking. – Steve Jessop Dec 06 '11 at 11:02
@SteveJessop Or simply uncertainty. The existence of 1's complement machines was widely known at the time. How they actually behaved with regards to -0 was less known, and probably, no one could be sure that such a case didn't exist, and no one felt that there was any advantage in banning it. – James Kanze Dec 06 '11 at 11:23

score 2 · Answer 3 · answered Dec 05 '11 at 13:22

In your code example it makes no difference. But if you want to display or print the value of a byte, then it does (because the highest bit is interpreted differently), and unsigned char seems more suitable.

answered Dec 05 '11 at 13:22 by Walter

score 0 · Answer 4 · answered Sep 01 '22 at 14:48

If you want to read/write memory bytewise in C++17 or later, look into using std::byte instead of unsigned char:

https://en.cppreference.com/w/cpp/types/byte

This type permits bitwise logical operations (but not arithmetic), and can help avoid difficult-to-debug programming errors.

score 0 · Answer 5 · answered Dec 05 '11 at 13:15

It depends on what you want to store in the char. A signed char is guaranteed to hold at least the range -127 to 127 (in practice usually -128 to 127), whereas an unsigned char holds at least 0 to 255.

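For pointer arithmetic it doesn't matter: all three character types have size 1, so incrementing any of these pointers advances by exactly one byte.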

answered Dec 05 '11 at 13:15 by Chris

score 0 · Answer 6 · answered Dec 05 '11 at 13:25

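#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char a[4] = {254, 254, 254, '\0'};  /* source bytes, high bit set */
    unsigned char b[4];                          /* destination as unsigned char */
    char c[4];                                   /* destination as plain char */

    memset(b, 0, 4);
    memset(c, 0, 4);

    /* memcpy copies bit patterns, so b and c end up with identical bytes */
    memcpy(b, a, 4);
    memcpy(c, a, 4);

    for (int i = 0; i < 4; i++)
    {
        /* Each element is promoted to int for %d: the byte 254 prints as -2
           through a signed plain char (two's complement) and as 254 through
           unsigned char. */
        printf("\noriginal is %d", a[i]);
        printf("\nchar %d is %d", i, c[i]);
        printf("\nunsigned char %d is %d \n\n", i, b[i]);
    }

    return 0;
}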

Output (on an implementation where plain char is signed):

original is 254
char 0 is -2           
unsigned char 0 is 254 


original is 254
char 1 is -2
unsigned char 1 is 254 


original is 254
char 2 is -2
unsigned char 2 is 254 


original is 0
char 3 is 0
unsigned char 3 is 0

So here char and unsigned char hold the same bytes; only the printed value differs (-2 vs 254), so for copying it doesn't matter in this case.

Edit

Even if you read the data as signed char, the most significant bit is still copied, so it doesn't matter.

answered Dec 05 '11 at 13:25 by Jeegar Patel

I won't downvote a fellow answer, but `proof by one working example` isn't the way to go with C: it only tells us about one implementation and, even worse, maybe just that version of it. – u0b34a0f6ae Dec 05 '11 at 13:59
@kaizer.se By the way, it would be more questionable if you didn't downvote a bad answer just because it's a fellow answer. I don't want to encourage you to downvote this one, though; it's just a general comment. – Christian Rau Dec 05 '11 at 14:33
@Christian: I don't understand your viewpoint. In general, I think there are too few downvotes on Stack Overflow; few people cast them, and nothing strange will happen if I abstain once. – u0b34a0f6ae Dec 05 '11 at 14:44
@kaizer.se I didn't say you should do it; it is of course your decision. Just don't refrain from it for the wrong reasons (like political correctness), which would make the voting system absurd. – Christian Rau Dec 05 '11 at 14:57
