In C/C++, what is an `unsigned char` used for? How is it different from a regular `char`?

16 Answers
In C++, there are three distinct character types:

- `char`
- `signed char`
- `unsigned char`

1. `char`

If you are using character types for text, use the unqualified `char`:

- it is the type of character literals like `'a'` or `'0'` (in C++ only; in C their type is `int`)
- it is the type that makes up C strings like `"abcde"`

It also works out as a number value, but it is unspecified whether that value is treated as signed or unsigned. Beware character comparisons through inequalities - although if you limit yourself to ASCII (0-127) you're just about safe.

2. `signed char` / 3. `unsigned char`

If you are using character types as numbers, use:

- `signed char`, which gives you at least the -127 to 127 range (-128 to 127 is common)
- `unsigned char`, which gives you at least the 0 to 255 range; this might be useful for displaying an octet, e.g. as a hex value

"At least", because the C++ standard only gives the minimum range of values that each numeric type is required to cover. `sizeof (char)` is required to be 1 (i.e. one byte), but a byte could in theory be, for example, 32 bits. `sizeof` would still report its size as `1` - meaning that you could have `sizeof (char) == sizeof (long) == 1`.
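To make the "three distinct types" point concrete, here is a minimal C11 sketch; the `TYPE_NAME` macro is a made-up helper for this example, not a standard facility. `_Generic` dispatches on the exact type, so `char` never matches the signed or unsigned branches:

```c
#include <stdio.h>

/* Maps each of the three character types to a name at compile time. */
#define TYPE_NAME(x) _Generic((x),              \
    char:          "char",                      \
    signed char:   "signed char",               \
    unsigned char: "unsigned char",             \
    default:       "something else")

int main(void)
{
    char c = 'a';
    signed char sc = -1;
    unsigned char uc = 255;

    printf("%s\n", TYPE_NAME(c));   /* prints "char" */
    printf("%s\n", TYPE_NAME(sc));  /* prints "signed char" */
    printf("%s\n", TYPE_NAME(uc));  /* prints "unsigned char" */
    return 0;
}
```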

- To be clear, could you have 32-bit chars and 32-bit integers, and have sizeof(int) != sizeof(char)? I know the standard says sizeof(char) == 1, but is the relative sizeof(int) based on the actual difference in size or the difference in range? – Joseph Garvin Jan 11 '09 at 23:21
- Joseph, sizeof gives you the size of the object representation of the type. If you say 32-bit int, that by itself doesn't tell much; most probably you mean the object representation (that's the physical size, including all padding bits). – Johannes Schaub - litb Jan 14 '09 at 06:26
- If that's the case, then sizeof(int) != sizeof(char) can't be true, because char/unsigned char/signed char use all bits of their object representation to represent their values (called the value representation). – Johannes Schaub - litb Jan 14 '09 at 06:27
- The guaranteed range of `signed char` is -127 to 127, but assuming 2's complement you'll get -128 to 127. And that's a pretty safe assumption. – Steve Jessop May 16 '12 at 10:49
- How come 1 byte can be 32 bits? – pseudonym_127 May 16 '13 at 06:19
- +1. But there are four distinct character types in C++; wchar_t is one of them. – Eric Z Aug 24 '13 at 09:19
- @Fruny I noticed that you wrote sizeof () with a space in between; can you elaborate on that usage? At the moment, I am searching for an answer on this. Thanks in advance. – Unheilig Jan 11 '14 at 20:23
- Since C++11 you have 6 distinct types: char, signed char, unsigned char, wchar_t, char16_t, char32_t. – marcinj Feb 16 '14 at 09:53
- @pseudonym_127 Good question. I think it's because technically the size of a byte is unspecified (though commonly it's 8 bits). Hopefully someone else can verify this. – Celeritas Aug 09 '14 at 07:41
- @Unheilig It's common to place a space after `sizeof` because it is not a function but an operator. It is imho even better style to omit the parentheses when taking the size of a variable: `sizeof *p` or `sizeof (int)`. This makes it clear quickly whether it applies to a type or a variable. Likewise, it is also redundant to put parentheses after `return`. It's not a function. – Patrick Schlüter Nov 28 '14 at 12:00
- "`char`: it is the type of character literals like `'a'` or `'0'`." is true in C++ but not C. In C, `'a'` is an `int`. – chux - Reinstate Monica May 10 '16 at 17:30
- Just out of curiosity, you say "but a byte could in theory be for example 32 bits", but in reality a byte is 8 bits. What am I missing? Thanks. – Brian Mar 28 '18 at 14:13
- "Byte" in this context refers to the smallest addressable unit of memory. The C and C++ standards require a byte to be at least 8 bits, but they don't specify a maximum. On pretty much all general-purpose computers today (including anything compliant with recent versions of POSIX) a byte is exactly 8 bits, but specialised DSP platforms and retro systems may have larger bytes. – plugwash Mar 01 '19 at 18:48
- Since C++20 you have 7 distinct types: `char`, `signed char`, `unsigned char`, `wchar_t`, `char8_t`, `char16_t`, `char32_t`. – 김선달 Jul 02 '21 at 07:00
This is implementation-dependent, as the C standard does NOT define the signedness of `char`. Depending on the platform, `char` may be `signed` or `unsigned`, so you need to ask for `signed char` or `unsigned char` explicitly if your implementation depends on it. Just use `char` if you intend to represent characters from strings, as this will match what your platform puts in the string.

The difference between `signed char` and `unsigned char` is as you'd expect. On most platforms, `signed char` will be an 8-bit two's complement number ranging from `-128` to `127`, and `unsigned char` will be an 8-bit unsigned integer (`0` to `255`). Note the standard does NOT require that `char` types have 8 bits, only that `sizeof(char)` return `1`. You can get the number of bits in a char with `CHAR_BIT` in `limits.h`. There are few if any platforms today where this will be something other than `8`, though.

There is a nice summary of this issue here.

As others have mentioned since I posted this, you're better off using `int8_t` and `uint8_t` if you really want to represent small integers.
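If you want to check these properties on your own platform, here is a small sketch using only the standard `<limits.h>` macros (nothing non-standard here, but the output is of course implementation-specific):

```c
#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* CHAR_BIT is the number of bits in a byte; at least 8, and
       exactly 8 on virtually all modern platforms. */
    printf("bits per char: %d\n", CHAR_BIT);

    /* If plain char is unsigned, CHAR_MIN is 0; if signed, it is
       negative (typically -128). */
    printf("plain char is %s\n", CHAR_MIN < 0 ? "signed" : "unsigned");
    printf("plain char range: %d to %d\n", CHAR_MIN, CHAR_MAX);
    return 0;
}
```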

- `signed char` has only a guaranteed minimum range of -127 to 127, not -128 to 127. – 12431234123412341234123 Jan 28 '17 at 06:40
- @12431234123412341234123: Technically true, in that the C standard defines -127 to 127 as the minimum range. I challenge you to find a platform that doesn't use two's complement arithmetic, though. On nearly every modern platform, the actual range of signed chars will be -128 to 127. – Todd Gamblin Feb 06 '17 at 07:55
Because I feel it's really called for, I just want to state some rules of C and C++ (they are the same in this regard). First, all bits of an `unsigned char` participate in determining the value of any `unsigned char` object. Second, `unsigned char` is explicitly stated to be unsigned.

Now, I had a discussion with someone about what happens when you convert the value `-1` of type `int` to `unsigned char`. He rejected the idea that the resulting `unsigned char` has all its bits set to 1, because he was worried about sign representation. But he didn't have to be. It follows immediately from this rule that the conversion does what is intended:

> If the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. (6.3.1.3p2 in a C99 draft)

That's a mathematical description. C++ describes it in terms of modulo arithmetic, which yields the same rule. Anyway, what is not guaranteed is that all bits in the integer `-1` are one before the conversion. So, what do we have that lets us claim the resulting `unsigned char` has all its `CHAR_BIT` bits turned to 1?

- All bits participate in determining its value - that is, no padding bits occur in the object.
- Adding `UCHAR_MAX+1` to `-1` just once yields a value in range, namely `UCHAR_MAX`.

That's enough, actually! So whenever you want an `unsigned char` with all its bits set to one, you do

```c
unsigned char c = (unsigned char)-1;
```

It also follows that the conversion is not just truncating higher-order bits. The fortunate event for two's complement is that there it amounts to just a truncation, but the same isn't necessarily true for other sign representations.
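A tiny self-contained check of that guarantee (just a sketch; the comparison it makes is exactly the claim derived above, and it must print "yes" on any conforming implementation):

```c
#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* Guaranteed by the conversion rule, regardless of how the
       implementation represents negative integers. */
    unsigned char c = (unsigned char)-1;

    printf("c == UCHAR_MAX? %s (c = %u, UCHAR_MAX = %u)\n",
           c == UCHAR_MAX ? "yes" : "no",
           (unsigned)c, (unsigned)UCHAR_MAX);
    return 0;
}
```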

- Because `(unsigned type)-1` is some kind of idiom. `~0` isn't. – Patrick Schlüter Nov 28 '14 at 12:07
- If I have something like `int x = 1234` and `char *y = &x`: the binary representation of `1234` is `00000000 00000000 00000100 11010010`. My machine is little endian, so it stores it in memory reversed, `11010010 00000100 00000000 00000000`, LSB first. Now the main part: if I use `printf("%d", *y)`, `printf` reads only the first byte, `11010010`, and the output is `-46`, but `11010010` is `210`, so why does it print `-46`? I am really confused; I guess some char-to-integer promotion is doing something, but I don't know what. – Suraj Jain Aug 17 '16 at 10:23
As for example usages of `unsigned char`:

`unsigned char` is often used in computer graphics, which very often (though not always) assigns a single byte to each colour component. It is common to see an RGB (or RGBA) colour represented as 24 (or 32) bits, each component an `unsigned char`. Since `unsigned char` values fall in the range [0, 255], the values are typically interpreted as:

- 0 meaning a total lack of a given colour component.
- 255 meaning 100% of a given colour pigment.

So you would end up with RGB red as (255, 0, 0) -> (100% red, 0% green, 0% blue).

Why not use a `signed char`? Arithmetic and bit shifting become problematic. As explained already, a `signed char`'s range is essentially shifted by -128. A very simple and naive (mostly unused) method for converting RGB to grayscale is to average all three colour components, but this runs into problems when the values of the colour components are negative. Red (255, 0, 0) averages to (85, 85, 85) when using `unsigned char` arithmetic. However, if the values were `signed char`s (127, -128, -128), we would end up with (-99, -99, -99), which would be (29, 29, 29) in our `unsigned char` space, which is incorrect.
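Here is a minimal sketch of that naive averaging (the `to_gray` helper is hypothetical, named only for this example). Note that the arithmetic happens in `int` after the usual promotions, so the `unsigned char` components never overflow mid-sum:

```c
#include <stdio.h>

/* Naive grayscale: average the three colour components. The operands
   promote to int, so the sum (at most 765) cannot overflow. */
unsigned char to_gray(unsigned char r, unsigned char g, unsigned char b)
{
    return (unsigned char)((r + g + b) / 3);
}

int main(void)
{
    unsigned char gray = to_gray(255, 0, 0);    /* pure red */
    printf("gray value: %u\n", (unsigned)gray); /* prints 85 */
    return 0;
}
```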

- I might be missing something, but I don't follow how a fixed shift will break an arithmetic average. The average of 127, -128, and -128 is -43, not -99. If you add 128 to that you get 85, which is the same as your unsigned example. – Icydog Sep 07 '21 at 08:16
`signed char` has range -128 to 127; `unsigned char` has range 0 to 255.

`char` will be equivalent to either `signed char` or `unsigned char`, depending on the compiler, but is a distinct type.

If you're using C-style strings, just use `char`. If you need to use chars for arithmetic (pretty rare), specify signed or unsigned explicitly for portability.

An `unsigned char` is an unsigned byte value (0 to 255). You may be thinking of `char` in terms of being a "character", but it is really a numerical value. The regular `char` is signed, so you have 128 values, and these values map to characters using ASCII encoding. But in either case, what you are storing in memory is a byte value.

- "The regular char is signed": no, it's implementation-dependent. And there's no guarantee that the range of values of an unsigned char is from 0 to 255: it's at least that, but it could be wider. – Fabio says Reinstate Monica Jun 20 '20 at 01:21
- @qwr `sizeof(char)` is guaranteed to be 1, as is `sizeof(signed char)` and `sizeof(unsigned char)`. So yes, a `char` is always exactly 1 byte. [Here's](https://stackoverflow.com/questions/9727465/will-a-char-always-always-always-have-8-bits) a supporting answer. A byte is not always exactly 8 bits (only at least 8 bits), hence the range of an `unsigned char` doesn't have to be 0 to 255. But that's an orthogonal discussion. – Alexander Guyer Jul 10 '23 at 14:54
`char` and `unsigned char` aren't guaranteed to be 8-bit types on all platforms - they are guaranteed to be 8-bit or larger. Some platforms have 9-bit, 32-bit, or 64-bit bytes. However, the most common platforms today (Windows, Mac, Linux x86, etc.) have 8-bit bytes.

In terms of direct values, a regular `char` is used when the values are known to be between `CHAR_MIN` and `CHAR_MAX`, while an `unsigned char` provides double the range on the positive end. For example, if `CHAR_BIT` is 8, the range of a regular `char` is only guaranteed to be [0, 127] (because it can be signed or unsigned), while `unsigned char` will be [0, 255] and `signed char` will be [-127, 127].

In terms of what it's used for: the standards allow objects of POD (plain old data) type to be directly converted to an array of `unsigned char`. This lets you examine the representation and bit patterns of the object. The same guarantee of safe type punning doesn't exist for `char` or `signed char`.
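As a sketch of that use, here is one common way to examine an object's bytes through an `unsigned char` pointer (the variable names are arbitrary, and the byte order you see depends on your platform's endianness):

```c
#include <stdio.h>

int main(void)
{
    int value = 1234;

    /* Reading any object's bytes through an unsigned char pointer is
       permitted; it exposes the raw object representation. */
    const unsigned char *bytes = (const unsigned char *)&value;

    for (size_t i = 0; i < sizeof value; ++i)
        printf("byte %zu: 0x%02x\n", i, bytes[i]);
    return 0;
}
```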

- The standards only formally define the object representation as a **sequence** of `unsigned char`, not an _array_ specifically, & any "conversion" is only formally defined by **copying** from the object to a real, declared _array_ of `unsigned char` & then inspecting the latter. It's not clear whether the OR can be directly reinterpreted as such an array, with the allowances for pointer arithmetic it would entail, i.e. whether "sequence" `==` "array" in this usage. There's a Core Issue #1701 opened in hopes of getting this clarified - thankfully, as this ambiguity has really been bugging me recently. – underscore_d Aug 30 '16 at 12:49
- @RastaJedi No, it won't. It can't. A range of -128...+128 is physically impossible to represent with 8 bits. That width only supports 2^8 == 256 discrete values, but -128...+128 = 2 * 128 + 1 for 0 = 257. Sign-magnitude representation permits -127...+127 but has 2 (bipolar) zeroes. Two's-complement representation maintains a single zero but makes up the range by having one more value on the negative side; it permits -128...+127. (And so on for both at larger bit widths.) – underscore_d Aug 30 '16 at 12:52
- Re my 2nd comment, it's reasonable to _presume_ we can take a pointer to the 1st `unsigned char` of the OR and then proceed using `++ptr` from there to read every byte of it... but AFAICT, it's not specifically defined as being allowed, so we're left to infer that it's _'probably OK'_ from lots of other passages (and in many ways, the mere existence of `memcpy`) in the Standard, akin to a jigsaw puzzle. Which is not ideal. Well, maybe the wording will improve eventually. Here's the CWG issue I mentioned but lacked space to link - http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1701 – underscore_d Aug 30 '16 at 12:59
- @underscore_d Sorry, that was a typo. [-128, 127] is what I meant to type :p. Yeah, I know about the double zeros ('positive' and 'negative' zero) with sign/magnitude. I must have been tired :p. – RastaJedi Aug 30 '16 at 22:31
`unsigned char` is the heart of all bit trickery. In almost all compilers for all platforms, an `unsigned char` is simply a byte - an unsigned integer of (usually) 8 bits that can be treated as a small integer or a pack of bits.

In addition, as someone else has said, the standard doesn't define the sign of a char. So you have 3 distinct `char` types: `char`, `signed char`, `unsigned char`.

If you like using various types of specific length and signedness, you're probably better off with `uint8_t`, `int8_t`, `uint16_t`, etc., simply because they do exactly what they say.
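For instance, a small sketch of the "pack of bits" usage with the exact-width `uint8_t` (the flag layout here is invented for the example):

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    uint8_t flags = 0;            /* exactly 8 bits, unsigned, by definition */

    flags |= 1u << 3;             /* set bit 3 */
    flags &= (uint8_t)~(1u << 0); /* clear bit 0 */

    printf("flags = 0x%02" PRIX8 "\n", flags);
    printf("bit 3 is %s\n", (flags & (1u << 3)) ? "set" : "clear");
    return 0;
}
```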

Some googling found this, where people had a discussion about the same question.

An unsigned char is basically a single byte. So, you would use this if you need one byte of data (for example, maybe you want to use it to set flags on and off to be passed to a function, as is often done in the Windows API).

An unsigned char uses the bit that is reserved for the sign of a regular char as part of the number. This changes the range to [0, 255] as opposed to [-128, 127].

Generally, unsigned chars are used when you don't want a sign. This makes a difference when doing things like shifting bits (shifting extends the sign) and in other situations where you are dealing with a char as a byte rather than using it as a number.

Quoted from "The C Programming Language":

> The qualifier `signed` or `unsigned` may be applied to char or any integer. Unsigned numbers are always positive or zero, and obey the laws of arithmetic modulo 2^n, where n is the number of bits in the type. So, for instance, if chars are 8 bits, unsigned char variables have values between 0 and 255, while signed chars have values between -128 and 127 (in a two's complement machine). Whether plain chars are signed or unsigned is machine-dependent, but printable characters are always positive.
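The modulo-2^n behaviour the quote describes can be observed directly; a minimal sketch (assuming the usual 8-bit byte, so arithmetic wraps modulo 256):

```c
#include <stdio.h>

int main(void)
{
    /* unsigned char arithmetic obeys modulo 2^CHAR_BIT: with 8-bit
       bytes, 250 + 10 = 260, and 260 mod 256 = 4. The value wraps;
       it never overflows in the undefined-behaviour sense. */
    unsigned char u = 250;
    u = (unsigned char)(u + 10);

    printf("250 + 10 as unsigned char: %u\n", (unsigned)u);
    return 0;
}
```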

`signed char` and `unsigned char` both occupy 1 byte, but they have different ranges.

    Type          | Range
    --------------+---------------
    signed char   | -128 to +127
    unsigned char |    0 to  255

With `signed char`, if we consider `char letter = 'A'`, 'A' is represented as the binary of 65 in ASCII/Unicode. If 65 can be stored, -65 can be stored too. There are no negative values in ASCII/Unicode, so there is no need to worry about negative values.

Example:

```c
#include <stdio.h>

int main(void)
{
    signed char char1 = 255;    /* out of range; on typical two's
                                   complement machines this becomes -1 */
    signed char char2 = -128;
    unsigned char char3 = 255;
    unsigned char char4 = -128; /* converted modulo 256, yielding 128 */

    printf("Signed char(255) : %d\n", char1);
    printf("Unsigned char(255) : %d\n", char3);
    printf("\nSigned char(-128) : %d\n", char2);
    printf("Unsigned char(-128) : %d\n", char4);

    return 0;
}
```

Output:

    Signed char(255) : -1
    Unsigned char(255) : 255

    Signed char(-128) : -128
    Unsigned char(-128) : 128

- `char` is not guaranteed to be one byte and `signed char` is only guaranteed to hold the range [-127, 127] (though almost all systems use two's complement and hold at least [-128, 127]). – qwr Jul 08 '20 at 19:42
- @qwr According to the C standard, `char` is actually defined to be "1 byte", but a byte can be more than 8 bits. – Lover of Structure Mar 02 '23 at 04:32