
What is the use of unsigned char pointers? I have seen pointers cast to pointer to unsigned char in many places. Why do we do that?

We receive a pointer to int and then cast it to unsigned char*. But if we try to print the elements of that array using cout, nothing is printed. Why? I do not understand. I am new to C++.

EDIT Sample Code Below

int Stash::add(void* element)
{
    if(next >= quantity)
    // Enough space left?
        inflate(increment);
    
    // Copy element into storage, starting at next empty space:
    int startBytes = next * size; 
    unsigned char* e = (unsigned char*)element;
    for(int i = 0; i < size; i++)
        storage[startBytes + i] = e[i];
    next++;
    return(next - 1); // Index number
}
Josh Correia
Ankit_ceo2
  • when converting to a character pointer, the first byte is probably zero which is the same as the string terminator, and so nothing will be printed. It would help more if you could show what you really do, i.e. post some code. Please make an [SSCCE](http://sscce.org/) and add to the question. – Some programmer dude Feb 08 '13 at 09:59
  • But I think that would lose the information if the first byte is zero, and actually I am trying to print all four bytes but it is not printing anything. – Ankit_ceo2 Feb 08 '13 at 10:02
  • Your question seems more about "why" rather than "when". Very often, `unsigned char *` is used as a byte-level access method for reaching into a variable or memory address of an otherwise more-formal type. It has many niceties, among them, immunity to strict aliasing rules and standard-guaranteed alignment with any address you throw at it. Being new to C++ shouldn't make this difficult if you're reasonably familiar with C. If you are new to *programming*, I see this as being a challenge to comprehend. Perhaps you have some code and an idea behind it you have questions about? – WhozCraig Feb 08 '13 at 10:03
  • I moved your code up into your question. Any comments about it you can post in the question, or responses to others comments, can be posted here. – WhozCraig Feb 08 '13 at 10:13

4 Answers

You are actually looking for pointer arithmetic:

unsigned char* bytes = (unsigned char*)ptr;
for(int i = 0; i < size; i++)
    // work with bytes[i]

In this example, bytes[i] is equal to *(bytes + i), which accesses the memory at address bytes + i * sizeof(*bytes). In other words: if you have int* intPtr and you access intPtr[1], you are actually accessing the integer stored in bytes 4 to 7 (assuming a 4-byte int):

0  1  2  3
4  5  6  7 <-- 

The size of the type your pointer points to determines where it points after it is incremented or decremented. So if you want to iterate over your data byte by byte, you need a pointer to a type whose size is 1 byte (that's why unsigned char*).
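The effect of the pointed-to type on pointer arithmetic can be checked with a small sketch. The helper name stride_in_bytes is made up for this illustration; it measures how many bytes lie between p and p + 1.

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes between p and p + 1 for a pointer to T.
// Both addresses are viewed through unsigned char* so the difference is in bytes.
template <typename T>
std::size_t stride_in_bytes(const T* p)
{
    const unsigned char* first = reinterpret_cast<const unsigned char*>(p);
    const unsigned char* second = reinterpret_cast<const unsigned char*>(p + 1);
    return static_cast<std::size_t>(second - first);
}
```

For an int* the result is sizeof(int); for an unsigned char* it is always 1, which is why that type is the one used for stepping through memory one byte at a time.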


unsigned char is usually used for holding binary data, where 0 is a valid value and still part of your data. While working with a "naked" unsigned char* you'll probably have to keep track of the length of your buffer yourself.

char is usually used for holding the characters of a string, where 0 is equal to '\0' (the terminating character). If your buffer of characters is always terminated with '\0', you don't need to know its length, because the terminating character marks exactly where your data ends.

Note that in both of these cases it's better to use an object that hides the internal representation of your data and takes care of memory management for you (see the RAII idiom). So it's a much better idea to use either std::vector<unsigned char> (for binary data) or std::string (for strings).
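As a rough sketch of the std::vector<unsigned char> suggestion, the following copies raw bytes into a container that owns them and remembers its own length. The function name to_bytes is an assumption made for this example, not something from the question's code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: copy `size` raw bytes starting at `data` into a vector.
// The vector stores its own length, so zero bytes inside the data are not a problem.
std::vector<unsigned char> to_bytes(const void* data, std::size_t size)
{
    const unsigned char* first = static_cast<const unsigned char*>(data);
    return std::vector<unsigned char>(first, first + size);
}
```

Calling to_bytes(&value, sizeof value) on an int gives a vector with sizeof(int) elements, and the memory is released automatically when the vector goes out of scope.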

LihO
  • Sometimes I see `unsigned char*` combined with the sizeof a struct, like this: `return (unsigned char *)data + sizeof(Header);` (data is a void pointer). Is that to calculate the length of the void pointer with the size of the header? – TomSawyer Apr 28 '20 at 14:59

In C, unsigned char is the only type guaranteed to have no trapping values, and which guarantees copying will result in an exact bitwise image. (C++ extends this guarantee to char as well.) For this reason, it is traditionally used for "raw memory" (e.g. the semantics of memcpy are defined in terms of unsigned char).
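A minimal sketch of the "raw memory" use: the bytes of an object are copied into an unsigned char buffer and back with std::memcpy, and the value survives unchanged. The function name round_trip is made up for the illustration.

```cpp
#include <cstring>

// Hypothetical example: copy the object representation of a float into a
// byte buffer and back again.
float round_trip(float value)
{
    unsigned char bytes[sizeof(float)];
    std::memcpy(bytes, &value, sizeof value);  // object -> raw bytes
    float copy;
    std::memcpy(&copy, bytes, sizeof copy);    // raw bytes -> object
    return copy;
}
```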

In addition, unsigned integral types in general are used when bitwise operations (&, |, >> etc.) are going to be used. unsigned char is the smallest unsigned integral type, and may be used when manipulating arrays of small values on which bitwise operations are used. Occasionally, it's also used because one needs the modulo behavior in case of overflow, although this is more frequent with larger types (e.g. when calculating a hash value). Both of these reasons apply to unsigned types in general; unsigned char will normally only be used for them when there is a need to reduce memory use.
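The two uses mentioned above might look like the following sketch. Both function names are invented for this example, and the modulo-256 comment assumes the common case of 8-bit bytes.

```cpp
// Hypothetical helper: set bit `n` (0 = least significant) in a byte-sized flag set.
unsigned char set_bit(unsigned char flags, unsigned n)
{
    return static_cast<unsigned char>(flags | (1u << n));
}

// Hypothetical helper: add two bytes with wraparound. The operands are promoted
// to int for the addition; converting the sum back to unsigned char reduces it
// modulo 2^CHAR_BIT (modulo 256 when bytes have 8 bits).
unsigned char wrapping_add(unsigned char a, unsigned char b)
{
    return static_cast<unsigned char>(a + b);
}
```

Note the explicit casts: arithmetic on unsigned char happens after integer promotion, so the wraparound only appears once the result is converted back to the narrow type.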

James Kanze
  • "C++ extends this guarantee to `char` as well." Can we have a source for this? – Emil Laine Jul 15 '15 at 02:51
  • @emlai It's self-evident / easily provable. https://stackoverflow.com/a/24052128/1874170 If you wanted, you could corrupt memory and iterate over all 2^8 possible values (guaranteed comprehensive due to `sizeof(char)`) and prove it for yourself. – JamesTheAwesomeDude Mar 23 '18 at 19:14

The unsigned char type is usually used to represent a single byte of binary data. Thus, an array of unsigned char is often used as a binary data buffer, where each element is a single byte.

The unsigned char* construct will be a pointer to the binary data buffer (or to its first element).

I am not 100% sure what the C++ standard says precisely about the size of unsigned char, whether it is fixed at 8 bits or not. Usually it is. I will try to find it and post it.
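As a small sketch of what can be relied on: sizeof(unsigned char) is 1 by definition, and the macro CHAR_BIT from <climits> gives the number of bits in a byte, which is at least 8. The helper bits_in is a made-up name for this illustration.

```cpp
#include <climits>

// sizeof counts in units of char, so this holds on every implementation.
static_assert(sizeof(unsigned char) == 1, "char types are one byte by definition");
// A byte must have at least 8 bits, but may have more.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: number of bits occupied by an object of type T.
template <typename T>
constexpr unsigned bits_in()
{
    return static_cast<unsigned>(sizeof(T) * CHAR_BIT);
}
```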

After seeing your code

When you use something like void* input as a parameter of a function, you deliberately strip away information about the input's original type. This is a very strong suggestion that the input will be treated in a very general manner, i.e. as an arbitrary string of bytes. int* input, on the other hand, would suggest it will be treated as a "string" of signed integers.

void* is mostly used in cases when input gets encoded, or treated bit/byte wise for whatever reason, since you cannot draw conclusions about its contents.

Then in your function you seem to want to treat the input as a string of bytes. But to operate on objects, e.g. to perform operator= (assignment), the compiler needs to know what to do. Since you declare input as void*, an assignment such as *input = something would make no sense, because *input is of type void. To make the compiler treat the input elements as the "smallest raw memory pieces", you cast it to the appropriate type, which is unsigned char*.

The cout probably did not work because of a wrong or unintended type conversion. char* is considered a null-terminated string, and it is easy to confuse the signed and unsigned versions in code. If you pass an unsigned char* to ostream::operator<< as a char*, it will treat the bytes as normal ASCII characters, where 0 means end of string, not the integer value 0. When you want to print the contents of memory, it is best to cast explicitly.

Also note that to print the memory contents of a buffer you would need to use a loop, since otherwise the printing function would not know when to stop.
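A possible sketch of such a loop is shown below. The function name bytes_to_text is an assumption for this example. Each byte is cast to unsigned int before being streamed, so it is printed as a number and a zero byte does not end the output.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: format `size` bytes starting at `data` as
// space-separated decimal numbers.
std::string bytes_to_text(const void* data, std::size_t size)
{
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::ostringstream out;
    for (std::size_t i = 0; i < size; ++i) {
        if (i != 0)
            out << ' ';
        // The cast selects the integer overload of operator<<, not the character one.
        out << static_cast<unsigned int>(bytes[i]);
    }
    return out.str();
}
```

With this, something like std::cout << bytes_to_text(element, size) would show every byte of the element, including zeros. The order of the bytes of an int depends on the platform's endianness.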

luk32
  • C and C++ define character types (`char`, `unsigned char` and `signed char`) to have a size of one byte, and require them to have at least 8 bits. There is, or at least until recently was, a machine with 9 bit `char`, and there are some with 32 bit `char`. (Historically, of course, there were a lot of machines with bytes less than 8 bits, but C doesn't allow this.) – James Kanze Feb 08 '13 at 10:35
  • @James, thank you. I mentioned it, 'cause I remember something about not being guaranteed that it is always 8bits. I wanted to stay clear in case one would be implementing some low-level network protocols or move binary files from a system to system, they might encounter such caveats. – luk32 Feb 08 '13 at 10:53
  • A lot depends on how portable you have to be. For most people, the portability constraints will be loose enough to allow the assumption that `char` is 8 bits, but there _are_ machines where it isn't. – James Kanze Feb 08 '13 at 11:02

Unsigned char pointers are useful when you want to access the data byte by byte. For example, a function that copies data from one area to another could need this:

// Byte-by-byte copy; named copy_bytes so it does not clash with the standard memcpy.
void copy_bytes(unsigned char* dest, const unsigned char* source, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        dest[i] = source[i];
}

It also has to do with the fact that the byte is the smallest addressable unit of memory. If you want to read anything smaller than a byte from memory, you need to get the byte that contains that information, and then select the information using bit operations.
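Selecting a piece of information smaller than a byte could be sketched as follows. The helper extract_bits is hypothetical and assumes the requested field fits inside an 8-bit byte.

```cpp
// Hypothetical helper: return the `width`-bit field of `byte` whose lowest bit
// is at position `shift` (0 = least significant). Assumes shift + width <= 8.
unsigned extract_bits(unsigned char byte, unsigned shift, unsigned width)
{
    unsigned mask = (1u << width) - 1u;  // `width` one-bits
    return (byte >> shift) & mask;       // move the field down, then mask it
}
```

For example, extract_bits(0xB4, 4, 4) yields the high nibble 0xB: the byte is read as a whole and the wanted bits are picked out afterwards.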

You could very well copy the data in the above function using an int pointer, but that would copy chunks of sizeof(int) bytes (typically 4), which may not be the correct behavior in some situations.

As for why nothing appears on the screen when you try to use cout, the most likely explanation is that the data starts with a zero character, which in C++ marks the end of a string of characters.

Tibi
  • If it starts with a 0 character, it should still print the values of the other 3 characters. In the for loop in the code, `for(int i = 0; i < size; i++) { cout << e[i]; /* does not print anything */ storage[startBytes + i] = e[i]; }` prints nothing, and if I change it to `cout << *(int*)e[i];` it prints the value in the first iteration and then 3 garbage values. – Ankit_ceo2 Feb 08 '13 at 10:20
  • "You could very well copy the data in the above function using a `int` pointer" No, you very well could _not_! Types except `unsigned char` (& I think _especially_ signed types), are not guaranteed to (A) cover all bits of the underlying memory or (B) allow the trapping/invalid values that might result from trying to reinterpret arbitrary bytes as `int`s. Using any pointer other than `unsigned char *` here is inherently, & very, non-portable. Implementations may use it as a platform-dependent detail, but users shouldn't. – underscore_d Aug 18 '16 at 11:29