
The manpage says about memset:

#include <string.h>
void *memset(void *s, int c, size_t n);

The memset() function fills the first n bytes of the memory area pointed to by s with the constant byte c.

It is obvious that memset can't be used to initialize an int array to 1, as shown below:

int a[10];
memset(a, 1, sizeof(a));  

This is because an int is represented by 4 bytes (say), and memset sets each byte separately, so the integers in array a do not end up with the desired value.
But I often see programmers use memset to set the elements of an int array to either 0 or -1.

int a[10];
int b[10];
memset(a, 0, sizeof(a));  
memset(b, -1, sizeof(b));  

As I understand it, initializing with integer 0 is OK because 0 can be represented in 1 byte (maybe I am wrong about this). But how is it possible to initialize b with -1 (a 4-byte value)?

haccks
  • You are slightly wrong about the reason initializing with `0` is OK. It is OK because `0` fits in an `unsigned char` (so it is not truncated when used as the second argument to `memset`) *and* because the bit pattern in memory for a `sizeof(int)`-byte zero is identical to the bit pattern in memory for `sizeof(int)` sequential one-byte zeros. Both of those things must be true for this to work. In fact, those things are true for exactly two numbers in twos-complement arithmetic: `0` and `-1`. – zwol Jul 22 '15 at 00:36
  • @zwol: Hmm? The first sentence speaks of zeros and so is not literally true for −1. So presumably you intend to implicitly parameterize the first sentence: It works for *x* if the bits for an `int` with value *x* are the same as the bits for `sizeof(int)` `unsigned char` each with the value *x*. Further, we must consider the `unsigned char` with value *x* as resulting from conversion of *x* to `unsigned char`, as −1 is not representable. If so, then it is not true that 0 and −1 are the only such values. 16,843,009 • *x* works for any integer 0 ≤ *x* < 256. (16,843,009 is hex 1010101). – Eric Postpischil Oct 27 '20 at 17:45
  • @zwol: Except for the fact that C does not require the bit positions in integers of different widths to represent the same values. – Eric Postpischil Oct 27 '20 at 17:48
  • @EricPostpischil I don't understand your example. No multiple of 16,843,009 is representable by any of the `char` types (well, unless you're on a machine where `CHAR_BIT >= 25`.) – zwol Oct 27 '20 at 18:12
  • @zwol: `0x34343434` is a multiple of 16,843,009; it is `0x34 * 0x01010101`. `int a; memset(&a, 0x34343434, sizeof a);` will set each byte of `a` to `0x34`. Then the value of `a` will be `0x34343434`. – Eric Postpischil Oct 27 '20 at 18:47
  • @EricPostpischil Oh, you're relying on internal truncation of the second argument to `memset`. I consider that to be cheating because `memset` would take an `unsigned char` second argument if not for back-compat with traditional C. – zwol Oct 27 '20 at 19:06
  • @zwol: But you did that with −1. `memset` takes an `int`, converts it to an `unsigned char`, and copies it into each byte. −1 is not representable as an `unsigned char`; it gets converted to `UCHAR_MAX`. So, if you are allowing that, then `0x34343434` (or a similar value in case of larger-byte C implementations) works the same way. – Eric Postpischil Oct 27 '20 at 19:13
  • @EricPostpischil Converting `(int)-1` to `signed char` does not change its value; converting 0x34343434 to any form of `char` does change its value (when `CHAR_BIT` has its usual value). I'm not sure why you're banging on unsigned vs. signed; the question was clearly about signed quantities. – zwol Oct 27 '20 at 19:21
  • @zwol: `memset` is defined in terms of `unsigned char`. There is no `signed char` or `char` in either the posted question or in the C specification of `memset`. – Eric Postpischil Oct 27 '20 at 19:34

2 Answers


Oddly, the reason this works with -1 is exactly the same as the reason that this works with zeros: in two's complement binary representation, -1 has 1s in all its bits, regardless of the size of the integer, so filling in a region with bytes filled with all 1s produces a region of -1 signed ints, longs, and shorts on two's complement hardware.

On hardware that does not use two's complement, the result will be different. The integer constant -1 is still converted to an unsigned char with all bits set, because the standard specifies exactly how that conversion is performed. However, a region of bytes with all their bits set to 1 is interpreted as integral values according to the rules of the platform. For example, on sign-and-magnitude hardware every element of your array would contain the most negative value of the corresponding type (sign bit set, maximum magnitude).
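The two steps described above (the conversion of -1 to `unsigned char`, then the reinterpretation of the filled bytes as `int`) can be checked with a small sketch. The helper names `fill_byte_for` and `fill_minus_one_and_check` are made up for illustration, and the second one is only expected to return 1 on a two's complement implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the byte value memset actually stores for a given
   int argument, i.e. the argument converted to unsigned char. */
unsigned char fill_byte_for(int c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: fills an int array with memset(..., -1, ...) and
   reports whether every element then compares equal to -1.
   Assumption: two's complement int; other representations may return 0. */
int fill_minus_one_and_check(int *arr, size_t count)
{
    assert(arr != NULL);
    memset(arr, -1, count * sizeof *arr);
    for (size_t i = 0; i < count; i++) {
        if (arr[i] != -1)
            return 0;
    }
    return 1;
}
```

Here `fill_byte_for(-1)` should equal `UCHAR_MAX` on any conforming implementation, since that conversion is defined by the standard rather than by the representation of negative numbers.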

Sergey Kalinichenko
  • Wouldn't using `~0` be effectively the same (and more clear)? – Fiddling Bits Jun 13 '14 at 14:46
  • @FiddlingBits Yes, using `~0` would definitely avoid the confusion here. – Sergey Kalinichenko Jun 13 '14 at 14:50
  • I got your answer but could you explain this line: *so filling in a region with -1 bytes produces a region of -1 signed ints, longs, and shorts.*? – haccks Jun 13 '14 at 14:58
  • Just have a look at the binary representation of an integer carrying `-1`. @haccks – alk Jun 13 '14 at 15:03
  • @haccks If you fill a region of memory whose size is the `sizeof` of some integral type (`int`, `long`, or `short`) with an all-ones bit pattern and then reinterpret that region as the corresponding integral type, you would see `-1` on computers with two's complement representation. Note that on the rare occasions when you have sign-magnitude hardware you would see the most negative integer representable on that hardware (I have never seen such hardware, or even a person who mentioned seeing such hardware, but I heard that it does exist). – Sergey Kalinichenko Jun 13 '14 at 15:08
  • @dasblinkenlight: Yes. Actually this is what confuses me: *filling in a region with `-1`*. I thought it would be `1` instead of `-1`. – haccks Jun 13 '14 at 15:14
  • @haccks You are right, using `-1` as a "shortcut" for a "byte filled with all ones" is platform-specific. I added some clarifications to this, along with the long comment above. Thanks! – Sergey Kalinichenko Jun 13 '14 at 15:23
  • @dasblinkenlight Which is more confusing depends on the background of the reader. I once confused a colleague by using `~0`, and in the following discussion it became clear that `-1` would not have been confusing to that colleague. I think there are quite a few C/C++ programmers who don't even know that there is something like the `~` operator. – cmaster - reinstate monica Jun 13 '14 at 18:19
  • "On hardware that differs from two's complement... -1 integer constant would be converted to an unsigned char of all ones". Say on a 32-bit sign-magnitude machine, would `-1` not be `0x80000001` and then be truncated to 8 bits as `0x01` in each byte, resulting in `0x01010101`? I'd expect `~0` to convert to "all ones", but not necessarily `-1`. – chux - Reinstate Monica Jun 13 '14 at 18:35
  • @chux The standard says that to convert a negative integral value to `unsigned`, the compiler must subtract the magnitude of the negative value from 2^N, where N is the number of bits in the unsigned integral type. Here, N is 8, so the result is 256-1=255, an unsigned value. This is how they avoided making the process implementation-defined without requiring two's complement representation. That's why my understanding is that `-1` would be converted to an all-ones bit pattern regardless of the way negatives are represented on the target platform. – Sergey Kalinichenko Jun 13 '14 at 18:44
  • @dasblinkenlight I see. Although I think the conversion is `-1` to `size_t` (where bit-width may be 32) and then `memset()` uses `(unsigned char)` bits of `size_t n`. Either way, comes up with all ones. Need to remember all this should one encounter such a machine. ;-) – chux - Reinstate Monica Jun 13 '14 at 18:58
  • +1 for the comment that the standard actually specifies signed-unsigned conversion. I always thought that was platform specific. – SztupY Jun 13 '14 at 20:05
  • @dasblinkenlight Another storage method is BCD, such as is found on anything which is based on the [IBM 360 architecture](http://en.wikipedia.org/wiki/IBM_System/360_architecture), in which -1 would be represented in 1 byte as `x'1D'`, in 2 bytes as `x'00 1D'`, and with further leading zeroes as required after that. I'm struggling to see how C could validly be implemented with that as the default type for `int`, but I think it serves to illustrate an actual instance of a non-twos-complement architecture. – ClickRick Jun 14 '14 at 02:20
  • @FiddlingBits: Couldn’t `~0` theoretically produce a trap representation, and thus not be equivalent to `-1`? – Eric Postpischil Dec 20 '18 at 09:45
  • Wow, I hadn't realized `memset` on non-2's-complement machines would have to actually do int->unsigned conversion (because it's not a no-op there). I guess it dates back to such early C history (probably before prototypes) that it couldn't have been declared to take `unsigned` or `unsigned char`, but it's weird that in the asm the caller doesn't necessarily pass the bit-pattern it wants. – Peter Cordes May 02 '21 at 18:42

When all bits of an integer are 0, its value is 0. In two's complement, when all bits are 1, its value is -1.

If we write int a[2], 4x2 bytes of memory are allocated (assuming a 4-byte int), and they initially contain garbage bits-

00110000 00100101 11100011 11110010    11110101 10001001 00111000 00010001

Then, we write memset(a, 0, sizeof(a)). Now, memset() works byte by byte, and one byte representation (unsigned char) of 0 is 00000000. So, it becomes-

00000000 00000000 00000000 00000000    00000000 00000000 00000000 00000000

Therefore, both a[0] and a[1] are initialized with 0.


Now, let's see memset(a, -1, sizeof(a)): -1 converted to one byte (unsigned char) is 11111111. And, we get-

11111111 11111111 11111111 11111111    11111111 11111111 11111111 11111111

Here, both a[0] and a[1] will have the value -1.


However, for memset(a, 1, sizeof(a)): 1 in a byte is 00000001-

00000001 00000001 00000001 00000001    00000001 00000001 00000001 00000001

So, the value of each element will be 16843009 (hex 0x01010101), not 1.
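The three cases above can be reproduced with a short sketch. It assumes a 4-byte `int` with no padding bits and two's complement representation; `int_after_memset` and `fill_int` are hypothetical helper names, and `fill_int` is just a plain loop shown as the usual way to store a value such as 1 in every element.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the value an int holds after memset has set
   every one of its bytes to (unsigned char)c. */
int int_after_memset(int c)
{
    int x;
    memset(&x, c, sizeof x);
    return x;
}

/* Hypothetical helper: assigns value to each element with a loop,
   which works for any value, unlike the byte-wise memset. */
void fill_int(int *arr, size_t count, int value)
{
    assert(arr != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        arr[i] = value;
}
```

Under those assumptions, `int_after_memset(0)` gives 0, `int_after_memset(-1)` gives -1, and `int_after_memset(1)` gives 16843009, matching the bit patterns shown above.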

Minhas Kamal
  • `void *memset( void *dest, int ch, size_t count );` => Copies the value `ch` (after conversion to `unsigned char`) into each of the first count characters of the object pointed to by `dest`. – Minhas Kamal Oct 27 '20 at 18:23