1

I came across this program on HSW:

int *p;
int i;

p = (int *)malloc(sizeof(int[10]));
for (i=0; i<10; i++)
    *(p+i) = 0;
free(p);

I don't understand the loop fully. Assuming the memory is byte addressable, and each integer takes up 4 bytes of memory, and say we allocate 40 bytes of memory to the pointer p from address 0 to 39.
Now, from what I understand, the pointer p initially contains value 0, i.e. the address of first memory location. In the loop, a displacement is added to the pointer to access the subsequent integers.

I cannot understand how the memory addresses up to 39 are accessed with a displacement value of only 0 to 9. I checked and found that the pointer is incremented in multiples of 4. How does this happen? I'm guessing it's because the pointer has an integer type, and each pointer is incremented by the size of its data type. Is this true?

But what if I actually want to point to memory location 2 using an integer pointer. So, I do this: p = 2. Then, when I try to de-reference this pointer, should I expect a segmentation fault?

newacct
nimbudew

5 Answers

2

Since you have a typed pointer, arithmetic on it (addition or subtraction) is automatically scaled by the size of the pointed-to type. Here, since sizeof (int) is 4 on your computer, p + i refers to the address i * sizeof (int) bytes past p, that is, 4*i bytes past p in your case.

You also seem to misunderstand the statement *(p+i) = 0. It is equivalent to p[i] = 0. And your malloc() call won't return 0 (a null pointer) unless it fails to allocate the memory you asked for.

Then, I assume that your last question means "If I shift my malloc-ated address by exactly two bytes, what will occur?".

The answer depends on what you do next and on the endianness of your system. For example:

/*
 * Suppose our pointer p is well declared
 * And points towards a zeroed 40 bytes area.
 * (here, I assume sizeof (int) = 4)
 */

int *p1 = (int *)((char *)p + 2);
*p1 = 0x01020304;
printf("p[0] = %x, p[1] = %x.\n", p[0], p[1]);

will output

p[0] = 102, p[1] = 3040000.

on a big-endian system, and

p[0] = 3040000, p[1] = 102.

on a little-endian system.

EDIT: To answer your comment, if you dereference an arbitrarily assigned pointer, one of two things can happen:

  • You are lucky: the address you chose corresponds to a memory area that has been allocated to your program, so it is a valid virtual address. You won't get a segfault, but if you write to it, you may corrupt your program's state (and most likely will).
  • You are luckier: the address is invalid, and you get a clean segfault that stops your program before it can do further damage.
RastaJedi
Rerito
  • Okay. That explains the loop. But what if I explicitly point to some memory and try to access that? – nimbudew Feb 26 '13 at 13:41
  • He's only saying it will contain 0 because he is referring to the 40 bytes of memory as addresses `0 ... 39`. – RastaJedi Apr 18 '16 at 06:59
  • The endianness doesn't mean the entire value is reversed, just means that the order of the *bytes* is. Thus, it would actually be `304`, not `403`. Actually, since the first two bytes for `p` are zero, printing `p[0]` would actually print `3040000` since if it's laid out in memory as `0x00 00 04 03`, printing that as an int on little endian would put the `03` first, then the `04`, then the `00 00`, so it would look like `03040000` (`3040000` with leading zero removed). `p[1]` would look like `0x02 01 00 00` so printing that would show `00000102` (`102` with the leading zeros removed). – RastaJedi Apr 18 '16 at 09:10
  • So, on big endian, it would print `p[0] = 102, p[1] = 3040000`. – RastaJedi Apr 18 '16 at 09:15
  • You're right, I've recomputed it and found the same results @RastaJedi. Answer fixed :) – Rerito Apr 18 '16 at 09:22
  • On little endian, writing 0x01020304 to `p1` makes `p[0]` look like `00 00 04 03` in memory and `p[1]` look like `02 01 00 00` and printing those results in `03040000 (3040000)` and `00000102 (102)`, respectively. On big endian, `p[0]` would be `00 00 01 02`, printing that would show `102`, and `p[1]` would be `03 04 00 00` in memory and printing that would show `03040000 (3040000)`. – RastaJedi Apr 18 '16 at 09:32
  • It's not only the way it's read from memory that matters, but the way `p1` is written matters too. It's written as `04 03 02 01`, so because the first two bytes are still zero, that is why `p[0]` is laid out in memory as `00 00 04 03` and `p[1]` as `02 01 00 00`. This is why it becomes `3040000` and `102` on little endian systems. – RastaJedi Apr 18 '16 at 09:41
  • Yeah, I know it's just that I took the wrong line on my notes – Rerito Apr 18 '16 at 09:43
  • The way you are describing little endian is as if you had done `*p1 = htonl(0x01020304);`, i.e., you write it in big endian order, but then read it in little endian order. It would look exactly like this in this case. But the reality is that it is written in the other order as well, which is why we end up with `3040000` and `102`. – RastaJedi Apr 18 '16 at 09:50
  • That's weird, it didn't show you replied before my last comment. By the way, I think your little increment-by-2-bytes is a really neat way to see what's really going on under the hood. And I never knew/thought you could change the intervals for pointer arithmetic just with a simple cast. Never thought to try it. But I must say, your `(char *)p + 2` is a really neat trick. On a little endian system if you use `htonl(p1)` *and* print with it too, i.e., print `htonl(p[0])` and for `p[1]`, you can emulate how it would look on a big endian system, and it will show the expected `102` and `3040000`. – RastaJedi Apr 18 '16 at 10:01
  • @Rerito I suggested an edit to your answer with the correct values. – RastaJedi Apr 18 '16 at 10:22
2

Now, from what I understand, the pointer p initially contains value 0

No, the pointer p would not hold the value 0 in case malloc returns successfully.

At the point of declaring it, the pointer is uninitialized and most probably holds a garbage value. Once you assign it to the pointer returned by malloc, the pointer points to a region of dynamically allocated memory that the allocator sees as unoccupied.

I cannot understand how the memory addresses up to 39 are accessed with a displacement value of only 0 to 9

The actual byte displacements are 0, 4, 8, 12, ..., 36. Because the pointer p has a type, here int *, every offset in pointer arithmetic is scaled by sizeof(int), which is 4 in your case. In other words, the displacement multiplier is always the size of the type that the pointer points to.

But what if I actually want to point to memory location 2 using an integer pointer. So, I do this: p = 2. Then, when I try to de-reference this pointer, should I expect a segmentation fault?

The exact location 2 will most probably be unavailable in the address space of your process because that part would either be reserved by the operating system, or will be protected in another form. So in that sense, yes, you will get a segmentation fault.

The general problem, however, with accessing a data type at locations not evenly divisible by its size is breaking the alignment requirements. Many architectures would insist that ints are accessed on a 4-byte boundary, and in that case your code will trigger an unaligned memory access which is technically undefined behaviour.

Blagovest Buyukliev
  • Okay, and should I expect a segmentation fault on accessing a memory location "in-between" two integers? – nimbudew Feb 26 '13 at 13:45
  • @jaskirat: It depends on how you treat that location. If you want to extract a single byte, you can do so at any location safely. If you want to extract a 4-byte integer at a location not evenly divisible by 4, you are technically invoking undefined behaviour because of breaking the alignment requirements. – Blagovest Buyukliev Feb 26 '13 at 13:48
  • In that particular case it should be `sizeof(int*)`. The size of a pointer is independent from the underlying type. – bash.d Feb 26 '13 at 13:49
  • But the pointer arithmetic is dependent on the underlying type. A `char` is one byte and a `char *` is often 4, but `char *s = "str"; s++;` will only increment `s` by one byte. – RastaJedi Apr 18 '16 at 07:25
2

Now, from what I understand, the pointer p initially contains value 0

No, it contains the address of the first integer in an array of 10. (Assuming that malloc was successful.)

In the loop, a displacement is added to the pointer to access the subsequent integers.

Umm no. I'm not sure what you mean but that is not what the code does.

I checked and found that the pointer is incremented in multiples of 4. How does this happen?

Pointer arithmetic, that is, using the + - ++ -- operators on a pointer, takes the pointed-to type into account. If you have an int pointer and write p++, the address stored in p increases by sizeof(int) bytes.

But what if I actually want to point to memory location 2 using an integer pointer. So, I do this: p = 2.

No, don't do that, it doesn't make any sense. It sets the pointer to point at address 0x00000002 in memory.


Explanation of the code:

int *p; is a pointer to integer. By writing *p = something you change the contents of what p points to. By writing p = something you change the address of where p points.

p = (int *)malloc(sizeof(int[10])); was written by a confused programmer. It doesn't make any sense to cast the result of malloc in C; you can find extensive information about that topic on this site.

Writing sizeof(int[10]) is the same as writing 10*sizeof(int).

*(p+i) = 0; is the very same as writing p[i] = 0;

I would fix the code as follows:

int *p = malloc(sizeof(int[10]));
if(p == NULL) { /* error handling */ }

for (int i=0; i<10; i++)
{
  p[i] = 0;
}

free(p);
Lundin
  • "It doesn't make any sense to cast the result of malloc" It is required in C++. So whether it makes sense depends on where you are using this code. – newacct Feb 26 '13 at 21:52
  • @newacct This post is about the C language. Other programming languages certainly work differently, but to address them here would be completely off-topic. – Lundin Feb 27 '13 at 07:02
  • It does *make sense* to cast the return value of `malloc()`, it's just not required (or recommended). – RastaJedi Apr 18 '16 at 07:19
  • @RastaJedi The dead horse can be beaten [here](http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc). – Lundin Apr 18 '16 at 07:29
  • @Lundin no I absolutely agree with you and always reference that same question myself, but the fact of the matter is that it does *make sense* in that is well-defined to do so and not an error. – RastaJedi Apr 18 '16 at 07:48
1

It is called pointer arithmetic. Adding an integer n to a pointer of type t* moves the pointer by n elements, that is, by n * sizeof(t) bytes. Therefore, if sizeof(int) is 4 bytes:

p + 1 (in C) == address p + 1 * sizeof(int) == address p + 1 * 4 == address p + 4 (in bytes)

Then it is easier to index your array:

*(p+i) is the integer at index i (counting from 0) in the array p.

md5
0

I don't know whether by "memory location 2" you mean your example memory address 2 or the 2nd value in your array. If you mean the 2nd value, that is the element at index 1. To get a pointer to it, write int *ptr = &p[1]; or, equivalently, int *ptr = p + 1;, and then print the value with printf("%d\n", *ptr);. If instead you mean position 2 counting from zero, that is the 3rd value in the array, and you'd want p[2], or p + 2 for a pointer to it.

Note that memory addresses are usually written in hex and wouldn't actually start at 0. They would look something like 0x092ef000, 0x092ef004, 0x092ef008, ...

The other answers don't take into account that you are using the addresses 0 ... 39 just as example addresses. I don't think you are really referring to the physical locations starting at address 0x00000000; if you are, then what everyone else is saying is right.

RastaJedi