
I have read about integer pointer subtraction in C in this thread: Pointer subtraction confusion, which was simple enough to grasp and test out.

However, I tried to replicate a similar scenario with a char*, and the results I got did not make much sense.

Here's the scenario that I tried:

#include <stdio.h>
#include <string.h>

int main() {

    char a_arr[16] = "";
    char *a = a_arr;
    char b_arr[1] = "";
    char *b = b_arr;

    printf("\nThe amount by which they differ is: %td\n", a-b); // %td matches ptrdiff_t
    // a-b = 1, which makes sense since they are 1 char away

    return 0;
}

The next thing that I tried is what I'm having trouble understanding:

#include <stdio.h>
#include <string.h>

int main() {

    char a_arr[16] = "";
    char *a = a_arr;
    char b_arr[2] = "";
    char *b = b_arr;

    printf("\nThe amount by which they differ is: %td\n", a-b); // %td matches ptrdiff_t
    // a-b = 16, which doesn't really make sense to me.

    return 0;
}

My guess is that the compiler is inserting padding, which I thought shouldn't happen here since these are char arrays and there should be no need for alignment.

I'm not sure why it is 16 bytes. Any help is much appreciated!

I have used the following online interface to compile and run this piece of code: http://www.tutorialspoint.com/compile_c_online.php

rhino--

4 Answers

In this example, a_arr, a, b_arr, and b are probably all allocated on the stack. The compiler doesn't have to give you any guarantees about how variables are arranged on the stack. It might pad to multiples of 16 bytes, place other data between the two arrays, save register values between them, and so on.

That is why, as the commenters pointed out, the C standard does not define the result of subtracting pointers that point into two different arrays. The good news is that you usually won't need to do so unless you are writing an OS or a standard library :)

Edit: Also, the arrangement of memory, and what is kept in a register vs. on the stack, may change depending on your optimization level. I don't think that's a factor here, but it's something to keep in mind.

cxw
  • The "spec" is the C standard, and it clearly disallows such subtractions (making them UB is the C way of saying "you are leaving the standard"). IOW, nasal daemons can appear, your house can fly away, or your computer can start shouting at you. – too honest for this site Sep 20 '16 at 13:58

Your compiler seems to be storing b_arr first and then a_arr in memory in your first example, and a_arr first in the second. When I run them I get:

The amount by which they differ is: 1

and

The amount by which they differ is: 2

so my compiler is always storing b_arr at a lower address than a_arr.

What memory probably looks like for you:

First Example:
____________________
|B|        A       |
--------------------

Second Example:
______________________
|        A        |B |
----------------------

As the commenters pointed out, there is no guarantee of where the arrays will be located. Subtracting pointers in two different arrays is undefined behavior.

Riley

I rewrote your program into something that just dumps memory. This should give you a better idea of what is laid out where in memory.

As others have pointed out, the compiler does not offer you guarantees about memory layout. Even inspecting a memory address can change how the compiler organizes its memory. Your question isn't so much one about C as it is about the quirks of your particular compiler.

#include <stdio.h>
#include <string.h>

int main()
{
    char a_arr[16] = "";
    char *a = a_arr;
    char b_arr[1] = "";
    char *b = b_arr;

    void *min, *max, *curr;


    min = &a_arr;
    if (min > (void *)&a) {
        min = &a;
    }
    if (min > (void *)&b_arr) {
        min = &b_arr;
    }
    if (min > (void *)&b) {
        min = &b;
    }

    max = (void *)&a_arr + sizeof(a_arr);
    if (max < (void *)&a + sizeof(a)) {
        max = (void *)&a + sizeof(a);
    }
    if (max < (void *)&b_arr + sizeof(b_arr)) {
        max = (void *)&b_arr + sizeof(b_arr);
    }
    if (max < (void *)&b + sizeof(b)) {
        max = (void *)&b + sizeof(b);
    }

    // Now print them.
    for (curr = min; curr <= max; ++curr) {
        if (curr == &a_arr)
            printf ("%10p: %10x - a_arr\n", curr, *((char *)curr));
        else if (curr == &a)
            printf ("%10p: %10x - a\n", curr, *((char *)curr));
        else if (curr == &b_arr)
            printf ("%10p: %10x - b_arr\n", curr, *((char *)curr));
        else if (curr == &b)
            printf ("%10p: %10x - b\n", curr, *((char *)curr));
        else
            printf ("%10p: %10x\n", curr, *((char *)curr));
    }

    printf ("\nThe amount by which they differ is: %td\n", a-b); // %td matches ptrdiff_t

    return 0;
}

And here is how it runs on my machine. Note the three wasted bytes after b_arr. These bytes make each variable start on an address that is a multiple of 4 (this is known as word boundary alignment, and is pretty standard).

I suspect your compiler is aligning b_arr on a 16-byte boundary. This is unusual but not surprising. Compilers do the weirdest things for speed.

Here's another question that nicely illustrates the unpredictable nature of memory alignment. In general, you just shouldn't treat memory layout as deterministic.

  ffbfefbc:   ffffffff - b
  ffbfefbd:   ffffffbf
  ffbfefbe:   ffffffef
  ffbfefbf:   ffffffc0
  ffbfefc0:          0 - b_arr
  ffbfefc1:          0
  ffbfefc2:          0
  ffbfefc3:          0
  ffbfefc4:   ffffffff - a
  ffbfefc5:   ffffffbf
  ffbfefc6:   ffffffef
  ffbfefc7:   ffffffc8
  ffbfefc8:          0 - a_arr
  ffbfefc9:          0
  ffbfefca:          0
  ffbfefcb:          0
  ffbfefcc:          0
  ffbfefcd:          0
  ffbfefce:          0
  ffbfefcf:          0
  ffbfefd0:          0
  ffbfefd1:          0
  ffbfefd2:          0
  ffbfefd3:          0
  ffbfefd4:          0
  ffbfefd5:          0
  ffbfefd6:          0
  ffbfefd7:          0
  ffbfefd8:          0

The amount by which they differ is: 8
QuestionC

If you construct your test correctly, you will find that char pointer subtraction behaves exactly the same way as int pointer subtraction. I.e. the value returned is the number of chars between the two pointers and not the number of memory addresses (which may or may not be bytes) between them.

#include <stdio.h>
#include <string.h>

int main()
{
    char a_arr[16] = "";
    char *a = a_arr;
//    char b_arr[1] = "";
    char *b = &a_arr[8];

    printf("\nThe amount by which they differ is: %td\n", b-a); // %td matches ptrdiff_t
    // b-a = 8, which makes sense since they are 8 chars away.

    printf("\nThe amount by which their addresses differ is: %d\n", (int)b-(int)a);
    // Which will depend on the implementation and may be something unexpected!

    return 0;
}

The microcontroller I'm using has a 16-bit data bus and 16-bit registers and, by default, stores the characters of strings at alternate (even) addresses. In this case the first output would be 8 and the second 16. There are compiler options to store the characters of strings in contiguous memory locations, but this is slower to access, as it involves shifting the 16-bit data registers to get at the odd-addressed bytes.

Evil Dog Pie
  • Yes, the case that you mentioned works. Which is why I tested a similar case with 2 different arrays but I didn't know it was undefined behaviour. – rhino-- Sep 20 '16 at 14:31