
While experimenting with methods for stepping through an array of strings in C, I developed the following small program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


typedef char* string;

int main() {
  char *family1[4] = {"father", "mother", "son", NULL};
  string family2[4] = {"father", "mother", "son", NULL};

  /* Loop #1: Using a simple pointer to step through "family1". */
  for (char **p = family1; *p != NULL; p++) {
    printf("%s\n", *p);
  }
  putchar('\n');

  /* Loop #2: Using the typedef for clarity and stepping through
   * family2. */
  for (string *s = family2; *s != NULL; s++) {
    printf("%s\n", *s);
  }
  putchar('\n');

  /* Loop #3: Again, we use the pointer, but with a unique increment
   * step in our for loop.  This fails to work.  Why? */
  for (string s = family2[0]; s != NULL; s = *(&s + 1)) {
    printf("%s\n", s);
  }
}

My specific question involves the failure of Loop #3. When run through the debugger, Loops #1 and #2 complete successfully, but the last loop fails for an unknown reason. I would not have asked this here, except for the fact that it shows me that I have some critical misunderstanding regarding the "&" operator.

My question (and current understanding) is this:

family2 is an array-of-pointer-to-char. Thus, when s is set to family2[0] we have a (char*) pointing to "father". Therefore, taking &s should give us the equivalent of family2, pointing to the first element of family2 after the expected pointer decay. Why doesn't, then, *(&s + 1) point to the next element, as expected?

Many thanks,
lifecrisis


EDIT -- Update and Lessons Learned:

The following list is a summary of all of the relevant facts and interpretations that explain why the third loop does not work like the first two.

  1. s is a separate variable holding a copy of the value (a pointer-to-char) from the variable family2[0]. I.e., these two equivalent values are positioned at SEPARATE locations in memory.
  2. family2[0] up to family2[3] are contiguous elements of memory, and s has no presence in this space, though it does contain the same value that is stored in family2[0] at the start of our loop.
  3. These first two facts mean that &s and &family2[0] are NOT equal. Thus, adding one to &s yields a pointer just past s, and dereferencing it reads unknown data (undefined behavior), whereas adding one to &family2[0] will give you &family2[1], as desired.
  4. In addition, the update step in the third for loop doesn't actually result in s stepping forward in memory on each iteration. This is because &s is constant throughout all iterations of our loop. This is the cause of the observed infinite loop.
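One way to repair the third loop while still keeping a local copy s is sketched below. It is only an illustration, not the original program: the function name print_family and the helper pointer cursor are made up here. The idea is that the pointer doing the stepping must point into the array, and s is refreshed from it on every iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef char *string;

/* Hypothetical helper (not in the original program): prints every string
 * in a NULL-terminated array and returns how many were printed. */
size_t print_family(string *family)
{
    assert(family != NULL);

    size_t count = 0;
    string *cursor = family;   /* points INTO the array, unlike &s */

    /* s is still a local copy, but it is reloaded from the array
     * element that cursor points to after each increment. */
    for (string s = *cursor; s != NULL; s = *++cursor) {
        printf("%s\n", s);
        count++;
    }
    return count;
}
```

Calling print_family(family2) from main would print the three names, because cursor advances through family2[0] to family2[3] while &s never changes.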

Thanks to EVERYONE for their help!
lifecrisis


3 Answers


When you do s = *(&s + 1) the variable s is a local variable in an implicit scope that only contains the loop. When you do &s you get the address of that local variable, which is unrelated to any of the arrays.

The difference from the previous loop is that there s is a pointer to the first element in the array.
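The contrast between the two loops can be checked directly by comparing addresses. The following is a small sketch with made-up function names (loop2_pointer_aliases_array and loop3_pointer_aliases_array are not part of the question's code); each one returns 1 if the pointer in question refers to the array's own storage and 0 otherwise.

```c
#include <stddef.h>

typedef char *string;

/* Loop #2 style: s is a string* that points at family[0] itself,
 * so s + 1 is the address of family[1]. */
int loop2_pointer_aliases_array(string *family)
{
    string *s = family;
    return s == &family[0] && s + 1 == &family[1];
}

/* Loop #3 style: s is a string holding a COPY of family[0].
 * &s is the address of that local copy, not of any array element. */
int loop3_pointer_aliases_array(string *family)
{
    string s = family[0];
    return &s == &family[0];
}
```

With family2 from the question, the first function should return 1 and the second 0, since two distinct objects cannot share an address.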


To explain it a little more "graphically" what you have in the last loop is something like

+----+      +---+      +------------+
| &s | ---> | s | ---> | family2[0] |
+----+      +---+      +------------+

That is, &s is pointing to s, and s is pointing to the same string that family2[0] points to.

When you do &s + 1 you effectively have something like

+------------+
| family2[0] |
+------------+
^
|
+---+----
| s | ...
+---+----
^   ^
|   |
&s  &s + 1
Some programmer dude

Pictures help a lot:

            +----------+
            | "father" |                                    
            +----------+         +----------+      +-------+      NULL 
   /-----------→1000            | "mother" |      | "son" |        ↑
+-----+           ↑              +----------+      +-------+        |
|  s  | ?         |                  2000            2500           |
+-----+           |                   ↑                ↑            |
 6000  6008 +----------------+----------------+--------------+--------------+
            |   family2[0]   |   family2[1]   |  family2[2]  |  family2[3]  |
            +----------------+----------------+--------------+--------------+
                  5000              5008            5016           5024

                    (    &s refers to 6000    ) 
                    ( &s+1 refers to 6008 but )
                    (   *(&s+1) invokes UB    )

Addresses chosen as random integers for simplicity


The thing here is that, although both s and family2[0] point to the same base address of the string literal "father", the two pointers are unrelated to each other: each is stored at its own, different memory location. Hence *(&s+1) != family2[1].

You hit UB when you do *(&s + 1) because &s + 1 points one past s, a memory location you're not supposed to tamper with, i.e., it doesn't belong to any object you created. Forming the pointer is fine, but you never know what's stored there, so dereferencing it is Undefined Behavior.
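To separate the two operations involved, here is a minimal sketch (the function name one_past_distance is invented for illustration). For pointer arithmetic, a single object behaves like an array of one element, so computing &s + 1 is allowed; only the dereference is not.

```c
#include <assert.h>
#include <stddef.h>

typedef char *string;

/* Hypothetical demo: forms the one-past pointer of a single object and
 * returns its distance from the object, without ever dereferencing it. */
ptrdiff_t one_past_distance(void)
{
    string s = "father";
    string *first = &s;
    string *past = &s + 1;     /* valid to compute and to compare */

    assert(past != first);
    /* string next = *past;       undefined behavior: no object lives there */

    return past - first;       /* one element past s */
}
```

The commented-out line is exactly what the update step of Loop #3 does on every iteration.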

Thanks @2501 for pointing out several mistakes!

Spikatrix
  • In the last part of your answer do you mean `&s + 1 != family2[1]`? Both statements are correct, but I'm just wondering which is more important to notice in this context... – lifecrisis Jan 10 '17 at 14:45
  • That's true as well but the types are different. `s+1 != family2[1]` is a comparison of two `char*`s while `&s+1 != family2[1]` is a comparison of a `char**` and a `char*` which doesn't really make sense. – Spikatrix Jan 10 '17 at 14:53
  • It isn't clear which example you are representing here. Regardless, this is wrong. &s+1 doesn't point to anything. It isn't even a valid object. – 2501 Jan 10 '17 at 15:10
  • It seems this is the third example. In that case, s+1 points to the second character of "father" and not 1007. – 2501 Jan 10 '17 at 15:15
  • Also family2[0] and s are using misleading arrows. They point to the same address, 1000, but s points to the left of the column instead of the address. – 2501 Jan 10 '17 at 15:18
  • And family2[3] doesn't point to address 3000, where an object containing the null address is shown; family2[3] quite literally points to a null address. – 2501 Jan 10 '17 at 15:19
  • That's a little better. There are still mistakes. &s doesn't have an address of 7000. It is not an object at all, as it is shown, so it doesn't have an address at all. &s+1 having the address 7008 is nonsense. How would adding to the value of an object change its address!? &s+1 simply points to 6008, and it doesn't have an address. – 2501 Jan 11 '17 at 23:50
  • @2501 Thanks. Sorry for the late reply, I was busy yesterday. I was totally confused too! Thanks again! Are there any more problems in the answer? – Spikatrix Jan 12 '17 at 01:00
  • This looks correct, except that &s+1 is a valid pointer, you're just not allowed to dereference it. – 2501 Jan 12 '17 at 09:46
  • @2501 IIRC, using `&s+1` is illegal and invokes UB, even if you don't dereference it. – Spikatrix Jan 12 '17 at 12:02
  • Pointing one past an object is defined behavior. – 2501 Jan 12 '17 at 13:17
  • @2501 Hmm. I think that holds true only in the case of arrays. Is it true for any object? – Spikatrix Jan 12 '17 at 15:02
  • @2501 It seems I was wrong. You're right. Pointing to `&s+1` is indeed valid. Thanks! – Spikatrix Jan 26 '17 at 14:57
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


typedef char* string;

int main() {
char *family1[4] = { "father", "mother", "son", NULL };
string family2[4] = { "father", "mother", "son", NULL };

/* Loop #1: Using a simple pointer to step through "family1". */
for (char **p = family1; *p != NULL; p++) {
    printf("%s\n", *p);
}
putchar('\n');

/* Loop #2: Using the typedef for clarity and stepping through
* family2. */
for (string *s = family2; *s != NULL; s++) {
    printf("%s\n", *s);
}
putchar('\n');

/* Loop #3: Again, we use the pointer, but with a unique increment
* step in our for loop.  This fails to work.  Why? */
/*for (string s = family2[0]; s != NULL; s = *(&s + 1)) {
    printf("%s\n", s);
}
*/
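/* Print where each string literal lives and how long it is.
 * %p expects a void* argument, and strlen returns size_t (%zu). */
for (int j = 0; j < 3; j++)
{
    printf("%p ", (void *)family2[j]);
    printf("%zu\n", strlen(family2[j]));
}
printf("\n");
/* Step s through the values stored in family2.  Advancing with
 * s + strlen(family2[i]) + 2 would assume the literals sit back to
 * back in memory, and nothing guarantees that layout. */
int i = 0;
for (string s = family2[i]; i != 3; i++, s = family2[i]) {
    printf("%p ", (void *)s);
    printf("%s\n", s);
}

system("pause");  /* Windows-only; remove on other platforms */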

}

This is an example revised from your code. If you run it, you will see how the addresses held by the pointer and by family2 change, and then you will understand what goes wrong in Loop #3.

Jack