
I have been working on trying to write a function that does string comparison for a generic binary search function. However, while writing the function, I realized that my pointer dereferencing does not work.

In essence, this is what doesn't work:

printf("***a[0] = %c\n", (*(char **)(void *)&"a")[0]);

I ran the debugger which tells me EXC_BAD_ACCESS (code=EXC_I386_GPFLT)

However, this extremely similar code (which I believe to be identical to my previous code) does work.

char * stringa = "a";
printf("***stringa[0] = %c\n", (*(char **)(void *)&stringa)[0]);

I don't understand why the second one works but the first one doesn't. My understanding is that "a" and stringa both represent the memory address of the beginning of a character array.

Thank you in advance.

jho317
  • Arrays and pointers are not the same. Arrays decay to pointers in some contexts, but this isn't one of them. – Barmar Dec 22 '21 at 06:41
  • Why not `((char*)(void *)&"hello")[0]`, or simply, `((char*)&"hello")[0]`? – Sourav Ghosh Dec 22 '21 at 06:46
  • I am trying to write a generic binary search so my function that compares two strings should get the pointers of two arrays as void*s – jho317 Dec 22 '21 at 06:48
  • int StrCmp2(void * vp1, void * vp2) { printf("Entered StrCmp2"); int difference = 0; char * str1 = * (char **) vp1; char * str2 = * (char **) vp2; int i1 = 0; int i2 = 0; printf("!!!str1[0] = %c", str1[0]); char char1 = str1[i1]; char char2 = str2[i2]; } this is how my code starts – jho317 Dec 22 '21 at 06:48
  • This code starts to break at line char1 = str1[i1]; I have checked with a debugger. – jho317 Dec 22 '21 at 06:49
  • @Barmar What do you mean by saying "Array decay to pointers in some contexts but not others"? The only way to refer to the values of an array is by doing pointer arithmetic and this is possible because arrays are represented as the pointer to the first value of the array. – jho317 Dec 22 '21 at 06:57
  • @SouravGhosh I am trying to write a generic binary search so my function that compares two strings should get the pointers of two arrays as void*s – jho317 Dec 22 '21 at 06:58
  • @Barmar Is it the first situation in this post: https://stackoverflow.com/questions/17752978/exceptions-to-array-decaying-into-a-pointer 1. when it's the argument of the & (address-of) operator. ? – jho317 Dec 22 '21 at 07:12
  • @Barmar But still I don't understand. int StrCmp(void * vp1, void * vp2) { char * s1 = * (char **) vp1; char * s2 = * (char **) vp2; return strcmp(s1, s2); } This code works where the inputs are &"some_array". This tells me that I do get the correct memory address – jho317 Dec 22 '21 at 07:16
  • @Bob__ strcmp(aa, a) returns the same result as strcmp(aaa, a) and strcmp(aab, a). This doesn't seem to be properly comparing strings. – jho317 Dec 22 '21 at 23:32
  • @jho317 [Good point](https://godbolt.org/z/W3fE7ne7j), I stand corrected. – Bob__ Dec 23 '21 at 00:09

2 Answers

1

Pointers are not arrays. Arrays are not pointers.

  • &stringa results in a pointer to pointer of type char**.
  • &"a" results in an array pointer of type char(*)[2]. It is not compatible with char**.

You try to dereference the char(*)[2] by treating it as a char**, which won't work because the two types are not compatible. In practice the array pointer says "at address x there is character data", but after the cast you are claiming "at address x there is a pointer".

If you try printf("%p\n", *(char **)(void *)&"a"); you don't get an address but data. I get something like <garbage> 0061, which is a little-endian machine reinterpreting the bytes of the string as a larger integer. The <garbage> part is whatever happens to follow the 2-byte array in memory. In memory you have 0x61 ('a') followed by 0x00 (the null terminator): the string itself, not an address which you can dereference.

Lundin
  • It's very weird that &"a" and "a" work the same. – jho317 Dec 22 '21 at 07:31
  • @jho317 They don't. `"a"` gives you the array which when used in most expressions decay into a pointer to its first element (`char*`). This decay does not happen in `&` expressions though, so you get an array pointer (`char(*)[2]`). – Lundin Dec 22 '21 at 07:43
0

First, check this rule - from C11 Standard#6.3.2.1p3 [emphasis added]:

3 Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.

From String literals [emphasis added]:

Constructs an unnamed object of specified character array type in-place, used when a character string needs to be embedded in source code.

Let's decode this first:

char * stringa = "a";
printf("***stringa[0] = %c\n", (*(char **)(void *)&stringa)[0]);

In the statement char * stringa = "a";, the string literal "a" is converted to a pointer to char that points to its initial element. So, after initialisation, stringa points to the first element of the string literal "a".

&stringa is of type char **. Dereferencing it gives a char *, which is stringa itself, i.e. a pointer to the first character of "a", and applying [0] to it gives the character 'a'.

Now, let's decode this:

printf("***a[0] = %c\n", (*(char **)(void *)&"a")[0]);

Here "a" is the operand of the unary & operator, so in the expression (*(char **)(void *)&"a")[0] it is not converted to a pointer to its initial element. &"a" yields a pointer of type char (*)[2] (in C a string literal is an array of plain char, although modifying it is undefined behaviour), and that pointer is then cast to char **.

Dereferencing this pointer reads the bytes stored at that address, which are the characters of the string "a" itself, and interprets them as a char * (because of the cast to char **). Applying [0] then dereferences that bogus pointer. That means it is trying to do something like ((char *)0x0000000000000061)[0] (0x61 is the hex value of the character 'a'; the upper bytes are really whatever follows the two-byte literal in memory), which results in the error EXC_BAD_ACCESS.

Instead, you should do

printf("***a[0] = %c\n", (*(const char (*)[2])(void *)&"a")[0]);

EDIT:

OP is still confused. This edit is an attempt to explain the expressions (above in the post) in a different way.

From comments:
OP: But you wrote (*(const char (*)[2])(void *)&"a")[0] works! There are two dereferencing operations (* and [0]) going on here!

Not sure if you are aware of it, but I think it's good to share the definition of the [] operator, from C11 Standard#6.5.2.1p2:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).

Expression (*(char **)(void *)&stringa)[0]:

     (*(char **)(void *)&stringa)[0]
      | |                      |
      | +----------------------+
      |             |
      |       this will result in 
      |       type casting a pointer 
      |       of type char ** to char **
      |
      |
   This dereferencing 
   will be applied on result of &stringa
   i.e. ( * ( &stringa ) )
   and result in stringa
   i.e. this
         |
         |
         |   &stringa  (its type is char **)
         |      +-------+
         |      |  800  |---+
         |      +-------+   |
         |                  |
         +-------> stringa  |
                /       +-------+  (pointer stringa pointing to first char of string "a"
               /        |  200  |---+     (type of stringa is char *)
              |         +-------+   |
       now apply [0]    800         |
       to it                        |
       i.e. stringa[0].            +-------+
       stringa[0] is           +-> | a | 0 |   (string literal - "a")
       equivalent to           |   +-------+
       *((stringa) + (0))      |   200        ---> address of "a"
       i.e.                    |
       *(200 + 0),             |            
       add 0 to address 200    |
       and dereference it.     |
       *(200 + 0) => *(200)    |
       dereferencing address   |
       200 will result in      |
       value at that address   |
       which is character      |
       'a', that means,        |
       *(200) result in -------+

Expression (*(char **)(void *)&"a")[0]:

 (*(char **)(void *)&"a")[0]
  | |                  |
  | +------------------+
  |          |
  |       this will result in 
  |       type casting a pointer 
  |       of type char (*)[2] to char **
  |       
  |
 this dereferencing will be
 applied to pointer of type 
 char ** which is actually a
 pointer of type char (*)[2]
 i.e. *(&"a").
 It will result in value at address 200
 which is nothing but string "a"
 but since we are type casting 
 &"a" with double pointer (char **)
 so single dereference result 
 will be considered as pointer of 
 type char i.e. char *.
    *(char **)(void *)&"a"
          |
          |
          |            &"a" (its type is char (*)[2] because type of "a" is
          |             +-------+      char [2] i.e. array of 2 characters)
          |             |  200  |---+
          |             +-------+   |
          |                         |
          |                         |
          |                         |
          |                       +-------+
          +------------------>    | a | 0 |   (string literal - "a")
                            /     +-------+
                           /      200        ---> address of "a"
                          |
                          |
                The content at this location will be 
                treated as pointer (of type char *)
                i.e. the hex of "a" (0x0061) [because the string has character `a` followed by null character]
                will be treated as pointer.
                Applying [0] to this pointer
                i.e. (0x0061)[0], which is 
                equivalent to (* ((0x0061) + 0)).
                (* ((0x0061) + 0)) => *(0x0061)
                i.e. trying to dereference 0x0061
                Hence, resulting in bad access error.

Expression (*(const char (*)[2])(void *)&"a")[0]:

 (*(const char (*)[2])(void *)&"a")[0]
  | |                            |
  | +----------------------------+
  |          |
  |       this will result in
  |       type casting a pointer
  |       of type char (*)[2] to const char (*)[2]
  |
  |
 this dereferencing will be
 applied to pointer of type
 const char (*)[2]
 i.e. *(&"a")
 and result string "a"
 whose type is const char [2]
          |
          |
          |            &"a" (its type is char (*)[2] because type of "a" is
          |             +-------+      char [2] i.e. array of 2 characters)
          |             |  200  |---+
          |             +-------+   |
          |                         |
          |                         |
          |                         |
          |                       +-------+
          +------------------>    | a | 0 |   (string literal - "a")
                            /     +-------+
                           /      200        ---> address of "a"
                          |
                          |
                Apply [0] to "a"
                i.e. "a"[0].
                Now, scroll to the top of my post
                and check string literal definition -

                string literal constructs unnamed object of character array type.....

                also, read rule 6.3.2.1p3 
                (which is applicable for an array of type) -

                ....an expression that has type 'array of type' is converted   
                to an expression with type 'pointer to type' that points to   
                the initial element of the array object. ....
                
                So, "a" (in expression "a"[0]) will be converted to pointer 
                to initial element i.e. pointer to character `a` which is
                nothing but address 200.
                "a"[0]  ->  (* ((a) + (0))) -> (* ((200) + (0)))
                -> (* (200)) -> 'a'

From comments:
OP: there is no such thing as an object in C ....

Don't confuse the word object with objects in C++ or other object-oriented languages.
This is how the C standard defines an object:

From C11 Standard#3.15p1

1 object
region of data storage in the execution environment, the contents of which can represent values

E.g. - int x; --> x is an object of type int.

Let me know if you have any more questions.

H.S.
  • Can you explain what const char (*)[2] is? If this is not a pointer to a pointer, then why does dereferencing and then doing [0] (which is an implicit pointer arithmetic and dereferencing) result in the first char in the array? – jho317 Dec 22 '21 at 08:27
  • @jho317 String literal `"a"` constructs an unnamed object of character array type, something like (just for illustration) `char arr[2] = {'a', '\0'}`. The type of `arr` is `char [2]` and the type of `&arr` is `char (*)[2]`. Hence, the type of `&"a"` is also `char (*)[2]` (the elements are not `const` in C, but a string literal must still not be modified). "_.....then why does dereferencing and then doing [0] result in the first char in the array?_", only dereferencing will result in first character of array and not "_dereferencing and then doing [0]_". Please read the second half of my answer once again. – H.S. Dec 22 '21 at 08:53
  • But you wrote (*(const char (*)[2])(void *)&"a")[0] works! There are two dereferencing operations (* and [0]) going on here! Also, isn't C purely imperative? there is no such thing as an object in C so "constructs an unnamed object of character array type" is just very confusing for me. – jho317 Dec 23 '21 at 00:03
  • @jho317 I have edit my post and answered your queries. Please check. – H.S. Dec 23 '21 at 03:21