
Consider one or both of the following 'data' values:

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
//char * data = "21st Century Schizoid Man\0I Talk to the Wind\0Epitaff\0Moonchild\0The Court of the Lavender King";
char data1[93] = {'2','1','s','t',' ','C','e','n','t','u','r','y',' ','S','c','h','i','z','o','i','d',' ','M','a','n','\0','I',' ','T','a','l','k',
't','o',' ','t','h','e',' ','W','i','n','d','\0','E','p','i','t','a','p','h','\0','M','o','o','n','c','h','i','l','d','\0','T','h','e',' ','C',
'o','u','r','t',' ','o','f',' ','t','h','e',' ','C','r','i','m','s','o','n',' ','K','i','n','g','\0'};
char * data = data1;

void main () {

char cVar1;
int iVar3;
int iVar4;

iVar3 = 0; iVar4 = 0;
//cVar1 = *(char *)((long)iVar3 + *(long *)(data + (long)iVar4 * 8));
cVar1 = *(char*)(*(long*)data);
} 

Why does the assignment to cVar1 cause a segmentation fault? I know the *(long *) is at issue as removing it prevents the Seg Fault.

EDIT: A commenter linked a post suggesting that this is an alignment issue. How would one fix that? Changing the size of the char array to 96, so that its size is divisible by 4 or 8 (the possible sizes of a long), did not stop the segmentation fault. Many other casting combinations do work:

cVar1 = (char*)(*(long*)data);
cVar1 = *(char*)(long*)data;

for example

Cheetaiean
  • Does this answer your question? [Should I worry about the alignment during pointer casting?](https://stackoverflow.com/questions/13881487/should-i-worry-about-the-alignment-during-pointer-casting) – Raymond Chen Apr 06 '22 at 02:51
  • `"The Court of the Lavender King"` is not `0`-terminated. PS next time please declare your pointer like this: `char* p_data1 = data1;`, it's easier to read. – paladin Apr 06 '22 at 03:01
  • `'0'` should be `0`. – ikegami Apr 06 '22 at 03:05
  • @RaymondChen It is a similar topic but does not make it clear. Ideally I would like to fix any alignment issues so the cast occurs without a seg fault. – Cheetaiean Apr 06 '22 at 03:13
  • @ikegami fixed it but does not make a difference – Cheetaiean Apr 06 '22 at 03:15
  • Why do you think it shouldn't give a segfault?! The second dereference (the leftmost one) is accessing some "random" address! – ikegami Apr 06 '22 at 03:38
  • The second dereference will still point to the original value of data, but clearly the alignment is off, i.e. 1) `(long *)data` -> char pointer converted to long pointer 2) `*(long *)data` -> pointer dereferenced to reveal original value (data[0]) 3) `(char *)(*(long *)data)` -> pointer pointing to data[0] 4) `*(char *)(*(long *)data)` -> this pointer dereferenced to reveal data[0] – Cheetaiean Apr 06 '22 at 03:41
  • `int main(void)`, not `void main()`. – Keith Thompson Apr 06 '22 at 03:44
  • Why do you want a `long*` pointer pointing to a character string in the first place? What are you trying to accomplish? There are ways to force a `char` array to be aligned (using a `union`), but whatever you're trying to do, there's probably a better way to do it. Character arrays don't have to be strictly aligned, and there's rarely any good reason to force them to be strictly aligned. – Keith Thompson Apr 06 '22 at 03:46
  • "Why does the assignment to cVar1 cause a segmentation fault?" --> `(char*)(*(long*)data)` forms an invalid pointer. – chux - Reinstate Monica Apr 06 '22 at 03:49
  • Not only are you reading unaligned memory, you're also violating the [strict aliasing rule](https://stackoverflow.com/q/98650/995714). Both invoke UB. – phuclv Apr 06 '22 at 03:55
  • @KeithThompson As to why, the commented out cVar1 calculation was taken by reverse engineering an ELF binary using Ghidra. In that example the values of iVar3 and iVar4 could vary, and it was clearly an attempt at pointer arithmetic. – Cheetaiean Apr 06 '22 at 03:55

1 Answer

This is not so much an issue of alignment as it is an invalid pointer value.

First you have this:

(long*)data

This results in a long * which points to the same memory location as data. Then you dereference the pointer:

*(long*)data

At this point, the C standard states this is undefined behavior. On architectures with stricter alignment requirements this could cause a crash, but on an x86 machine with a 64-bit long, you'll probably end up with a value made up of the first 8 bytes of data, with the least significant byte first. This results in a value of 0x6e65432074733132, which, if you look closely, consists of the ASCII codes of the first 8 characters in reverse order.

Now you have this:

(char*)(*(long*)data)

This interprets the value 0x6e65432074733132 as a pointer. Then you attempt to dereference it:

*(char*)(*(long*)data)

The value 0x6e65432074733132 is almost certainly not a valid address, so when you attempt to dereference this value you get a segfault.

dbush
  • Thank you that does explain it (a simple gdb tracker will also bring up 0x6e65432074733132). Which leads me to wonder how cVar1 = *(char *)((long)iVar3 + *(long *)(data + (long)iVar4 * 8)); could have worked, the actual reverse-engineered calculation. – Cheetaiean Apr 06 '22 at 03:58
  • @Cheetaiean It probably produces a valid address – M.M Apr 06 '22 at 04:02
  • @Cheetaiean In the original code, `data` was probably the address of an array of pointers to char, so it comes out to `data[iVar4][iVar3]` – Raymond Chen Apr 06 '22 at 04:22