I read Chapter 3 of the textbook "Computer Systems: A Programmer's Perspective". Chapter 3 introduces the x86-64 instruction set architecture (ISA) and contains several examples of assembly code compiled from C programs. These examples raised two questions for me, both about how addresses are computed.

First, suppose the starting address of integer array E and the integer index i are stored in registers %rdx and %rcx, respectively. The following table shows an assembly-code implementation of each expression:

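| Expression | Type  | Value | Assembly code        |
|------------|-------|-------|----------------------|
| E          | int * | XE    | movq %rdx, %rax      |
| E[0]       | int   | M[XE] | movl (%rdx), %eax    |
| &E[2]      | int * | XE+8  | leaq 8(%rdx), %rax   |
| &E[i]-E    | long  | i     | movq %rcx, %rax      |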

Why is the type of the expression &E[i]-E long instead of int *? And why is its value i instead of 4i?

Second, consider the following union type:

typedef union {
    struct{
        long u;
        short v;
        char w;
    }t1;
    struct{
        int a[2];
        char *p;
    }t2;
} u_type;

You write a series of functions of the form:

void get(u_type *up, type *dest){
    *dest = expr;
}

with different access expressions expr, and with the destination data type type set according to the type associated with expr. Suppose that in these functions up and dest are loaded into registers %rdi and %rsi, respectively. The following table shows the assembly code for two of the expressions:

| Expression | Type   | Assembly code                         |
|------------|--------|---------------------------------------|
| up->t1.u   | long   | movq (%rdi), %rax; movq %rax, (%rsi)  |
| &up->t1.w  | char * | addq $10, %rdi; movq %rdi, (%rsi)     |

Consider the expression &up->t1.w. I know that we want to assign the address of the char w to dest, and that w lies at an offset of 10 bytes from the beginning of the structure. But why does the answer store 10 in dest instead of storing the address 10(%rdi)? And if the answer is correct, why is the type char * instead of long?

To make it clear: why is the assembly code for the expression &up->t1.w not written like this?

leaq 10(%rdi), %rax
movq %rax, (%rsi)
Harry Lu
  • For the pointer subtraction, you can refer to [similar questions](https://stackoverflow.com/q/3238482/555045). `addq $10, %rdi` doesn't store 10 in rdi, it adds 10 to rdi. – harold Feb 06 '22 at 05:34
  • Harry, `&E[i]-E` is subtracting two pointers. That difference is an integer, not a pointer. – chux - Reinstate Monica Feb 06 '22 at 06:10
  • About `leaq` vs `addq`: there is sometimes more than one "correct" way to do the same thing. See also https://stackoverflow.com/questions/6323027/lea-or-add-instruction – nielsen Feb 06 '22 at 08:02

1 Answer

It makes little or no sense to analyze unoptimized code. If you enable optimizations, the generated code becomes short and direct, and these puzzles go away.

struct s1{
    int a[2];
    char *p;
};

typedef union {
    struct{
        long u;
        short v;
        char w;
    };
    struct s1 s;
} u_type;

void get(u_type *up, char *dest)
{
    *dest = up -> w;
}

        movzbl  10(%rdi), %eax
        movb    %al, (%rsi)
        ret

void get(u_type *up, long *dest)
{
    *dest = up -> u;
}

        movq    (%rdi), %rax
        movq    %rax, (%rsi)
        ret

void get(u_type *up, struct s1 *dest)
{
    *dest = up -> s;
}

        movdqu  (%rdi), %xmm0
        movups  %xmm0, (%rsi)
        ret

void get(u_type *up, int *dest)
{
    *dest = up -> s.a[1];
}

        movl    4(%rdi), %eax
        movl    %eax, (%rsi)
        ret
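
The functions above all load member values. For the address expression from the question, &up->t1.w, here is a minimal sketch using the question's original union layout. The helper names get_w_addr and w_addr_by_offset are made up for illustration. The offset of 10 assumes the x86-64 System V ABI (8-byte long, 2-byte short), and the nested member in offsetof is accepted by GCC and Clang.

```c
#include <stddef.h>  /* offsetof */

typedef union {
    struct {
        long u;
        short v;
        char w;
    } t1;
    struct {
        int a[2];
        char *p;
    } t2;
} u_type;

/* Hypothetical helper: stores the ADDRESS of t1.w, so dest points to a char *. */
void get_w_addr(u_type *up, char **dest)
{
    *dest = &up->t1.w;
}

/* Same address computed by hand: base pointer plus the byte offset of w.
   On x86-64 System V that offset is 10 (8 bytes for u, 2 bytes for v). */
char *w_addr_by_offset(u_type *up)
{
    return (char *)up + offsetof(u_type, t1.w);
}
```

Both functions return the same pointer. This is why `addq $10, %rdi` followed by `movq %rdi, (%rsi)` is correct: the add leaves up + 10 in %rdi (it does not replace %rdi with 10), and that sum is the address of w, hence the type char *. Using `leaq 10(%rdi), %rax` and storing %rax computes the same value without overwriting %rdi. Either is fine here because up is not needed afterwards.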

https://godbolt.org/z/xPPvTbPTq
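
As for the first question, subtracting two pointers into the same array yields a count of elements, not bytes, and the result has the signed integer type ptrdiff_t (which is long on x86-64 Linux). A small sketch, with function names chosen only for illustration:

```c
#include <stddef.h>  /* ptrdiff_t */

/* &E[i] - E: the number of int ELEMENTS between the two pointers, i.e. i.
   The compiler divides the byte distance by sizeof(int) for you. */
ptrdiff_t index_from_pointers(int *E, long i)
{
    return &E[i] - E;
}

/* For comparison, the distance in BYTES: casting to char * first makes
   the element size 1, so this is i * sizeof(int). */
ptrdiff_t byte_distance(int *E, long i)
{
    return (char *)&E[i] - (char *)E;
}
```

With optimization enabled, a compiler can reduce the first function to a plain register move of i, which matches the `movq %rcx, %rax` in the book's table.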

0___________