I am currently following along with this tutorial, but I'm not a student of that school.

GDB gives me a segmentation fault in thread_start on the line:

movq  %rsp, (%rdi)   # save sp in old thread's tcb

Here's additional info when I backtrace:

#0  thread_start () at thread_start.s:16
#1  0x0000000180219e83 in _cygtls::remove(unsigned int)::__PRETTY_FUNCTION__
    () from /usr/bin/cygwin1.dll
#2  0x00000000ffffcc6b in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Being a newbie, I can't for the life of me figure out why. Here is my main file:

#define STACK_SIZE 1024*1024

//Thread TCB
struct thread {
    unsigned char * stack_pointer;
    void(*initial_function)(void *);
    void * initial_argument;
};

struct thread * current_thread;
struct thread * inactive_thread;

void thread_switch(struct thread * old_t, struct thread * new_t);
void thread_start(struct thread * old_t, struct thread * new_t);

void yield() {
    //swap threads
    struct thread * temp = current_thread;
    current_thread = inactive_thread;
    inactive_thread = temp;

    thread_switch(inactive_thread, current_thread);
}

void thread_wrap() {
   // call the thread's function
    current_thread->initial_function(current_thread->initial_argument);
    yield();
}

int factorial(int n) {
    return n == 0 ? 1 : n * factorial(n - 1);
}

// computes and prints the factorial
void fun_with_threads(void * arg) {
    int n = *(int*)arg;
    printf("%d! = %d\n", n, factorial(n));
}
int main() {
    //allocate memory for threads
    inactive_thread = (struct thread*) malloc(sizeof(struct thread));
    current_thread = (struct thread*) malloc(sizeof(struct thread));

    // argument for factorial
    int *p= (int *) malloc(sizeof(int));
    *p = 5;

    // initialise thread
    current_thread->initial_argument =  p; 
    current_thread->initial_function = fun_with_threads;
    current_thread->stack_pointer = ((unsigned char*) malloc(STACK_SIZE)) + STACK_SIZE; 
    thread_start(inactive_thread, current_thread);
    return 0;
}

Here's my asm code for thread_start

# Inline comment
/* Block comment */

# void thread_start(struct thread * old_t, struct thread * new_t);

.globl thread_start

thread_start:
  pushq %rbx           # callee-save
  pushq %rbp           # callee-save
  pushq %r12           # callee-save
  pushq %r13           # callee-save
  pushq %r14           # callee-save
  pushq %r15           # callee-save

  movq  %rsp, (%rdi)   # save sp in old thread's tcb
  movq (%rsi), %rsp    # load sp from  new thread

  jmp thread_wrap

and thread_switch:

# Inline comment
/* Block comment */

# void thread_switch(struct thread * old_t, struct thread * new_t);

.globl thread_switch

thread_switch:
  pushq %rbx           # callee-save
  pushq %rbp           # callee-save
  pushq %r12           # callee-save
  pushq %r13           # callee-save
  pushq %r14           # callee-save
  pushq %r15           # callee-save
  movq  %rsp, (%rdi)   # save sp in old thread's tcb
  movq (%rsi), %rsp    # load sp from  new thread
  popq  %r15           # callee-restore
  popq  %r14           # callee-restore
  popq  %r13           # callee-restore
  popq  %r12           # callee-restore
  popq  %rbp           # callee-restore
  popq  %rbx           # callee-restore
  ret                  # return
  • What is the value in the register `rdi` at the time you execute that instruction? The parentheses mean that you are *dereferencing* the pointer that it contains, so if the pointer is not valid, your code will segmentation fault. – Cody Gray - on strike Nov 29 '16 at 22:52
  • @CodyGray the value should be the first argument to thread_start which is inactive_thread, a pointer to a struct thread? – user2214143 Nov 29 '16 at 22:58
  • Your code works for me. Try including <stdio.h> and <stdlib.h>? – SeriousBusiness Nov 30 '16 at 02:56
  • @user2214143: So check with a debugger that it holds what you expect it to hold. See the bottom of the [x86 tag wiki](http://stackoverflow.com/tags/x86/info) for tips on using gdb for asm, or use whatever debugger you prefer. – Peter Cordes Nov 30 '16 at 04:50

1 Answer

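You're on cygwin, right? It uses the Windows x64 calling convention by default, not the System V x86-64 psABI. So your args aren't in %rdi and %rsi; they arrive in %rcx and %rdx, and %rdi holds whatever the caller left in it, which is why the store through (%rdi) faults.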

The calling convention is Windows x64, but the ABI is slightly different: long is 64-bit, so it's LP64, not LLP64. See the cygwin docs.

You could override the default with __attribute__((sysv_abi)) on the prototype, but that only works for compilers that understand GNU C.
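
As a minimal sketch of that override, assuming GCC or Clang targeting x86-64: the `SYSV_ABI` macro and the `copy_saved_sp` helper below are hypothetical names introduced for illustration, and `struct thread` is cut down to the one field the assembly touches. The two prototypes are the ones from the question with the attribute added.

```c
/* Stand-in for the TCB from the question; only the first field matters here. */
struct thread {
    unsigned char *stack_pointer;
};

/* Assumption: a GNU-compatible compiler (GCC or Clang) targeting x86-64.
   Elsewhere the macro expands to nothing and the platform default applies. */
#if defined(__GNUC__) && defined(__x86_64__)
#define SYSV_ABI __attribute__((sysv_abi))
#else
#define SYSV_ABI
#endif

/* Prototypes for the assembly routines. With the attribute, callers place
   old_t in %rdi and new_t in %rsi, even where Windows x64 is the default. */
SYSV_ABI void thread_start(struct thread *old_t, struct thread *new_t);
SYSV_ABI void thread_switch(struct thread *old_t, struct thread *new_t);

/* Hypothetical C stand-in with a similar signature, so the attribute can be
   exercised without the assembly: copies new_t's saved sp into old_t. */
SYSV_ABI void copy_saved_sp(struct thread *old_t, const struct thread *new_t)
{
    old_t->stack_pointer = new_t->stack_pointer;
}
```

The attribute has to be visible at every call site, so it belongs on the declaration that callers see, not only next to the definition.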


Agner Fog's calling convention guide has some suggestions on how to write source code that assembles to working functions on Windows vs. non-Windows. The most straightforward thing is to use an #ifdef to choose different function prologues.
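
A rough sketch of that kind of #ifdef selection, written in C so it can be compiled on its own. The macro names `ARG0_REG` and `ARG1_REG` and the function `arg_register` are hypothetical; the assumption is an x86-64 target where `_WIN32` (MSVC, MinGW) or `__CYGWIN__` (Cygwin's GCC) signals the Windows x64 convention. The same test could guard two prologues in an assembly file that is run through the C preprocessor.

```c
#include <stddef.h>

/* Pick the registers that carry the first two integer/pointer arguments. */
#if defined(_WIN32) || defined(__CYGWIN__)
#define ARG0_REG "rcx"   /* Windows x64 convention */
#define ARG1_REG "rdx"
#else
#define ARG0_REG "rdi"   /* x86-64 System V convention */
#define ARG1_REG "rsi"
#endif

/* Returns the name of the register holding argument `index` (0 or 1) under
   the convention selected at compile time, or NULL for any other index. */
const char *arg_register(int index)
{
    switch (index) {
    case 0:  return ARG0_REG;
    case 1:  return ARG1_REG;
    default: return NULL;
    }
}
```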


This Intel intro to x64 assembly is somewhat Windows-centric, and details the Windows x64 __fastcall calling convention.

(It's followed by examples and stuff. It's a pretty big and good tutorial that starts from very basic stuff, including how to use tools like an assembler. I'd recommend it for learning x86-64 asm in a Windows dev environment, and maybe in general.)

Windows x64 __fastcall (like x64 __vectorcall but doesn't pass vectors in vector regs)

  • RCX, RDX, R8, R9 are used for integer and pointer arguments in that order left to right
  • XMM0, 1, 2, and 3 are used for floating point arguments.
  • Additional arguments are passed on the stack, laid out as if pushed right to left, so the fifth argument ends up at the lowest address.
  • Parameters less than 64 bits long are not zero extended; the high bits contain garbage.
  • It is the caller's responsibility to allocate 32 bytes of "shadow space" (for storing RCX, RDX, R8, and R9 if needed) before calling the function.
  • It is the caller's responsibility to clean the stack after the call.
  • Integer return values (similar to x86) are returned in RAX if 64 bits or less.
  • Floating point return values are returned in XMM0.
  • Larger return values (structs) have space allocated on the stack by the caller, and RCX then contains a pointer to the return space when the callee is called. Register usage for integer parameters is then pushed one to the right. RAX returns this address to the caller.
  • The stack is 16-byte aligned. The "call" instruction pushes an 8-byte return address, so all non-leaf functions must adjust the stack by a value of the form 16n+8 when allocating stack space.
  • Registers RAX, RCX, RDX, R8, R9, R10, and R11 are considered volatile and must be considered destroyed on function calls. RBX, RBP, RDI, RSI, R12, R13, R14, and R15 must be saved in any function using them.
  • Note there is no calling convention for the floating point (and thus MMX) registers.
  • Further details (varargs, exception handling, stack unwinding) are at Microsoft's site.
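
To make the shadow-space and alignment rules concrete for a hand-built thread stack like the one in the question, here is a hedged sketch. `initial_entry_sp` and the enum names are hypothetical; the arithmetic assumes the Windows x64 rules listed above and a first function that is entered with `jmp` rather than `call`, so the return-address slot is only a placeholder.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    STACK_ALIGN  = 16,  /* required alignment just before a call */
    SHADOW_SPACE = 32,  /* caller-provided home space for RCX, RDX, R8, R9 */
    RETURN_SLOT  = 8    /* what "call" would have pushed */
};

/* Given an allocation [base, base + size), return the value RSP should hold
   on entry to the thread's first function: the top is rounded down to 16
   bytes, then 32 bytes of shadow space and one 8-byte slot are reserved,
   so the result is congruent to 8 modulo 16, as it would be after a call. */
unsigned char *initial_entry_sp(unsigned char *base, size_t size)
{
    uintptr_t top = (uintptr_t)(base + size);
    top &= ~(uintptr_t)(STACK_ALIGN - 1);
    return (unsigned char *)(top - SHADOW_SPACE - RETURN_SLOT);
}
```

Under these assumptions the TCB's stack_pointer field would be set to `initial_entry_sp(stack, STACK_SIZE)` instead of the raw end of the allocation.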

Links to MS's calling-convention docs in the tag wiki (along with System V ABI docs, and tons of other good stuff).

See also Why does Windows64 use a different calling convention from all other OSes on x86-64?

  • For reference, the calling convention is specified in the [Cygwin docs](https://cygwin.com/cygwin-ug-net/programming.html). It does in fact use the Microsoft x64 calling convention. Cygwin differs from MinGW and Windows in that it uses an LP64 data model instead of LLP64. – Michael Petch Nov 30 '16 at 06:13
  • Ah, good catch on Cygwin! I saw that, but didn't think anything of it, because I figured it used the System V ABI, rather than the Windows one. Seems like a rather strange choice, considering they go out of their way otherwise to emulate the *nix behavior. I disagree with that Intel doc's choice to refer to the Windows x64 calling convention as "fastcall" though. Fastcall exists for 32-bit code, but is rather different in a number of significant ways than the x64 convention. They really only share the superficial design choice of passing some args in regs, so reusing the name is confusing. – Cody Gray - on strike Nov 30 '16 at 07:52
  • @CodyGray: Cygwin uses the Windows ABI so it can call Windows libraries, and make libraries that can be called by Windows code. Makes sense to me; it's a POSIX source-compatibility layer, not a Linux binary-compat layer, and doesn't support `int 0x80` or `syscall` either. (i.e. it's not WINE). – Peter Cordes Nov 30 '16 at 08:17
  • @CodyGray: It's weird, but [Microsoft calls it `__fastcall`](https://msdn.microsoft.com/en-us/library/ms235286.aspx) for x64. Although that page links to the 32-bit `__fastcall` doc, which says [*The `__fastcall` keyword is accepted and ignored by the compilers that target ARM and x64 architectures; on an x64 chip, by convention, the first four arguments* ...](https://msdn.microsoft.com/en-us/library/6xa169sk.aspx). They definitely use `__vectorcall` for both the 32-bit version and 64-bit version of that, which have huge differences. – Peter Cordes Nov 30 '16 at 08:23
  • Yeah, I knew vectorcall was overloaded in meaning, but I hadn't seen any Microsoft documentation that used fastcall to refer to the x64 calling convention. That's rather too bad. Oh well, I guess it's of little surprise. Poor fastcall was always so overloaded in meaning as to be effectively meaningless for 32-bit builds. It was never standardized, and each vendor (MS, Borland, Watcom, etc.) implemented their own take on it. Microsoft's compiler ignores all calling convention specifiers but `__vectorcall` for x64 builds. – Cody Gray - on strike Nov 30 '16 at 08:56
  • @CodyGray: heh, `__fastcall` was too tempting a name, I guess. Or were they trying but failing to be compatible-enough to interoperate? – Peter Cordes Nov 30 '16 at 09:20
  • (off topic): Outside of Windows, AFAIK there's still really only the 32-bit x86 System V ABI. It's not good for performance (no register args), but at least it's pretty compatible! (except for the somewhat-recent addition of the 16B stack alignment requirement to the ABI, where it was originally 4B). Unlike Microsoft cdecl or stdcall, x86-32 SysV won't return even small structs in edx:eax, and there's no version of it that uses XMM registers for passing/returning FP args at all, so FP returns have to be in st0. – Peter Cordes Nov 30 '16 at 09:20
  • (update: GCC does have an `-msseregparm` option to return FP args in XMM in 32-bit mode. It's related to `-mregparm=3` which is also not ABI-compatible.) – Peter Cordes Jan 14 '21 at 10:00