Implementing a user-level threads library: starting a new thread [Homework]

Question

I have seen this: Implementing a User-Level Threads Package and it doesn't apply.

While implementing Thread_new(int func(void*)), which allocates a thread and creates its stack, I can't think of a way to set the program counter (%eip, if I am correct) so that when the scheduler first runs the thread, it starts at the entry point of the given function (func).

Although I have seen many C-only (no assembly) implementations, we have been given the following code (x86):

_thrstart:
    pushl  %edi
    call *%esi
    pushl %eax
    call Thread_exit

Is there a specific reason to push %edi to the stack? I can't seem to find another use for esi/edi apart from byte copying.

I realize that the indirect call through %esi is probably there to call the function in the context of the new thread, but beyond that I don't understand how %esi comes to hold a valid function address when _thrstart is reached from Thread_new.

NOTES:

Thread_exit is the thread cleanup routine, implemented in C.

This is HOMEWORK

To the downvoter: If the question is too specific, badly worded or has a terribly clear answer, constructive criticism would be nice :) — GCon, Oct 19 '14 at 00:20
Stackoverflow is not here to do your homework, the point of homework is to help you learn. Not to be offloaded on others. — Mgetz, Oct 19 '14 at 00:22
And I completely agree. I have been researching the subject for 2 days and have implemented most of the assignment. I am unable to understand a specific part of the implementation, namely the purpose of the given function. I have tried to understand it, but couldn't find any documentation (not pertaining to the function, but to the instructions used). — GCon, Oct 19 '14 at 00:23
`eip` is set when the thread bootstrapping code calls the function pointer you pass to `Thread_new`, basically the `call` instruction can be viewed as a `move eip ` — Mgetz, Oct 19 '14 at 00:24
Thank you for your reply. Yes, that was what I had in mind. I am still unable to understand how the indirect call helps. Any (even tiny) hint on why edi is pushed to the stack of the thread **calling** Thread_new? — GCon, Oct 19 '14 at 00:28
`edi` is going to be the `void*` argument for the thread func. Somebody must have previously set `esi` and `edi` for you. — Jester, Oct 19 '14 at 00:29
@GCon I highly suggest you read up on calling conventions and on the `call` instruction itself, intel has the manuals online it's in volume 2 last I checked — Mgetz, Oct 19 '14 at 00:31
@Jester Thank you! After checking the assembly code, I now see how edi is set. — GCon, Oct 19 '14 at 00:37
@Mgetz Thank you for the reference. I think I have enough to go on with. I will answer the question as soon as I have completed. — GCon, Oct 19 '14 at 00:37
Load a call instruction code and address of 'Thread_Exit', parameter, function address etc. onto the stack of the new thread so as to 'manually' create data at the TOS that has a function call or interrupt frame at the end. Change the stack pointer to point at the new frame. Do a RET or IRET. Thereafter, don't call it - let it call you:) — Martin James, Oct 19 '14 at 06:50

Brendan · Answer 1 · 2014-10-19T00:55:04.703

In general, you can break a "scheduler" down into 4 parts.

The first part is the mechanics of switching from one thread to another. This mostly involves storing the previous thread's state somewhere and loading the next thread's state from somewhere. Here, "somewhere" could be some sort of thread control block, or it could be the thread's stack, or both, or something else. A thread's state may include the contents of general purpose registers, its stack top (esp), its instruction pointer (eip), and anything else (MMX/SSE/AVX registers). However, for cooperative scheduling a thread's state could be much smaller (e.g. most of a thread's state is trashed by thread switching, and because scheduling is cooperative the thread itself knows when its state is going to be trashed and can prepare for that).
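To make "a thread's state" concrete, here is a minimal sketch of what a cooperative scheduler on 32-bit x86 might keep per thread. Every name here (struct thread, struct switch_frame, the field names) is hypothetical, and the order of the saved registers is an assumption; your own context-switch assembly dictates the real layout. The sketch relies on the i386 cdecl convention, where ebx, esi, edi and ebp are callee-saved, so a switch that is entered through an ordinary function call only has to preserve those, plus esp. The instruction pointer is already on the stack as the return address of that call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical thread states for a cooperative scheduler. */
enum thread_state { THREAD_READY, THREAD_RUNNING, THREAD_DONE };

/* Hypothetical thread control block: everything except the stack pointer
 * is bookkeeping; the register contents live on the thread's own stack. */
struct thread {
    uint32_t *sp;              /* saved esp while the thread is inactive */
    void *stack_base;          /* lowest address of the stack allocation */
    size_t stack_size;         /* size of that allocation in bytes       */
    enum thread_state state;
    struct thread *next;       /* link for the scheduler's queue         */
};

/* Assumed layout of what the switch routine leaves on an inactive thread's
 * stack, lowest address first. The register order is an assumption. */
struct switch_frame {
    uint32_t edi;
    uint32_t esi;
    uint32_t ebx;
    uint32_t ebp;
    uint32_t eip;              /* return address pushed by the call      */
};

/* Four callee-saved registers plus the return address: five 4-byte words. */
static_assert(sizeof(struct switch_frame) == 20, "unexpected frame size");
```

Keeping the registers on the stack and only esp in the control block is one design choice among several; storing all of them in the control block works just as well.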

The second part is deciding when to do a thread switch and which thread to switch to. This varies widely for different schedulers.

The third part is starting a thread. This mostly involves constructing the data that would be loaded during a thread switch. However, it's possible to do this in a "lazy" way, where you only create the minimal amount of state when first creating a thread, and then finish creating the remainder of the thread's state after it has been given CPU time.

The fourth part is terminating a thread. This involves destroying/freeing the data that would be loaded during a thread switch; but can also mean cleaning up any resources that the thread failed to release (e.g. file handles, network connections, thread local storage, whatever) so that you don't end up with "resource leaks".

Thank you for your reply. I have implemented most of what you recommend. The scheduler is non-preemptive (cooperative) - at least to begin with. I am having trouble with the _third part_: after allocating and setting up a stack for the new thread, how do I get the function's address to the new thread? In particular, how is %esi used for this purpose? — GCon, Oct 19 '14 at 01:00

score 2 · Answer 2 · answered Oct 19 '14 at 07:00

Typically, in simple RTOSes, threads are not started by being called or jumped to - they are started by being returned or interrupt-returned to.

The trick is to assemble data at the top of the new stack so that it looks as if the thread has been running before and has either called the scheduler or entered it via an interrupt. At the bottom of this 'frame' should be the address of the thread function. You can then load the stack pointer with the address of the frame, enable interrupts and perform a RET or IRET to start the thread function.

It's convenient to also first shove on a parameter that the new thread can retrieve and a call to the 'TerminateThread' or 'Thread_Exit', so that if the thread function returns, the scheduler can terminate it.
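A rough sketch of that layout for the plain RET case, assuming the cdecl convention (on entry to a function, the return address is at the stack pointer and the first argument sits just above it). The helper name prepare_initial_stack and the demo functions are made up for illustration. Words are uintptr_t so the sketch compiles anywhere; on 32-bit x86 each slot would be 4 bytes. Converting function pointers to uintptr_t is implementation-defined, though it works on the usual flat-memory platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*thread_func)(void *);
typedef void (*exit_func)(void);

/* Hypothetical helper: lay out the first frame of a brand-new thread.
 * stack_top points one past the highest usable word; the stack grows down.
 * Returns the value to load into the stack pointer before executing RET. */
uintptr_t *prepare_initial_stack(uintptr_t *stack_top, thread_func func,
                                 void *arg, exit_func on_return)
{
    assert(stack_top != NULL);
    uintptr_t *sp = stack_top;
    *--sp = (uintptr_t)arg;        /* argument, where cdecl expects it     */
    *--sp = (uintptr_t)on_return;  /* "return address" that func will see  */
    *--sp = (uintptr_t)func;       /* popped into the PC by the first RET  */
    return sp;
}

/* Demo stand-ins so the layout can be inspected from ordinary C. */
int demo_body(void *arg) { return *(int *)arg + 1; }
void demo_exit(void) { }
```

With this layout the exit routine is entered by a return rather than a call, so it does not receive the thread function's return value as an argument. A small trampoline such as the _thrstart routine in the question avoids that by pushing %eax before calling Thread_exit.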

score 0 · Accepted Answer · answered Oct 19 '14 at 15:51

It seems the problem wasn't as complicated as it first looked.

Based on the answer given by @Martin James, the stack is prepared so that the return address is the _thrstart function. The assembly that performs a context switch stores the registers edi and esi at specific locations on the stack while the thread is inactive. Using edi and esi as general purpose registers, those slots are set up so that, when _thrstart runs, edi contains the void* argument and esi contains the address of the function to be called from the new thread.

_thrstart:
    pushl %edi          # push the argument for func onto the stack
    call  *%esi         # indirect call to func
    pushl %eax          # func's return value is in eax; push it as the argument
    call  Thread_exit   # call thread cleanup
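For anyone who finds the assembly hard to follow, this is roughly what it does, modelled in portable C. It is only a sketch of the data flow and not the code from the assignment: struct saved_regs stands in for the two stack slots the context switch restores, and every model_* name is invented for illustration. In the real library the values reach %esi and %edi through the context switch, not through a function parameter.

```c
#include <assert.h>
#include <stdint.h>

typedef int (*thread_func)(void *);

/* Stand-ins for the two registers the context switch restores. */
struct saved_regs {
    uintptr_t esi;   /* address of func                 */
    uintptr_t edi;   /* the void * argument for func    */
};

static int last_exit_code = -1;

/* Stand-in for Thread_exit: just records the code it was given. */
void model_thread_exit(int code) { last_exit_code = code; }

int model_last_exit_code(void) { return last_exit_code; }

/* What Thread_new would store into the new thread's saved-register slots. */
struct saved_regs model_thread_new(thread_func func, void *arg)
{
    struct saved_regs r;
    r.esi = (uintptr_t)func;
    r.edi = (uintptr_t)arg;
    return r;
}

/* C rendering of _thrstart, one statement per instruction group. */
void model_thrstart(struct saved_regs r)
{
    assert(r.esi != 0);                   /* must hold a function address */
    void *arg = (void *)r.edi;            /* pushl %edi                   */
    int rc = ((thread_func)r.esi)(arg);   /* call *%esi                   */
    model_thread_exit(rc);                /* pushl %eax; call Thread_exit */
}

/* Example thread body: squares the int it is pointed at. */
int model_square(void *p) { int v = *(int *)p; return v * v; }
```

Calling model_thrstart(model_thread_new(model_square, &n)) with n set to 7 ends with the exit stand-in receiving 49, which mirrors how the return value in eax is handed to Thread_exit.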
