
I have a program with a main thread and a second thread. The second thread modifies a global variable that the main thread then reads. But somehow the changes I make in the second thread are not visible in the main thread.

section .bss USE32
  global var
  var resd 1

section .text USE32
  ..start:
  push 0
  push 0
  push 0
  push .second
  push 0
  push 0
  call [CreateThread]
  mov eax, 1
  cmp [var], eax ; --> the content of var and '1' are not the same. Which is confusing since I set the content of var to '1' in the second thread
  ;the other code here is not important

.second:
  mov eax, 1
  mov [var], eax
  ret

(This is a simplification of my real program which creates threads in a loop; I haven't tested this exact code.)

Peter Cordes
MenNotAtWork

1 Answer


You don't join the new thread (wait for it to exit); there's no reason to assume that it's finished (or even fully started) when CreateThread returns to the main thread.
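One way to wait is to pass the handle that CreateThread returns to WaitForSingleObject. This is a minimal sketch, not tested: it assumes WaitForSingleObject and CloseHandle are imported the same way as CreateThread, and `.create_failed` is a hypothetical label for whatever error handling you want.

```nasm
  call [CreateThread]          ; returns the thread handle in EAX, or 0 (NULL) on failure
  test eax, eax
  jz   .create_failed          ; hypothetical label: report the error, e.g. via GetLastError
  mov  ebx, eax                ; EBX is call-preserved, so the handle survives the calls below

  push -1                      ; dwMilliseconds = INFINITE (0xFFFFFFFF)
  push ebx                     ; hHandle = the new thread
  call [WaitForSingleObject]   ; blocks until the thread has exited

  push ebx
  call [CloseHandle]           ; release the handle once we're done with it

  mov  eax, 1
  cmp  [var], eax              ; the store in .second has happened by now
```

Checking for a NULL handle also rules out the case where no thread was created at all.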

You could spin-wait until you see a non-zero value in [var], and count how many iterations that takes, if you want to benchmark thread-startup overhead + inter-core latency.

   ...
   call  [CreateThread]
   mov   edi, 1
   cmp   [var], edi
   je   .zero_latency    ; if var already changed

   rdtsc                 ; could put an lfence before and/or after this to serialize execution
   mov  ecx, eax         ; save low half of EDX:EAX cycle count; should be short enough that the interval fits in 32 bits
   xor  esi, esi
  .spin:
   inc  esi            ; ++spin_count
   pause               ; optional, but avoids memory-order mis-speculation when var changes
   cmp  [var], edi
   jne .spin

   rdtsc
   sub  eax, ecx        ; reference cycles since CreateThread returned
   ...
 .zero_latency:         ; jump here if the value already changed before the first iteration

Note that rdtsc measures in reference cycles, not core clock cycles, so turbo matters. Only doing the low 32 bits of the 64-bit subtraction is fine if the interval is less than 2^32 (e.g. about 1 second on a CPU with a reference frequency of 4.2 GHz, vastly longer than we'd expect here).
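If you'd rather not rely on the interval fitting in 32 bits, the full 64-bit difference takes one extra register and an sbb. A sketch, where EBX is an arbitrary choice of register that the spin loop doesn't touch:

```nasm
   rdtsc
   mov  ecx, eax         ; start count, low half
   mov  ebx, edx         ; start count, high half
   ...                   ; spin loop as before (it only uses ESI and EDI)
   rdtsc
   sub  eax, ecx         ; subtract low halves; sets CF if a borrow is needed
   sbb  edx, ebx         ; subtract high halves and the borrow
                         ; EDX:EAX = elapsed reference cycles as a 64-bit value
```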

esi is the spin count. With pause in the loop, you'll do about one check per 100 cycles on Skylake and later, or about one check per 5 cycles on earlier Intel CPUs. Without pause, it's about one check per core clock cycle.

Peter Cordes
  • I added a loop to the main thread. But the second thread is still not able to modify the global variable `var` so that the new value is visible to the main thread. Is there some keyword I have to use to tell the compiler that this variable can be modified by multiple threads? For example `volatile` in C. – MenNotAtWork Mar 28 '21 at 18:03
  • Volatile in C just means the load or store has to actually happen in asm. When hand-writing assembly language, you're already forcing that to happen. Cache is coherent, so stores in one thread definitely become visible to loads in other threads within a few hundred nanoseconds. No special instruction is needed. That's [*why* `volatile` works in C](https://stackoverflow.com/questions/4557979/when-to-use-volatile-with-multi-threading/58535118#58535118). – Peter Cordes Mar 28 '21 at 18:08
  • @MenNotAtWork: If your spin-loop in main isn't exiting, then you have a bug in your program, e.g. perhaps CreateThread is returning an error without actually creating another thread. Try putting some other code in the other thread, like a call to `puts`, or a `CreateFile` that will have some visible effect. Or just set a breakpoint in `.second` and see if it's ever reached. – Peter Cordes Mar 28 '21 at 18:10
  • @MenNotAtWork: Note that the `second` in `push second` is not the same label as `.second`; the `.` is important. If you have some other `second:` label somewhere, that's what you're passing to CreateThread. I assumed it was just a typo in the question, but if that's your real code then that's a problem. – Peter Cordes Mar 28 '21 at 18:13
  • The call to CreateThread is in a loop in my real program; I simplified the code for the question. I guess that by creating multiple threads in my original program, I'm overwriting the content of `var` again right after the second thread changes it. – MenNotAtWork Mar 28 '21 at 18:23