If goroutines involve userspace threads, can a blocking operation lead to a context switch of the entire thread?

Question

Apologies if this question is too basic. I was reading through the details of goroutines here. That page says goroutines are multiplexed onto a small number of OS threads rather than mapped 1:1. With my limited knowledge, I took this to mean that a limited number of OS threads are spawned, and that userspace threads or coroutines run inside them. Is this correct? If so, take an example: a program clones 4 OS threads, each running multiple userspace threads, and each of the 4 threads has a single blocking operation alongside its non-blocking operations. Will the OS scheduler context-switch all 4 threads, since the userspace threads are not visible to the OS?

Out of curiosity, is there a possible C implementation of goroutines, which could help understand the internals?

Number of threads is not limited, see related questions: [Number of threads used by Go runtime](http://stackoverflow.com/questions/39245660/number-of-threads-used-by-go-runtime) and [Why does it not create many threads when many goroutines are blocked in writing file in golang?](http://stackoverflow.com/questions/28186361/why-does-it-not-create-many-threads-when-many-goroutines-are-blocked-in-writing) — icza, Sep 25 '16 at 10:18
Remember that blocking operations are very uncommon in most programs. File IO would be the most common, maybe cgo calls next, but in the scheme of things this is very infrequent. — JimB, Sep 25 '16 at 15:43

kkaosninja · Accepted Answer · 2016-09-25T15:13:32.847

Below is what I understood after reading Go in Action

Goroutines run within what are called "logical processors" (NOT physical processors). Each of these logical processors is bound to a single OS thread.

Since Go 1.5, the number of logical processors defaults to the number of CPUs available to the program. This is the `GOMAXPROCS` setting, and it can be changed.

The Go scheduler intelligently schedules the running of multiple goroutines on each of these logical processors.

Crude diagram follows :-

OS Thread ------ Logical Processor ------ Goroutine 1, Goroutine 2..... Goroutine n

Now, it is very likely that one of the goroutines makes a blocking system call. When this happens:

1. The OS thread and the goroutine that made the blocking call are detached from the logical processor.
2. This logical processor now has no OS thread.
3. The Go scheduler creates a new OS thread, and attaches it to the logical processor. The remaining goroutines that were attached to the logical processor now continue to run.
4. The detached goroutine and the OS thread it is associated with continue to block, waiting for the syscall to return.
5. When the system call returns, the goroutine is re-attached to one of the logical processors, and is placed in its run queue.
6. The OS thread is "put aside for future use". I am guessing it is added to some sort of thread pool.

If the goroutine makes a network I/O call, it is handled in a slightly different way.

The goroutine is detached from the logical processor, and is moved to the integrated network poller. Once the poller says the I/O operation is ready, the goroutine is re-attached to the logical processor to handle it.
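As a rough illustration of the network case, the sketch below parks many goroutines in `Read` on loopback TCP connections and reports how many OS threads were created meanwhile. The function name `parkOnNetwork` and the loopback setup are my own assumptions for the demo; `pprof.Lookup("threadcreate")` is the standard profile that counts thread creations. I would expect the thread count to grow far less than the number of parked goroutines, but the exact number depends on the runtime and the machine.

```go
package main

import (
	"fmt"
	"net"
	"runtime/pprof"
	"sync"
	"sync/atomic"
	"time"
)

// parkOnNetwork opens n loopback TCP connections, parks one goroutine per
// connection in Read, then sends a byte to each to wake it. It returns how
// many readers received their byte and how many OS threads were created
// between the start of the setup and the moment the readers were parked.
func parkOnNetwork(n int) (int, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, 0, err
	}
	defer ln.Close()

	threads := pprof.Lookup("threadcreate")
	before := threads.Count()

	var received int64
	var wg sync.WaitGroup
	clients := make([]net.Conn, 0, n)
	// Closing the client side also releases any reader still parked.
	defer func() {
		for _, c := range clients {
			c.Close()
		}
	}()

	for i := 0; i < n; i++ {
		c, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return 0, 0, err
		}
		clients = append(clients, c)
		s, err := ln.Accept()
		if err != nil {
			return 0, 0, err
		}
		wg.Add(1)
		go func(s net.Conn) {
			defer wg.Done()
			defer s.Close()
			buf := make([]byte, 1)
			// Read parks the goroutine in the network poller.
			if _, err := s.Read(buf); err == nil {
				atomic.AddInt64(&received, 1)
			}
		}(s)
	}

	time.Sleep(50 * time.Millisecond) // give the readers time to park
	newThreads := threads.Count() - before

	for _, c := range clients {
		if _, err := c.Write([]byte{1}); err != nil {
			return 0, 0, err
		}
	}
	wg.Wait()
	return int(atomic.LoadInt64(&received)), newThreads, nil
}

func main() {
	got, newThreads, err := parkOnNetwork(100)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("readers woken:", got)
	fmt.Println("OS threads created while they were parked:", newThreads)
}
```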

-- Now, to answer your question :-)

I'm not an expert, but this is what I think will happen, based on what was stated above.

Since one goroutine on each of the 4 OS threads has made a blocking syscall, all 4 threads will be detached from their logical processors, and will continue to block until the syscall(s) return. The 4 OS threads will be associated with the respective goroutines that made the blocking syscall.

Now, this results in 4 logical processors (and the non-blocking goroutines attached to them) without any OS threads.

So, the Go scheduler creates 4 new OS threads, and assigns the logical processors to these threads.

--

From the OS's point of view, the 4 OS threads that made blocking calls obviously cannot be allowed to take CPU time, since they aren't doing anything.

So it will switch their contexts with some other non-blocking thread of its choosing.

Thank you @kkaosninja. So how are these logical processors different from OS threads? Why is the concept of logical processors required at all, if the underlying system is a mapping of userspace threads to OS threads? If a program has to create this concept of a logical processor in userspace, how is it done? Is there a possible C implementation which can be used for understanding? — nohup, Sep 26 '16 at 07:09
The best answer here explains logical processors quite nicely => http://www.tomshardware.com/answers/id-1850932/difference-physical-core-logical-core.html. Also, from Go 1.5, the compiler and runtime were completely written in Go. So if you want the C source code, you will have to look at the sources for Go 1.4 and below. — kkaosninja, Sep 26 '16 at 15:13
