Why does NPTL threading in Linux still assign a unique PID to each thread?

Question

I am reading the pthreads man page and see the following:

With NPTL, all of the threads in a process are placed in the same thread group; all members of a thread group share the same PID.

My system runs NPTL 2.17, and when I run htop with threads shown, I see that all the PIDs are unique. But why? I expected some of them (e.g. the chrome entries) to share the same PID.

Maxim Egorushkin · Answer 1 · 2019-09-04T15:29:46.550

See man gettid:

gettid() returns the caller's thread ID (TID). In a single-threaded process, the thread ID is equal to the process ID (PID, as returned by getpid(2)). In a multithreaded process, all threads have the same PID, but each one has a unique TID. For further details, see the discussion of CLONE_THREAD in clone(2).

What htop shows is the TID, not the PID. You can toggle the display of threads on and off with the H key.

You can also enable the PPID column in htop; for threads, it shows the PID / TID of the main thread.
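As a rough illustration of the man page excerpt above, here is a minimal Python sketch (Python is my choice, not something from the question). It assumes Linux and Python 3.8+, where `threading.get_native_id()` reports the kernel thread ID; the helper name `collect_ids` is made up for this example.

```python
import os
import threading


def collect_ids(n_threads=3):
    """Return the (pid, tid) pair observed by each of n_threads workers."""
    results = []
    lock = threading.Lock()
    # The barrier only opens once every worker has started, so all
    # workers exist at the same time and no TID can be recycled.
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()
        with lock:
            results.append((os.getpid(), threading.get_native_id()))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print("main thread: pid", os.getpid(), "tid", threading.get_native_id())
    for pid, tid in collect_ids():
        print("worker:      pid", pid, "tid", tid)
```

Every worker should report the same PID as the main thread but its own TID, and those TIDs are the numbers that show up as separate rows in htop's thread view.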

score 1 · Answer 2 · answered Sep 04 '19 at 15:23

Google's documentation for Chromium (which probably operates similarly to Chrome when it comes to these concepts) states that they use a "multi-process architecture". Your quote from pthread's man page states that all of the threads in a single process are placed under the same PID, which would not apply to Chrome's architecture.

Kevin K.


You are not wrong. However, those green `google/chrome` records in the `htop` screenshot are not processes but threads. – Maxim Egorushkin Sep 04 '19 at 15:27
@MaximEgorushkin In the currently-posted image, a cursory examination seems to show at least 22 different Chrome processes, based on the variations in the `VIRT`, `RES`, and `SHR` memory values. There are only two groups that could be multiple threads in the same process: the first two listed could be threads in the same process, and the three with a short command line consisting only of `/opt/google/chrome/chrome` might also be a single process. Every other instance of Chrome appears to be running in a different-sized address space, making them separate processes. – Andrew Henle Sep 04 '19 at 15:50

Petr Skocik · Answer 3 · 2019-09-04T16:06:34.093

The Linux kernel does have the concept of POSIX pids (explorable in /proc/*) but it calls them thread group ids in the kernel source and it refers to its internal thread ids as pids (explorable in /proc/*/task/*).

I believe this is rooted in Linux's original treatment of threads as "just processes" that happen to share address spaces and a bunch of other stuff with each other.

Your user-space tool is likely propagating this somewhat confusing Linux kernel terminology.
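The naming described above can be inspected from user space. The following is a small sketch, assuming Linux with /proc mounted and Python 3; the function names `read_status_ids`, `snapshot_tasks`, and `demo` are hypothetical helpers for this example. It reads the `Tgid` and `Pid` fields from each `/proc/self/task/<tid>/status` file.

```python
import os
import threading


def read_status_ids(tid):
    """Parse the Tgid and Pid fields from /proc/self/task/<tid>/status."""
    fields = {}
    with open(f"/proc/self/task/{tid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("Tgid", "Pid"):
                fields[key] = int(value.strip())
    return fields


def snapshot_tasks():
    """Map each thread ID of this process to its Tgid/Pid status fields."""
    tasks = {}
    for name in os.listdir("/proc/self/task"):
        try:
            tasks[int(name)] = read_status_ids(name)
        except OSError:
            # The thread exited between listdir() and open(); skip it.
            continue
    return tasks


def demo(n_threads=2):
    """Take a snapshot while n_threads extra threads are kept alive."""
    stop = threading.Event()
    threads = [threading.Thread(target=stop.wait) for _ in range(n_threads)]
    for t in threads:
        t.start()
    try:
        return snapshot_tasks()
    finally:
        stop.set()
        for t in threads:
            t.join()


if __name__ == "__main__":
    for tid, info in sorted(demo().items()):
        print("task", tid, "Tgid", info["Tgid"], "Pid", info["Pid"])
```

In each entry, `Tgid` should equal the POSIX PID returned by `os.getpid()`, while the field the kernel labels `Pid` matches the per-thread directory name, i.e. what user space calls the TID.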

score 1 · Answer 4 · answered Sep 04 '19 at 16:06

Because kernel-level threads are no more than processes that share (nearly) the same address space.

The Linux kernel developers "solved" this by renaming: the processes became "threads", the "pid"s became "tid"s, and the old processes became "thread groups".

However, the sad truth is that when you create a thread on Linux (with clone()), it creates a process that merely uses (nearly) the same memory segments.

That is the 1:1 thread model. It means that all threads are kernel-level threads, so they are essentially processes in the same address space.

Some other alternatives would be:

1:M thread model. The kernel doesn't know about threads; it is the task of user-space libraries to implement "in-process multitasking" so that the program appears to be multi-threaded.
N:M thread model. This is the best one in my opinion, although some still favor 1:1. It has both user- and kernel-level threads, and a scheduling algorithm decides what to run and where.

Linux once had an N:M implementation (ngpt), but it was abandoned because of yet another drawback: Linux kernel calls are inherently synchronous (blocking), so some kernel cooperation would have been needed even for user-space synchronization. Nobody wanted to do that.

So that is how it is.

P.S. To build a well-performing app, you should avoid creating a lot of threads at once. Use a thread pool with well-thought-out locking protocols. If you don't minimize thread creations and joins, your app will be slow and inefficient, regardless of whether the model is N:M or not.
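To make the thread pool advice concrete, here is a minimal sketch using Python's standard `concurrent.futures.ThreadPoolExecutor` (my choice of tool, not something this answer prescribes). It assumes Python 3.8+ for `threading.get_native_id()`; `run_in_pool` is a hypothetical helper name. It shows that many tasks end up running on a small, fixed set of kernel threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_in_pool(n_tasks=20, n_workers=4):
    """Run n_tasks trivial jobs on a pool; return the set of TIDs used."""

    def job(_):
        # Report which kernel thread executed this task.
        return threading.get_native_id()

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return set(pool.map(job, range(n_tasks)))


if __name__ == "__main__":
    tids = run_in_pool()
    print("20 tasks ran on", len(tids), "kernel thread(s):", sorted(tids))
```

However many tasks are submitted, the number of distinct TIDs never exceeds `max_workers`, because the pool reuses its worker threads instead of creating and joining one thread per task.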
