120

I have a query related to the implementation of threads in Linux.

Linux does not have explicit thread support. In userspace, we might use a thread library (like NPTL) to create threads. NPTL uses a 1:1 mapping, so each user-level thread corresponds to one kernel-level task.

The threads are created with the clone() system call, which the kernel uses to implement them.

Suppose I have created 4 threads. Then it would mean that:

  • There will be 4 task_struct.
  • Inside each task_struct, there will be provision for sharing resources according to the flags passed to clone() (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND).

Now I have the following questions:

  1. Will the 4 threads have the same PID? If so, could someone elaborate on how the PID is shared?
  2. How are the different threads identified; is there some TID (thread ID) concept?
Mateusz Piotrowski
SPSN

3 Answers

337

The four threads will have the same PID, but only when viewed from above. What you (as a user) call a PID is not what the kernel (looking from below) calls a PID.

In the kernel, each thread has its own ID, called a PID (although it would arguably make more sense to call it a TID, or thread ID). Each thread also has a TGID (thread group ID), which is the PID of the first thread that was created when the process was created.

When a new process is created, it appears as a thread where both the PID and TGID are the same (currently unused) number.

When a thread starts another thread, that new thread gets its own PID (so the scheduler can schedule it independently) but it inherits the TGID from the original thread.

That way, the kernel can happily schedule threads independent of what process they belong to, while processes (thread group IDs) are reported to you.
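To see both views from userspace, here is a minimal sketch in Python (assuming Python 3.8+ for `threading.get_native_id()`, which asks the kernel for the calling thread's ID). The helper name `collect_ids` is made up for illustration. Each worker records what `os.getpid()` reports (the TGID) next to its kernel-level thread ID.

```python
import os
import threading


def collect_ids(n_threads=4):
    """Start n_threads threads; return one (getpid, kernel thread ID) pair per thread."""
    results = []
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def worker():
        # Wait until every worker exists, so all kernel IDs are allocated
        # at the same time and none of them can be a reused number.
        barrier.wait()
        ids = (os.getpid(), threading.get_native_id())
        with lock:
            results.append(ids)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print("main thread: getpid=%d tid=%d" % (os.getpid(), threading.get_native_id()))
    for pid, tid in collect_ids():
        print("worker:      getpid=%d tid=%d" % (pid, tid))
```

Every worker should report the same `getpid()` value but a different thread ID. On Linux I would also expect the main thread's ID to equal `getpid()`, since it is the first thread of the group.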

The following hierarchy of threads may help(a):

                         USER VIEW
                         vvvv vvvv
              |          
<-- PID 43 -->|<----------------- PID 42 ----------------->
              |                           |
              |      +---------+          |
              |      | process |          |
              |     _| pid=42  |_         |
         __(fork) _/ | tgid=42 | \_ (new thread) _
        /     |      +---------+          |       \
+---------+   |                           |    +---------+
| process |   |                           |    | process |
| pid=43  |   |                           |    | pid=44  |
| tgid=43 |   |                           |    | tgid=42 |
+---------+   |                           |    +---------+
              |                           |
<-- PID 43 -->|<--------- PID 42 -------->|<--- PID 44 --->
              |                           |
                        ^^^^^^ ^^^^
                        KERNEL VIEW

You can see that starting a new process (on the left) gives you a new PID and a new TGID (both set to the same value). Starting a new thread (on the right) gives you a new PID while maintaining the same TGID as the thread that started it.
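The two branches of the diagram can be tried out with a small Python sketch (assuming a Unix-like system with `os.fork()` and Python 3.8+ for `threading.get_native_id()`). The function names are invented for this example. One helper starts a thread, the other forks a child that reports its IDs back through a pipe.

```python
import os
import threading


def ids_in_new_thread():
    """Return (getpid, kernel thread ID) as seen from a freshly started thread."""
    out = []

    def worker():
        out.append((os.getpid(), threading.get_native_id()))

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return out[0]


def ids_in_forked_child():
    """Return (getpid, kernel thread ID) as seen from a forked child process."""
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        # Child: send our IDs to the parent, then exit immediately
        # without running the parent's cleanup handlers.
        os.close(read_fd)
        message = "%d %d" % (os.getpid(), threading.get_native_id())
        os.write(write_fd, message.encode())
        os._exit(0)
    # Parent: close our copy of the write end so read() sees end-of-file.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        pid, tid = map(int, pipe.read().split())
    os.waitpid(child, 0)
    return pid, tid


if __name__ == "__main__":
    print("parent:       getpid=%d tid=%d" % (os.getpid(), threading.get_native_id()))
    print("new thread:   getpid=%d tid=%d" % ids_in_new_thread())
    print("forked child: getpid=%d tid=%d" % ids_in_forked_child())
```

The new thread should show the parent's `getpid()` with a different thread ID, while the forked child should show a different `getpid()`. On Linux the child's two numbers should match each other, because it is the first thread of a new thread group.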


(a) Tremble in awe at my impressive graphical skills :-)

paxdiablo
  • 31
    FYI, `getpid()` returns tgid: `asmlinkage long sys_getpid(void) { return current->tgid;}`, as shown in [www.makelinux.com/](http://www.makelinux.com/books/lkd2/ch05lev1sec2) – Duke Jan 15 '14 at 01:13
  • 8
    @Duke - wow, so that's why I couldn't find a `gettgid(2)` function. And the `getpid()` won't return the TID (thread's "PID"), and there's where `gettid(2)` comes in. This way I can tell, if we're in the main thread or not. – Tomasz Gandor Nov 21 '14 at 12:40
  • 2
    This leads to another interesting point: So if threads and processes are handled equally within the kernel (apart from the tgid), a multi-threaded process will in conclusion get more CPU time than a single-threaded one, provided that both have the same priority and none of the threads is halted for any reason (such as waiting for a mutex). – Aconcagua Sep 21 '15 at 10:47
  • 1
    @Aconcagua, CFS (the completely fair scheduler in Linux) generally works that way but also allows the use of group scheduler extensions to make the fairness operate across certain groups of tasks rather than individual tasks. I've never really looked into it other than a cursory glance. – paxdiablo Sep 22 '15 at 04:54
  • `getpgrp` to get group id – Pengcheng Dec 15 '15 at 01:41
  • So `htop` is actually showing the thread id when it shows threads and kernel threads with `H` and `K` toggles. – CMCDragonkai Apr 19 '18 at 06:50
  • @paxdiablo Did you use a program or something to create the diagram in the answer? If yes, please mention its name :) – Ebram Shehata Jun 24 '18 at 11:07
  • 1
    It looks like `gettid()` just returns the `pid` field. https://elixir.bootlin.com/linux/latest/source/kernel/sys.c#L897 – gipouf Apr 16 '19 at 22:07
  • Is there a book that introduces all the concepts and implementations of a Linux operating system? – Ziqi Fan Nov 09 '21 at 14:29
4

Threads are identified using a PID and a TGID (thread group ID). Each thread also knows which thread is its parent, so essentially a process shares its PID with any threads it starts. Thread IDs are usually managed by the thread library itself (such as pthreads). If 4 threads are started, they should all have the same PID. The kernel itself handles thread scheduling, but the library is the one managing the threads (whether they can run or not, depending on your use of thread join and wait methods).
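As a rough illustration of the difference between a library-level thread ID and the ID the kernel assigns, here is a small Python sketch (assuming Python 3.8+). The function name is made up. `threading.get_ident()` returns an opaque identifier from the threading layer (on CPython for POSIX it is, as far as I know, derived from `pthread_self()`), while `threading.get_native_id()` returns the ID assigned by the kernel.

```python
import threading


def library_and_kernel_ids():
    """Return (library-level ident, kernel-assigned thread ID) for a new thread."""
    out = []

    def worker():
        out.append((threading.get_ident(), threading.get_native_id()))

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return out[0]


if __name__ == "__main__":
    lib_id, kernel_id = library_and_kernel_ids()
    print("library ident: %d" % lib_id)
    print("kernel tid:    %d" % kernel_id)
```

On Linux the first number tends to be a large pointer-like value and the second a small number in the same range as PIDs, but that is an observation rather than a guarantee.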

Note: This is from my recollection of kernel 2.6.36. My work in current kernel versions is in the I/O layer so I don't know if that has changed since then.

Jesus Ramos
  • 1
    Here's an explanation for Linux 2.4 that you may find useful https://unix.stackexchange.com/a/364663/387462 – sindhu_sp Oct 23 '20 at 09:07
-8

Linux provides the fork() system call with the traditional functionality of duplicating a process. Linux also provides the ability to create threads using the clone() system call. However, Linux does not distinguish between processes and threads.

Robert