
I'm trying to understand how thread context switches affect the execution of iterations of the Parallel class when using ForEach or For. I tried to push CPU usage up to 100% by running several processes, but not a single Parallel iteration changed its Thread.CurrentThread.ManagedThreadId value.

  • To push CPU usage up to 100%, I started several high-priority processes, including this example.

  • Code where the thread context switch would need to be handled:

    // 'state' is the ParallelLoopState supplied by Parallel.For;
    // checkIds and continueWriteLine are defined elsewhere (see below).
    Parallel.For(0, 3, (index, state) =>
    {
        var firstId = Thread.CurrentThread.ManagedThreadId;
        while (true)
        {
            var rnd = new Random(index);
            checkIds(firstId, Thread.CurrentThread.ManagedThreadId);
            var digits = new List<int>();
            for (int j = 0; j < 10000; j++)
            {
                digits.Add(rnd.Next());
                if (continueWriteLine)
                    Console.WriteLine($"ID: = {Thread.CurrentThread.ManagedThreadId}");
            }

            if (continueWriteLine)
                digits.ForEach(Console.WriteLine);
        }
    });
    
  • Code that tries to handle a thread switch:

    // Inside checkIds(int firstId, int currentId):
    if (firstId != currentId)
    {
        continueWriteLine = false;
        Thread.Sleep(1000);
        Console.WriteLine($"{firstId} - {currentId}");
    }
    

So, I have several questions:

  1. Can execution switch to another thread during an iteration of the Parallel class for some reason, e.g. Thread.Sleep, a lock statement, a Mutex, etc.?

  2. If the threads really can be switched, how does such a switch affect the ManagedThreadId property?

  3. Is it safe to use ManagedThreadId as a unique key in a ConcurrentDictionary from which information for the current operation can be retrieved, e.g. information about reading a file: the current line, the object being read, the objects already read, and other data needed during the operation?

P.S. The motivation for the approach in the third question is that I'd rather not pass most of this data between the methods that read and process each new line of the file just to maintain the file-processing context. An alternative might be to pass a single object between the parser's methods, something like a FileProcessingInfo that contains all the context data mentioned in the third question, but I don't know for sure which solution would be better.

KKomrade
  • I'm not sure how you observed that "not a single Parallel's iteration has changed it's Thread.CurrentThread.ManagedThreadId value" Do you have a short code snippet that demonstrates the behaviour you are observing? Your question #1 is not grammatically correct. I'm not sure you understand what a thread context switch even *is*. Threads don't switch IDs. You get that much, right? – Wyck Jun 30 '23 at 03:05
  • *"...but not a single Parallel's iteration has changed it's `Thread.CurrentThread.ManagedThreadId` value."* -- This is highly unexpected. Could you include in the question a minimal demo that reproduces this behavior? – Theodor Zoulias Jun 30 '23 at 03:06
  • @Wyck thanks for noticing the incorrect question, I have reformulated it, hope it will be more understandable now. – KKomrade Jun 30 '23 at 04:09
  • @TheodorZoulias I've added a code example before the first question, please let me know if there is something wrong with this example – KKomrade Jun 30 '23 at 06:27
  • 2
    Ah, I think that *now* I understand what you mean. You don't mean that all 3 iterations run on the same thread. You mean that each individual iteration runs on the same thread from start to finish. That's the behavior that you want to be clarified, correct? – Theodor Zoulias Jun 30 '23 at 06:48
  • 2
    @TheodorZoulias yeah, that's correct – KKomrade Jun 30 '23 at 06:50

2 Answers

  1. Can execution switch to another thread during an iteration of the Parallel class for some reason, e.g. Thread.Sleep, a lock statement, a Mutex, etc.?

No. Each individual iteration of a Parallel.For/Parallel.ForEach loop runs invariably on the same thread from start to finish. This thread is completely dedicated to this iteration, and won't do any unrelated work elsewhere before this iteration completes. After this iteration completes, the thread might dedicate itself to some other iteration, or return to the ThreadPool.
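
For illustration, here is a minimal sketch (not part of the original question; the iteration count and sleep duration are arbitrary) that logs the ManagedThreadId at the start and end of each iteration. The two values always match, even though the OS may switch CPU cores during the Thread.Sleep:

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    Parallel.For(0, 8, index =>
    {
        int startId = Thread.CurrentThread.ManagedThreadId;
        Thread.Sleep(100); // blocking call; the OS may reschedule this thread onto another core
        int endId = Thread.CurrentThread.ManagedThreadId;
        Console.WriteLine($"iteration {index}: started on {startId}, ended on {endId}"); // always equal
    });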

Each iteration is not guaranteed to run non-stop on one physical CPU-core though. The operating system might perform one or many thread switches, by suspending the execution of this thread and assigning the physical CPU-core to some other thread. This phenomenon is transparent to your program. The thread itself doesn't experience any observable symptom whenever a thread-switch occurs at the operating system level. I don't know if the .NET platform itself receives any notification from the operating system whenever a thread-switch occurs. If I had to guess, I would say probably not.

It should be noted that the newer asynchronous API Parallel.ForEachAsync (.NET 6) invokes an asynchronous body delegate, and asynchronous delegates that contain await expressions routinely switch threads after each await. In this case it's not operating-system thread switching. It's the kind of thread switching that you are interested in, with Thread.CurrentThread.ManagedThreadId potentially changing after each await.
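
Here is a minimal sketch of that behavior (requires .NET 6 or later; the item count and delay are arbitrary):

    using System;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    await Parallel.ForEachAsync(Enumerable.Range(0, 4), async (item, ct) =>
    {
        int before = Thread.CurrentThread.ManagedThreadId;
        await Task.Delay(100, ct); // the continuation may resume on a different ThreadPool thread
        int after = Thread.CurrentThread.ManagedThreadId;
        Console.WriteLine($"item {item}: before await {before}, after await {after}"); // may differ
    });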

  3. Is it safe to use ManagedThreadId as a unique key in a ConcurrentDictionary from which information for the current operation can be retrieved, e.g. information about reading a file: the current line, the object being read, the objects already read, and other data needed during the operation?

No, because this ID is not guaranteed to be unique: after a thread terminates, its ManagedThreadId can be reused. And you have no control over the life-cycle of the ThreadPool threads, which are the threads that the Parallel APIs use by default. If you want to uniquely identify a Thread, use the Thread object itself as the TKey of the dictionary (ConcurrentDictionary<Thread, FileProcessingInfo>). But you might find a ThreadLocal<FileProcessingInfo> more convenient instead.
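
As a rough sketch of the ThreadLocal approach (FileProcessingInfo here is a stand-in for the hypothetical context type from the question):

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    // Each thread lazily gets its own instance, so no dictionary key is needed.
    var context = new ThreadLocal<FileProcessingInfo>(() => new FileProcessingInfo());

    Parallel.For(0, 100, i =>
    {
        context.Value.CurrentLine++; // safe: this state is per-thread
    });

    Console.WriteLine("done");

    // Hypothetical per-operation context type from the question.
    class FileProcessingInfo
    {
        public int CurrentLine;
    }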

Theodor Zoulias

A parallel for loop makes use of a partitioner. If one is not provided, it uses the built-in default general-purpose one. The job of the partitioner is to divide up the iteration range into chunks.

The smallest unit of granularity with which a parallel for loop can be partitioned is one iteration. In other words, a worker thread will call the parallel for loop's body function with one as-yet-unassigned loop index in the range. When that worker thread completes the loop body for that index, it consults the partitioner again to see which index it should execute next.

The default partitioner works hand-in-hand with the thread pool, the scheduler, and a complex set of heuristics to decide which range to assign to which thread. In this way, ranges are assigned to threads dynamically, in batches, in response to the current system state (as opposed to a static partitioning where the range is simply divided into a fixed number of chunks of approximately equal size), in an attempt to achieve good performance in typical general-purpose scenarios.
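
To make this concrete, here is a small sketch (the range and chunk size are arbitrary) contrasting the default dynamic partitioning with an explicit static range partitioner created via Partitioner.Create:

    using System;
    using System.Collections.Concurrent;
    using System.Threading;
    using System.Threading.Tasks;

    // Default dynamic partitioning: the body is invoked once per index.
    Parallel.For(0, 16, i =>
        Console.WriteLine($"index {i} on thread {Thread.CurrentThread.ManagedThreadId}"));

    // Static range partitioning: each worker receives a contiguous chunk up front.
    var ranges = Partitioner.Create(0, 16, rangeSize: 4);
    Parallel.ForEach(ranges, range =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
            Console.WriteLine($"index {i} in chunk [{range.Item1}, {range.Item2}) on thread {Thread.CurrentThread.ManagedThreadId}");
    });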

Here are some basic principles of how any implementation will approach partitioning and thread pool management:

  • The goal is likely to achieve high occupancy, meaning each virtual processor in the system should be executing a runnable thread. This means we will likely want at least as many threads as there are virtual processors. (This may not hold if the loop body executes very quickly and the range is very small, where the overhead of assigning tasks to a thread exceeds the benefit of just executing on the current thread.)
  • As threads in the thread pool enter a blocked state (i.e. blocked waiting for I/O or a synchronization event, or sleeping), new threads may be created by the thread pool if occupancy is not at 100%. Thread stacks consume valuable memory, though, so the heuristics may choose to spend some time waiting for blocked threads to become unblocked rather than always creating new threads; otherwise you would risk creating a thread for every iteration of the loop. The pool can get help from the operating system to learn the reason a thread is blocked: whether it will be unblocked by something predictably inevitable like a sleep, or whether it's blocked on something unpredictable like a synchronization event from another thread. Sometimes creating a new thread is the only way to avoid a deadlock (consider a loop body that waits for the (i + 1)-th iteration to complete). A sketch after this list illustrates this thread-injection behavior.
  • If all the thread pool threads are runnable, then no new threads are created (high occupancy has been achieved).
  • It conservatively measures how long each loop-body iteration takes to execute, and partitions some of the remainder of the range accordingly, so that a thread can execute a batch of iterations without incurring the overhead of checking back with the partitioner to see which range it is supposed to execute next.
  • The default partitioner is by no means optimal, and fundamentally cannot be optimal, because it must be designed without knowing the pattern of the loop body's behaviour. So it is a general-purpose partitioner that does pretty well for most typical use cases.
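
As a rough illustration of the thread-injection point above, the following sketch (the iteration count and sleep duration are arbitrary) blocks each loop body and counts the distinct worker threads used. On a typical machine the count tends to grow beyond the processor count as the pool injects threads, though the exact numbers depend on the runtime's heuristics:

    using System;
    using System.Collections.Concurrent;
    using System.Threading;
    using System.Threading.Tasks;

    var observedThreads = new ConcurrentDictionary<int, bool>();

    Parallel.For(0, 64, i =>
    {
        observedThreads.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
        Thread.Sleep(250); // blocking: occupancy drops, so the pool may inject more threads
    });

    Console.WriteLine($"distinct worker threads used: {observedThreads.Count}");
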
Wyck