0

I just saw following code snippet:

public List<string> ProcessText(List<student> students)
{
    List<student> result = new();

    Parallel.ForEach(students, (st) => 
    {
        if (string.IsNullOrEmpty(st.Name))
        {
            throw new Exception("Name cannot be empty");
        }

        result.Add(st);
    });

    return result;
}

It got me thinking since the Parallel.ForEach will distribute the given method arguments i.e.students, between different threads, it doesn't need any synchronization when it wanted add object into the list, for example using lock or concurrent collection instead of List<T>?

Now consider we change the code like this:

public List<string> ProcessText(List<student> students)
{
    List<student> result = new();

    Parallel.ForEach(students, (st) => 
    {
        if (result.Any(x=>x.Name == st.Name))
        {
            // do some thing and ignore the rest of the code and go for the next student
        }

        if (string.IsNullOrEmpty(st.Name))
        {
            throw new Exception("Name cannot be empty");
        }

        result.Add(st);
    });

    return result;
}

Now it refers to the List<student> and read some values inside of it, so it needs synchronization? And why?

Thanks in advance

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
pmn
  • 2,176
  • 6
  • 29
  • 56
  • 2
    First one needs synchronisation too. Kind of duplicate question: https://stackoverflow.com/questions/5589315/list-add-thread-safety – Renat Aug 14 '23 at 14:29
  • In a multiuser operating system you cannot control the order that tasks are run. So there is no guarantee the processes are going to run 1,2,3,1,2,3,1,2,3. They could run 1,2,1,3,2,1,3,2.3. – jdweng Aug 14 '23 at 14:32
  • @jdweng thanks, but here order is not the main concern – pmn Aug 14 '23 at 15:26
  • It you are sending data through multiple streams it does matter. The data may not be split between each student record. It may be split in the middle of the students records. Then you have to combine in the same or that it was split. – jdweng Aug 14 '23 at 15:45
  • 1
    Both needs synchronisation because multiple threads might be modifying the list (through `Add`) at the same time, and `List` is not thread safe. That's all. You cannot concurrently modify something that wasn't designed for that. – Etienne de Martel Aug 15 '23 at 02:04

1 Answers1

2

It got me thinking since the Parallel.Foreach will distribute the given method arguman i.e.students, between different threads,it doesn't need any synchronization when it wanted add object into the list

This is incorrect. List is not thread safe. So synchronization will be needed when adding items. In general, multiple threads reading the same memory is safe. But as soon as you have concurrent reading and writing, or just concurrent writing, you need some form of synchronization. result.Add certainly does writing (to shared memory), so synchronization is required. The result of this code could be that students goes missing, or are added twice, or an exception, or something else, just don't do it.

Your second example is even more unsafe since you are doing both concurrent reading and writing.

Whenever you are using multiple threads you need to be really careful either:

  1. use objects local to the thread
  2. ensure memory is only being read
  3. Use thread safe objects
  4. Use synchronization/locks to protect thread unsafe objects

In the end I would not recommend using any type of concurrent code for such a simple example. I would also recommend reading and learning a bit more about thread safety before using any type of multi threaded code for anything important. Multi threading bugs are notorious for being difficult to find and debug.

JonasH
  • 28,608
  • 2
  • 10
  • 23
  • about the first code snippet, i think its ok, since `Parallel.Foreach` use `fork/join` approach , it deveided the given list into seperated small list and then it starts to process it, in addition in the first code the repetition is not a concern as u can see in the code it didn't check it at all, about the second one i think it needs `synchronization` – pmn Aug 14 '23 at 15:24
  • 1
    @pejman both code snippets violate Microsoft's [documentation](https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1#thread-safety) about the thread-safety of the `List`, so the behavior of both snippets is undefined. In case your program produces incorrect results, don't file a bug report to Microsoft, because they won't acknowledge it as a bug. Microsoft accepts no responsibility when their products are used incorrectly. – Theodor Zoulias Aug 14 '23 at 21:54
  • 2
    @pejman The problem is not using a list as input to the parallel.Foreach, that is perfectly fine. The problem is calling `result.Add`, that method is *not threadsafe*! Whenever you are using a shared object from multiple threads you must ensure it is safe to use from multiple threads. And unless specifically documented otherwise, objects should be assumed to be non-thread safe. – JonasH Aug 15 '23 at 06:30