1

I am tasked with writing a system to process result files created by a different process(which I have no control over) and and trying to modify my code to make use of Parallel.Foreach. The code works fine when just calling a foreach but I have some concerns about thread safety when using the parallel version. The base question I need answered here is "Is the way I am doing this going to guarantee thread safety?" or is this going to cause everything to go sideways on me.

I have tried to make sure all calls are to instances and have removed every static anything except the initial static void Main. It is my current understanding that this will do alot towards assuring thread safety.

I have basically the following, edited for brevity

static void Main(string[] args)
{
    MyProcess process = new MyProcess();
    process.DoThings();
}

And then in the actual process to do stuff I have

public class MyProcess
{
    public void DoThings()
    {
        //Get some list of things
        List<Thing> things = getThings();
        Parallel.Foreach(things, item => {

            //based on some criteria, take actions from MyActionClass
            MyActionClass myAct = new MyActionClass(item);
            string tempstring = myAct.DoOneThing();

            if(somecondition)
            {
                MyAct.DoOtherThing();
            }
            ...other similar calls to myAct below here
        };              
    }
}

And over in the MyActionClass I have something like the following:

public class MyActionClass
{
    private Thing _thing;

    public MyActionClass(Thing item)
    {
        _thing = item;
    } 

   public string DoOneThing()
   {
       return _thing.GetSubThings().FirstOrDefault();
   }

   public void DoOtherThing()
   {
       _thing.property1 = "Somenewvalue";
   }

}

If I can explain this any better I'll try, but I think that's the basics of my needs

EDIT: Something else I just noticed. If I change the value of a property of the item I'm working with while inside the Parallel.Foreach (in this case, a string value that gets written to a database inside the loop), will that have any affect on the rest of the loop iterations or just the one I'm on? Would it be better to create a new instance of Thing inside the loop to store the item i'm working with in this case?

Rocky
  • 182
  • 1
  • 1
  • 9

1 Answers1

2

There is no shared mutable state between actions in the Parallel.ForEach that I can see, so it should be thread-safe, because at most one thread can touch one object at a time.


But as it has been mentioned there is nothing shared that can be seen. It doesn't mean that in the actual code you use everything is as good as it seems here.

Or that nothing will be changed by you or your coworker that will make some state both shared and mutable (in the Thing, for example), and now you start getting difficult to reproduce crashes at best or just plain wrong behaviour at worst that can be left undetected for a long time.

So, perhaps you should try to go fully immutable near threading code?

Perhaps.

Immutability is good, but it is not a silver bullet, and it is not always easy to use and implement, or that every task can be reasonably expressed through immutable objects. And even that accidental "make shared and mutable" change may happen to it as well, though much less likely.

It should at least be considered as a possible option/alternative.


About the EDIT

If I change the value of a property of the item I'm working with while inside the Parallel.Foreach (in this case, a string value that gets written to a database inside the loop), will that have any affect on the rest of the loop iterations or just the one I'm on?

  1. If you change a property and that object is not used anywhere else, and it doesn't rely on some global mutable state (for example, sort of a public static Int32 ChangesCount that increments with each state change), then you should be safe.
  2. a string value that gets written to a database inside the loop - depending on the used data access technology and how you use it, you may be in trouble, because most of them are not designed for multithreaded environment, like EF DbContext, for example. And obviously do not forget that dealing with concurrent access in database is not always easy, though that is a bit away from our original theme.
  3. Would it be better to create a new instance of Thing inside the loop to store the item i'm working with in this case - if there is no risk of external concurrent changes, then it is just an unnecessary work. And if there is a chance of another threads(not Parallel.For) making changes to those objects that are being persisted, then you already have bigger problems than Parallel.For.

Objects should always have observable consistent state (unlike when half of properties set by one thread, and half by another, while you try to persist that who-knows-what), and if they are used by many threads, then they should be already thread-safe - there should be no way to put them into inconsistent state.

And if they want to be persisted by external code, such objects should probably provide:

  • Either SyncRoot property to synchronize property reading code.
  • Or some current state snapshot DTO that is created internally by some thread-safe method like ThingSnapshot Thing.GetCurrentData() { lock() {} }.
  • Or something more exotic.
Community
  • 1
  • 1
Eugene Podskal
  • 10,270
  • 5
  • 31
  • 53
  • I tried to go through the entirety of the code and make sure there are no calls to static methods. I went further and replaced all static method with instances and have been careful not to call anything created outside the `Parallel.Foreach` from inside the 'loop' (with the exception of the actual **item** I'm iterating against). I'm just very new to threading in general and I'm having a bit of a time wrapping my head around that whole thing. As above, In live code I enter the `Parallel.Foreach` almost immediately on entering the method and make all calls from inside it to instances. – Rocky Jan 17 '17 at 18:29
  • Edit: added to base question – Rocky Jan 17 '17 at 18:48
  • I think immutability might work here, as I don't usually need to change any item in the list except for that one string, and as stated I think I could just create a new instance of the class and pass **item** into it for that. After all, I'm just reading back lines in a file and taking some action based on them. I'll look into that. – Rocky Jan 17 '17 at 18:57
  • Concerning the EDIT. The objects don't rely on anything external, so I think I'm safe there and can avoid creating a separate instance of `Thing` for this. On the data write, I'm actually using a web service to facilitate this and in all instances where a write occurs, I am creating a new instance inside a using as `using(MyWebService mywSrv = new MyWebService()){ //...mywSrv.doWrite(data);}` so I think I'm safe there since I only open it for as long as it's needed, and it is created on the thread. – Rocky Jan 17 '17 at 20:06