0

I have a POCO entity that has a hashset as navigation property. In some cases I need to add to this collection new elements, so I was thinking in something like that:

MyType myPocoEntity = new MyType();
Parallel.ForEach(myList,
    (iterator, state) =>
{
    myPocoEntity.MyCollection.Add(new MyType(){ Id = iterator.Id });
}

The idea is for each element in my list, create a new object that I have to add to the hashset of my POCO entity.

Just is an add operation, but I don't know if although it is only an add operation, if I could can problems, or if I should use first a concurrent collection and later pass the elements to the navigation property, but in this case I guess it would be better use a for loop and not parallel.

So I would like to know if it is a good idea to use a parallel foreach or not.

Thanks.

Álvaro García
  • 18,114
  • 30
  • 102
  • 193
  • I do not think this operation warrants the use of parallelization. There is not much CPU bound work to do. That said, `MyCollection` should be a Concurrent collection if you want to use parallelization. – Peter Bons Oct 29 '17 at 09:39
  • 1
    `HashSet` is not thread-safe. So you can have problems with `Add` (what if one thead starts adding, then the second thread starts adding the equal item and finally the 1st thread continue adding?) You can try using Parallel Linq instead of `Parallel.ForEach` and materialize the result ino, say, Dictionary – Dmitry Bychenko Oct 29 '17 at 09:39
  • 1
    In short, no. Generic collections use array(s) internally. When the array(s) need to be resized to add more elements, a new array is allocated and the items from the old array are copied. So, parallel additions can in theory write to different arrays! https://stackoverflow.com/questions/33096775/locking-hashset-for-concurrency – Slai Oct 29 '17 at 16:00
  • No, you can't use `HashSet` like that. See marked duplicate. As far as what's "a good idea", that depends on a lot of other context you haven't shared. There's nothing in the code you did share that suggests `Parallel` would be useful at all. – Peter Duniho Oct 29 '17 at 18:47

1 Answers1

1

It's my assumption that switching thread contexts and synchronization is more expensive than HashSet<T>.Add() call even for large sets added from for loop. However, the fastest way to add new members to collections is always to use HashSet<T>.AddRange operation and due to that I would optimize code even further by creating new collection if it would be significantly smaller than target collection and than I would be adding it with single call to HashSet<T>.AddRange.

Jacek Blaszczynski
  • 3,183
  • 14
  • 25