1

I need to help solve the problem with shared function in Parallel.ForEach. I got an error lower, how can I change the function to be saved for work with threads ?

public IEnumerable<Datamodel> LoadLibrary(IEnumerable<Items> items)
        {
            var allLibReferences = new List<LibraryReferenceModel>();

            var baseData = LoadBaseLibData();

            Parallel.ForEach(baseData, data =>
            {
                var item = items.ToList().FindAll(c => c.Name == data.Name);

                CreateLibraryReference(allLibReferences, item, data.Name); // Problem to call function in Parallel.ForEach


            });
            return allLibReferences;
        }

private static void CreateLibraryReference(ICollection<LibraryReferenceModel> allLibReferences,
            IReadOnlyCollection<Item> item, string libraryName)
        {
            allLibReferences.Add(item.Count == 0
                ? new LibraryReferenceModel
                {
                    LibName = libraryName,
                    HasReference = false,
                    References = item.Count
                }
                : new LibraryReferenceModel
                {
                    LibName = libraryName,
                    HasReference = true,
                    References = item.Count
                });
        }

I got This exception (the index is out of array bounds):

enter image description here

Thank you

StuartLC
  • 104,537
  • 17
  • 209
  • 285
Jan Sršeň
  • 1,045
  • 3
  • 23
  • 46

1 Answers1

3

As you've found, since multiple threads are attempting to add new items to the shared allLibReferences collection, you'll find erratic thread safety issues like the error you've described.

This is why it's really important to make your code thread safe before you consider parallelising it. One of the best techniques is to ensure that you rely on immutable code constructs, i.e. never try and change (mutate) the value of a shared variable during parallel code.

So I would change the way the code works, so that instead of sharing a collection, what we do is project the items needed immutably, which can be safely parallelised (I've used .AsParallel, as its simpler), and then you can collate the results and return them.

Furthermore, since the whole point of parallelism is to make code run as quickly as possible, you'll also want to remove inefficiencies such as materialising the same items in a list during each iteration (items.ToList()), and you'll also want to avoid O(N) iterations during a loop if possible - I've replaced .FindAll(c => c.Name == data.Name) with a pre-calculated dictionary.

Putting that altogether, you'll wind up with something like this:

public IEnumerable<LibraryReferenceModel> LoadLibrary(IEnumerable<Item> items)
{
    var keyedItems = items.GroupBy(i => i.Name)
        .ToDictionary(grp => grp.Key, grp => grp.ToList());

    var baseData = LoadBaseLibData();

    var allLibReferences = baseData
        .AsParallel()
        .SelectMany(data => 
        {
            if (keyedItems.TryGetValue(data.Name, out var matchedItems))
            {
                return matchedItems
                    .Select(i => ProjectLibraryReference(i, data.Name));
            }
            // No matches found
            return new LibraryReferenceModel
            {
              LibName = data.Name,
              HasReference = false,
              References = 0
            };
        })
        .ToList();
    return allLibReferences;
}


private static LibraryReferenceModel ProjectLibraryReference(IReadOnlyCollection<Item> item, 
    string libraryName)
{
    return new LibraryReferenceModel
        {
            LibName = libraryName,
            HasReference = item.Count > 0,
            References = item.Count
        };
}

I've assumed that multiple items can have the same name, hence we're grouping before creating the Dictionary, and then we're flattening the projected results with .SelectMany at the end.

StuartLC
  • 104,537
  • 17
  • 209
  • 285
  • 1
    As an aside, the OP's question is maked `asp.net`. It is generally a bad idea to spawn additional threads in ASP.Net, as per [Steven Cleary](https://stackoverflow.com/a/38094220) – StuartLC Aug 17 '20 at 12:44