We have an application that pulls data from S3 and writes it to a database. For concurrency, the application starts multiple threads, each of which pulls from a particular S3 key.

The original application code was written for .NET Framework 4.6.1; we migrated the codebase to .NET Core 3.0. It was an easy transition.

Below is the Parallel.ForEach snippet that pulls and processes the data:

Parallel.ForEach(PotentialFiles.Rows.OfType<DataRow>(), (row) =>
{
    if (ProcessFile(row[1].ToString(), Date, 15))
    {
        LastFileID = Math.Max(LastFileID, Convert.ToInt32(row[0]));
        FirstFileID = Math.Min(FirstFileID, Convert.ToInt32(row[0]));
    }
});
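One thing to note about this snippet, independent of the platform question: `LastFileID = Math.Max(LastFileID, ...)` is a read-compare-write on a shared variable, so two threads can interleave and lose an update. A minimal sketch of a lock-protected tracker follows. The class name `FileIdRange` and the initial values `int.MaxValue` / `int.MinValue` are assumptions for illustration; they are not part of the original application.

```csharp
using System;

// Hypothetical helper: tracks the smallest and largest file ID seen so far.
public sealed class FileIdRange
{
    private readonly object _sync = new object();

    // Assumed starting values so that the first recorded ID always wins.
    public int First { get; private set; } = int.MaxValue;
    public int Last { get; private set; } = int.MinValue;

    // The lock makes the read-compare-write sequence atomic across threads.
    public void Record(int id)
    {
        lock (_sync)
        {
            First = Math.Min(First, id);
            Last = Math.Max(Last, id);
        }
    }
}
```

Inside the loop body, `range.Record(Convert.ToInt32(row[0]))` would replace the two assignments, and `range.First` / `range.Last` would be read after the loop completes.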

We are getting an AggregateException:

System.AggregateException: One or more errors occurred. (One or more errors occurred. (A task was canceled.))
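The message shows one AggregateException nested inside another, which hides the underlying exceptions. A small sketch of how the nesting could be flattened to list the innermost exceptions is below. `AggregateExceptionDemo` and `DescribeInnerExceptions` are hypothetical names introduced here; `Flatten()` and `InnerExceptions` are the standard members of `System.AggregateException`.

```csharp
using System;
using System.Linq;

// Hypothetical helper for logging what is inside a nested AggregateException.
public static class AggregateExceptionDemo
{
    // Flatten() collapses nested AggregateExceptions into a single level,
    // so InnerExceptions then holds only the non-aggregate exceptions.
    public static string[] DescribeInnerExceptions(AggregateException ex)
    {
        return ex.Flatten().InnerExceptions
                 .Select(e => $"{e.GetType().Name}: {e.Message}")
                 .ToArray();
    }
}
```

Wrapping the Parallel.ForEach call in `try { ... } catch (AggregateException ex) { ... }` and logging each returned line (plus `StackTrace` if needed) might show which call was canceled.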

The ProcessFile method calls the S3 get-object operation from AWSSDK.S3, which is an asynchronous method. The same piece of code, however, runs flawlessly on Windows.

We were able to fix the issue using a Partitioner:

var tasks = System.Collections.Concurrent.Partitioner.Create(PotentialFiles.AsEnumerable())
                .GetPartitions(10)
                .Select(partition => Task.Run(() =>
                {
                    using (partition)
                    {
                        while (partition.MoveNext())
                        {
                            var row = partition.Current;
                            if (ProcessFile(row[1].ToString(), Date, 15))
                            {
                                LastFileID = Math.Max(LastFileID, Convert.ToInt32(row[0]));
                                FirstFileID = Math.Min(FirstFileID, Convert.ToInt32(row[0]));
                            }
                        }
                    }
                })).ToArray();

await Task.WhenAll(tasks);

With the code above, it works on the Linux instance.
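For comparison, a fully asynchronous variant that does not block threads on the S3 call could look roughly like the sketch below. It limits the number of in-flight operations with a `SemaphoreSlim`. This is only a sketch under assumptions: `ThrottledProcessing` and `ProcessAllAsync` are hypothetical names, and it presumes an async version of ProcessFile returning `Task<bool>` (called `ProcessFileAsync` below), which does not exist in the current code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs an async operation over items with bounded concurrency.
public static class ThrottledProcessing
{
    // Returns the items for which processAsync returned true.
    public static async Task<List<T>> ProcessAllAsync<T>(
        IEnumerable<T> items,
        Func<T, Task<bool>> processAsync,
        int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            // ToList() starts every task; the semaphore caps how many run at once.
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();
                try
                {
                    return (item: item, ok: await processAsync(item));
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            // WhenAll completes only after every task has finished,
            // so the semaphore is not disposed while still in use.
            var results = await Task.WhenAll(tasks);
            return results.Where(r => r.ok).Select(r => r.item).ToList();
        }
    }
}
```

With this shape, a call such as `await ThrottledProcessing.ProcessAllAsync(PotentialFiles.AsEnumerable(), row => ProcessFileAsync(row[1].ToString(), Date, 15), 10)` would return the successfully processed rows, and the first and last file IDs could then be computed with `Min` and `Max` over that list instead of updating shared variables from several threads.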

If .NET Core is cross-platform, why does the same piece of code behave differently on Windows and Linux? Is the task scheduler different on each platform? What am I missing here?

Karan Nadagoudar
  • _[Don't mix await with `Parallel.ForEach`](https://stackoverflow.com/a/11565317/585968)_. Use TPL DataFlow instead –  Nov 20 '19 at 11:16
  • Can you share the inner exception you get? I suspect it might have to do with the sockets, but that is a very long shot. – Nick Nov 20 '19 at 11:19
  • Did you do a Clean Build? Moving code may not recompile everything due to dependencies in the compiler. – jdweng Nov 20 '19 at 11:30
  • I did clean build – Karan Nadagoudar Nov 20 '19 at 11:37
  • You shouldn't be using `Parallel.For`xxx for I/O either. It's meant for compute/CPU-bound operations –  Nov 20 '19 at 11:38
  • I understand `Parallel.For`xxx should not be used for I/O but we were migrating the existing code so we did not modify that section. – Karan Nadagoudar Nov 20 '19 at 11:44

0 Answers