I'm not on .NET 4.

I get a huge list from a data source. When the number of elements in the list is higher than X, I'd like to partition the list and assign each partition to a thread. After processing the partitions, I'd like to merge them.

            var subsets = list.PartitionEager(50000);

            //var subsets = list.Partition<T>(50000);

            Thread[] threads = new Thread[subsets.Count()];
            int i = 0;
            foreach (var set in subsets)
            {
                threads[i] = new Thread(() => Convertor<T>(set));
                threads[i].Start();
                i++;
            }

            for (int j = 0; j < i; j++)
            {
                threads[j].Join();
            }

The Convertor method is a static method that takes a list and does some lookups.

    public static void Convertor<T>(List<T> list) where T : IInterface
    {
        foreach (var element in list)
        {
            // do some lookup and assign a value to element
            // then do more lookup and assign a value to element
        }
    }

When I run this code, most of the elements come back null, even though I know that most of them should be assigned a value.

I'm aware that a copy of the list is passed to the method, but any change to an element should be reflected in the calling method. However, this happens only within the final subset.

I even added some code to merge the lists into a single one.

                list.Clear();

                foreach (var set in subsets)
                {
                    list.AddRange(set);
                }

Code for partitioning:

    public static List<List<T>> PartitionEager<T>(this List<T> source, Int32 size)
    {
        List<List<T>> merged = new List<List<T>>();
        for (int i = 0; i < Math.Ceiling(source.Count / (Double)size); i++)
        {
            merged.Add(new List<T>(source.Skip(size * i).Take(size)));
        }

        return merged;
    }
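
As a side note on the partitioning itself, each `Skip(size * i)` call walks the source from the start again. A minimal sketch of an alternative, assuming the source is always a `List<T>`, copies each slice with `List<T>.GetRange` instead. The names `PartitionSketch` and `PartitionByRange` are hypothetical and not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class PartitionSketch
{
    // Hypothetical alternative to PartitionEager: copies each slice directly
    // instead of re-enumerating the source with Skip/Take.
    public static List<List<T>> PartitionByRange<T>(this List<T> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException("size");

        var parts = new List<List<T>>();
        for (int start = 0; start < source.Count; start += size)
        {
            // The last slice may be shorter than size.
            int count = Math.Min(size, source.Count - start);
            // GetRange makes a shallow copy: for reference types the new list
            // holds the same element objects as the source.
            parts.Add(source.GetRange(start, count));
        }
        return parts;
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3, 4, 5 };
        var parts = list.PartitionByRange(2);
        Console.WriteLine(parts.Count);    // 3
        Console.WriteLine(parts[2].Count); // 1
    }
}
```

Because the copy is shallow, values assigned to reference-type elements through a partition are visible through the original list as well.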

What am I doing wrong, and how can I resolve this? I'd like the elements to be assigned values after the lookups. Is this related to synchronization or to parameter passing?

DarthVader

2 Answers

If .NET 4 is an option, you can just use Parallel.For or Parallel.ForEach. These methods automatically handle partitioning for you, as well as providing many other advantages in terms of scalability across multiple degrees of concurrency on different systems.
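
A minimal sketch of what that could look like, assuming .NET 4 or later is available. The `Item` type and the `Lookup` method are hypothetical stand-ins for the real element type and lookup logic, which the question does not show.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical element type; the real one implements IInterface.
public class Item
{
    public int Key;
    public string Value;
}

public static class ParallelSketch
{
    // Hypothetical stand-in for the real lookup. It must be safe to call
    // from several threads at once.
    private static string Lookup(int key)
    {
        return "value-" + key;
    }

    public static void ConvertAll(List<Item> list)
    {
        // Parallel.ForEach partitions the list and schedules the work itself.
        // Each element is handled by exactly one iteration, so writing to the
        // element needs no extra locking here.
        Parallel.ForEach(list, element =>
        {
            element.Value = Lookup(element.Key);
        });
    }

    public static void Main()
    {
        var list = new List<Item>();
        for (int k = 0; k < 1000; k++) list.Add(new Item { Key = k });

        ConvertAll(list);
        Console.WriteLine(list[999].Value); // value-999
    }
}
```

`Parallel.ForEach` returns only after all iterations have finished, so no explicit `Join` step is needed.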

Reed Copsey
  • @user177883: You can also do this with .NET 3.5 by installing the Rx Extensions, as they include these methods. – Reed Copsey Mar 23 '11 at 22:16

It looks like you have a modified closure when creating the threads. If I'm correct, all your threads update the same (last) set. Modify the code like this:

        foreach (var set in subsets)
        {
            var setLocalCopy = set;
            threads[i] = new Thread(() => Convertor<T>(setLocalCopy));
            threads[i].Start();
            i++;
        }
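
The effect can be reproduced without threads. The sketch below is an illustration rather than the original code: it uses a `for` loop, whose variable is shared across iterations in every C# version, whereas the `foreach` variable is shared only in compilers before C# 5. `ClosureDemo` and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ClosureDemo
{
    // Every delegate captures the same variable i, so when the delegates run
    // after the loop they all see its final value.
    public static List<int> CaptureShared(int count)
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            funcs.Add(() => i);
        }
        return InvokeAll(funcs);
    }

    // A variable declared inside the loop body is a fresh variable on each
    // iteration, so every delegate captures its own copy.
    public static List<int> CaptureCopy(int count)
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            int local = i;
            funcs.Add(() => local);
        }
        return InvokeAll(funcs);
    }

    private static List<int> InvokeAll(List<Func<int>> funcs)
    {
        var results = new List<int>();
        foreach (var f in funcs) results.Add(f());
        return results;
    }

    public static void Main()
    {
        foreach (int v in CaptureShared(3)) Console.Write(v + " "); // 3 3 3
        Console.WriteLine();
        foreach (int v in CaptureCopy(3)) Console.Write(v + " ");   // 0 1 2
        Console.WriteLine();
    }
}
```

With threads the outcome also depends on timing: a thread that starts before the loop variable changes still sees its own set, which is why the original code updated some sets but not others.
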
Snowbear
  • But what do I do with a local copy? Convertor assigns values to objects. – DarthVader Mar 23 '11 at 22:19
  • @user, it won't be a local copy of the entire list, just a copy of the `set` reference. It will still update the same items that you were updating. – Snowbear Mar 23 '11 at 22:21
  • Thanks, this was it. So what's the lesson here? Why did this happen? – DarthVader Mar 23 '11 at 22:28
  • @user, this situation is called a `modified closure`; you can google it. It often happens when you pass your loop variable to something that runs later (the `ThreadStart` delegate in your case) and the loop variable is then modified **before** the delegate executes. See for example: http://stackoverflow.com/questions/235455/access-to-modified-closure. – Snowbear Mar 23 '11 at 22:32