0

I want to do independent tasks of parsing multiple files in a system and get the version of each as follows:

public void obtainVersionList()
{

    for(int iterator = 1; iterator < list.length; iterator++) //list stores all the file names
    {
        Thread t = new Thread( () => GetVersion(ref list[iterator]) 
        //list will again store the fileVersions using GetVersion()
    }
}

Here,

  1. I get Index out of bounds exception. How's that possible as I've checked a condition iterator < list.length. Is this due to multiple threads running?
  2. How to minimize the operation time when we parse multiple files in the disk?
pb2q
  • 58,613
  • 19
  • 146
  • 147
stack_pointer is EXTINCT
  • 2,263
  • 12
  • 49
  • 59
  • 4
    Why is list being passed by `ref`? Alarm bells are ringing. Not to mention other problems, like creating your own threads. Which version of .net are you using, it affects which solution you get. – weston Oct 09 '12 at 12:10
  • Second @weston, especially with concurrent processing you might end up with a lot of synchonization or memory visibility errors. Minimising object's mutability turned out to be a good advice in my experience. – Matthias Meid Oct 09 '12 at 12:13
  • possible duplicate of [From Eric Lippert's blog: "don't close over the loop variable"](http://stackoverflow.com/questions/3190578/from-eric-lipperts-blog-dont-close-over-the-loop-variable) – L.B Oct 09 '12 at 12:29
  • possible duplicate of [How to update the GUI from another thread in C#?](http://stackoverflow.com/questions/661561/how-to-update-the-gui-from-another-thread-in-c) – kprobst Oct 11 '12 at 21:04

5 Answers5

2

For parallel executions I'd recommend you Parallel.ForEach (or the Task class):

Parallel.ForEach(list, item => GetVersion(ref item));

The TPL you use then does the thread management for you, typically by using a thread pool. You can, however, use different scheduler implementations. In general, re-using threads is cheaper than spawning many.

Inspired by weston's suggestions I tried out an alternative, which may be considered creative LINQ usage:

static void Main(string[] args)
{
    var seq = Enumerable.Range(0, 10).ToList();
    var tasks = seq
        .Select(i => Task.Factory.StartNew(() => Foo(i)))
        .ToList(); // important, spawns the tasks
    var result = tasks.Select(t => t.Result);

    // no results are blockingly received before this
    // foreach loop
    foreach(var r in result)
    {
        Console.WriteLine(r);
    }
}

static int Foo(int i)
{
    return i;
}

For each input in seq I create a Task<T> doing something. The Result of these tasks is collected in result, which is not iterated before the foreach. This code does maintain the order of your results too.

The sample does not modify seq. This is a different concept than altering list as you want to do.

Matthias Meid
  • 12,455
  • 7
  • 45
  • 79
  • What does the `ref` achieve in this? – weston Oct 09 '12 at 12:15
  • Nothing particular, I used the OP's signature of `GetVersion(ref foo)`. – Matthias Meid Oct 09 '12 at 12:18
  • I believe he is passing the list by ref so he can get the results of GetVersion in the same list. – weston Oct 09 '12 at 12:21
  • I tried out with `foreach`, and this is why I did not see that `ref list[iterator]` is not correct C# syntax. Nor is `length` a field of a .NET collection I know. So, is this Java or maybe pseudocode...? Anyway, I agree passing and modifying the collection will go badly wrong unless it does be a thread-safe collection. – Matthias Meid Oct 09 '12 at 12:31
2

The iterator variable is being captured by reference, not by value. That makes all threads share the same variable. Copy it to a loop-local variable first before using it in the lambda.

Everyone falls for this at least once. The C# designers have regretted this decision so much they consider changing it.

usr
  • 168,620
  • 35
  • 240
  • 369
1

To solve the index out of bounds problem you could make a local copy of the iteration variable:

for(int iterator = 1; iterator < list.length; iterator++) //list stores all the file names
{
     int iterator1 = iterator;
     Thread t = new Thread( () => GetVersion(ref list[iterator1]);
     //list will again store the fileVersions using GetVersion()
}

2) How to minimize the operation time when we parse multiple files in the disk?

That's not really a good idea when you have a single mechanical disk. You're only bouncing the mechanical head around as each thread gets a chance to run. Stick to a single thread for disk I/O.

Tudor
  • 61,523
  • 12
  • 102
  • 142
  • "Stick to a single thread for disk I/O" It is much quicker to launch multiple threads for this kind of work because the IO operations are blocking and so the thread is just asleep while the IO goes on. Even with one CPU it's better. I use this to calculate total sizes of directories, spawning a task for each sub directory. Try it vs one thread - you might be surprised at the results. – weston Oct 09 '12 at 13:08
  • @weston: Depends on the length of the I/O. For long file reads it will definitely hurt performance. – Tudor Oct 09 '12 at 13:22
  • 1
    Maybe, but he is just getting file version info I assume. Either way, to the OP: take no one's word for it, if performance is important to you, profile both ways. – weston Oct 09 '12 at 21:14
  • @weston: Profiling is the best idea actually. I agree. – Tudor Oct 10 '12 at 04:11
0

See this question

Do not close over your iterator variable. Instead, create a local variable and close over that:

public void obtainVersionList()
{
    //list stores all the file names
    for(int iterator = 1; iterator < list.length; iterator++) 
    {
        //list will again store the fileVersions using GetVersion()
        var local = list[iterator];
        Thread t = new Thread( () => GetVersion(ref local);
    }
}
Community
  • 1
  • 1
qxn
  • 17,162
  • 3
  • 49
  • 72
0

You shouldn't let multiple threads adjust the same list. This is not threadsafe unless the list is threadsafe. I don't know the type, but List<string> isn't.

The other thing is that you shouldn't create own threads for this. What if the list is 200 files, your PC will grind to a halt creating 200 threads. Let threadpool do the work of managing a sensible number of threads for you.

This solution assumes you have .net4.

Change signature of GetVersion to: private static string GetVersion(string file)

        var tasks = new List<Task>();
        //start tasks
        foreach (var file in list)
        {
            var localFile = file; //local variable on advice of resharper
            tasks.Add(Task<string>.Factory.StartNew(() => GetVersion(localFile)));
        }
        //wait for them to complete
        Task.WaitAll(tasks.ToArray());
        //read the results
        IEnumerable<string> result = tasks.OfType<Task<string>>().Select(e => e.Result);
        //print em out for test
        foreach (var str in result)
        {
            Console.WriteLine(str);
        }
weston
  • 54,145
  • 21
  • 145
  • 203