
First, about my goal:

I am importing a table with about 1000-5000 rows into a DataTable, which is bound to a DataGridView. For every row, a process has to run that takes about 5-10 seconds. After a single process finishes, I want to write the result back to the DataTable (result column).

Because each row's process is independent, I want to use multithreading to speed it up.

This is an example structure of my current code:

// Will be created for each row
public class FooObject
{
    public int RowIndex;
    public string Name;
    //...
}

// Limiting running tasks to 50
private Semaphore semaphore = new Semaphore(50, 50);
// The DataTable is set up at start-up of the App (columns etc)
private DataTable DtData { get; set; } = new DataTable();

// The button that starts the process
private void btnStartLongRun(object sender, EventArgs e)
{
    // some init-stuff
    StartRun();
}

private async void StartRun()
{
    for (int rowIndex = 0; rowIndex < DtData.Rows.Count; rowIndex++)
    {
        // Creating a task so the UI is not blocked.
        // Using the semaphore here so that objects are not
        // created for all rows before they are actually used.
        // Having this inside the real task consumed
        // a lot of RAM (> 1 GB).
        await Task.Factory.StartNew(() => 
        {
            semaphore.WaitOne();
        });

        // The row to process
        var currentRow = DtData.Rows[rowIndex];

        // Creating an object from the row-data
        FooObject foo = new FooObject()
        {
            RowIndex = rowIndex,
            Name = currentRow["Name"].ToString()
        };

        // Not awaiting because I want multiple threads
        // to run at the same time. The semaphore is
        // handling this
        TaskScheduler scheduler = TaskScheduler.Current;
        Task.Factory.StartNew(() =>
        {
            // Per-row process
            return ProcessFoo(foo);
        }).ContinueWith((result) =>
        {
            FinishProcessFoo(result.Result);
        }, CancellationToken.None, TaskContinuationOptions.OnlyOnRanToCompletion, scheduler);
    }
}

private FooObject ProcessFoo(FooObject foo)
{
    // the actual big process per line
    return foo;
}

private void FinishProcessFoo(FooObject foo)
{
    // Locking here because I got broken index errors without it
    lock(DtGrid.Rows.SyncRoot)
    {
        // Getting the row that got processed
        var procRow = DtData.Rows[foo.RowIndex];
        // Writing the result to that row
        procRow["Result"] = foo.Result;

        // Raising the progressbar
        pbData.Value++;
    }

    // Letting the next task start.
    semaphore.Release();
}

The big problem:

In the beginning everything works fine. All threads run smoothly and do their job. But the longer the app runs, the more unresponsive it gets. It looks like the app is slowly starting to block more and more.

I started a test run with 5000 rows. It got stuck at around row 2000. Sometimes an error is even raised saying the app isn't responding.

I don't have a lot of experience with multithreading, so maybe this code is just bad. I appreciate any help here. I would also be happy to be pointed in another direction to get this running better.

Thank you very much.

Edit
If there is anything I can debug to help here, just tell me.

Edit 2
I already enabled all Common Language Runtime Exceptions to check whether anything is failing silently. Nothing.

C4d
  • Why don't you simply use Parallel.For if you want to process the items in parallel? https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/how-to-write-a-simple-parallel-for-loop – mm8 Jul 20 '17 at 14:23
  • Because I couldn't limit the threads then, right? If I threw all lines into a parallel loop, the app would create 5000 objects immediately, which would end up in big memory consumption. Maybe I'm wrong, but this is how I think about it. – C4d Jul 20 '17 at 14:36
  • I don't want them all to run at the same time. There has to be a limit; 5000 is just too much. Also, I need to write back the results. I think I would need to invoke the controls when using Parallel. – C4d Jul 20 '17 at 14:39
  • You could use the ParallelOptions.MaxDegreeOfParallelism property: https://stackoverflow.com/questions/9538452/what-does-maxdegreeofparallelism-do – mm8 Jul 20 '17 at 14:39

2 Answers


If you want to process up to 50 rows in parallel, you could consider using a Parallel.For with a MaxDegreeOfParallelism of 50:

Parallel.For(0, DtData.Rows.Count, new ParallelOptions() { MaxDegreeOfParallelism = 50 }, rowIndex => 
{
    //...
});
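
Note that Parallel.For blocks the calling thread until all iterations have finished, so calling it directly from the button click handler will freeze the UI. Below is a minimal sketch of one way to wire this up (not the answerer's exact code), assuming the DtData, pbData, FooObject and ProcessFoo members from the question: the loop runs inside Task.Run and the write-back is marshalled to the UI thread with BeginInvoke.

private async void btnStartLongRun(object sender, EventArgs e)
{
    int rowCount = DtData.Rows.Count;

    // Build the per-row objects on the UI thread before going parallel,
    // so the workers never read the DataTable directly.
    var foos = new FooObject[rowCount];
    for (int i = 0; i < rowCount; i++)
    {
        foos[i] = new FooObject
        {
            RowIndex = i,
            Name = DtData.Rows[i]["Name"].ToString()
        };
    }

    await Task.Run(() =>
    {
        // Parallel.For blocks the thread it runs on, so it is kept off the UI thread.
        Parallel.For(0, rowCount, new ParallelOptions { MaxDegreeOfParallelism = 50 }, i =>
        {
            FooObject result = ProcessFoo(foos[i]);

            // The DataTable is bound to the grid, so the write-back happens on the UI thread.
            BeginInvoke(new Action(() =>
            {
                DtData.Rows[result.RowIndex]["Result"] = result.Result;
                pbData.Value++;
            }));
        });
    });
}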
mm8
  • I agree with @mm8, parallel processing is the way to go with this one –  Jul 22 '17 at 16:03
  • I'll need some time to test this out. I'll for sure come back to give feedback or mark as answered. – C4d Jul 24 '17 at 07:22
  • Do I have to invoke the datagridview when filling the datatable from inside the parallel.for? – C4d Jul 24 '17 at 08:40
  • Inside `StartRun` I replaced my `for-loop` with `Parallel.For`. UI is freezing completely. – C4d Jul 24 '17 at 09:00
  • I started a single thread that runs through the `Parallel.For`. Looks like the crashes are gone. The "why" of my original problem wasn't cleared up, but this is a working alternative. Thanks. – C4d Jul 24 '17 at 14:49

  1. Starting a new task just to call WaitOne on a Semaphore is a waste of time.

  2. You are using the UI thread to coordinate thousands of async tasks. This is bad. Wrap your call to StartRun in a new task to avoid this.

  3. The better way of doing this is to divide the number of rows by the number of processors, then start one task per processor for just those rows. No need for Semaphore then.
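
A rough sketch of what point 3 could look like (my interpretation, not the answerer's code), assuming the same form members as in the question (DtData, pbData, FooObject, ProcessFoo); results are marshalled back to the UI thread via BeginInvoke:

private async void btnStartLongRun(object sender, EventArgs e)
{
    int rowCount = DtData.Rows.Count;
    int workerCount = Environment.ProcessorCount;
    int chunkSize = (rowCount + workerCount - 1) / workerCount; // ceiling division

    // Snapshot the inputs on the UI thread before the workers start.
    var foos = new FooObject[rowCount];
    for (int i = 0; i < rowCount; i++)
    {
        foos[i] = new FooObject { RowIndex = i, Name = DtData.Rows[i]["Name"].ToString() };
    }

    var tasks = new List<Task>();
    for (int w = 0; w < workerCount; w++)
    {
        int start = w * chunkSize;
        int end = Math.Min(start + chunkSize, rowCount);

        // One task per processor, each working through its own block of rows.
        tasks.Add(Task.Run(() =>
        {
            for (int i = start; i < end; i++)
            {
                FooObject result = ProcessFoo(foos[i]);

                // Hand the result back to the UI thread without blocking the worker.
                BeginInvoke(new Action(() =>
                {
                    DtData.Rows[result.RowIndex]["Result"] = result.Result;
                    pbData.Value++;
                }));
            }
        }));
    }

    await Task.WhenAll(tasks);
}

Since only ProcessorCount tasks exist at any time, no semaphore or MaxDegreeOfParallelism setting is needed to cap the concurrency.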

hoodaticus
  • If I split the rows across the number of threads I want to use, it would always wait until all (5) threads finished before the next 5 start. I was rather trying to get 5 running at a time, and start another as soon as a single one finishes. Always waiting for 5 to finish before starting a new 5 would also be a waste, I think. Point 2 makes sense to me. It could be the cause of the crashing. I'll try out this point. – C4d Jul 24 '17 at 07:25
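
For the behaviour described in the comment above (keep up to 50 rows in flight and start the next one as soon as any single one finishes), one common pattern is to throttle with SemaphoreSlim.WaitAsync instead of a blocking Semaphore. This is a minimal sketch under the same assumptions as the question (DtData, pbData, FooObject, ProcessFoo); StartRunAsync is a hypothetical replacement for the original StartRun.

private readonly SemaphoreSlim throttle = new SemaphoreSlim(50);

private async Task StartRunAsync()
{
    var tasks = new List<Task>();

    for (int rowIndex = 0; rowIndex < DtData.Rows.Count; rowIndex++)
    {
        // Asynchronously wait for a free slot; the UI thread is never blocked.
        await throttle.WaitAsync();

        var foo = new FooObject
        {
            RowIndex = rowIndex,
            Name = DtData.Rows[rowIndex]["Name"].ToString()
        };

        Task<FooObject> work = Task.Run(() =>
        {
            try
            {
                return ProcessFoo(foo);
            }
            finally
            {
                // Free the slot so the next row can start as soon as this one finishes.
                throttle.Release();
            }
        });

        // Write the result back on the UI thread once the row is done.
        tasks.Add(work.ContinueWith(t =>
        {
            DtData.Rows[t.Result.RowIndex]["Result"] = t.Result.Result;
            pbData.Value++;
        }, TaskScheduler.FromCurrentSynchronizationContext()));
    }

    await Task.WhenAll(tasks);
}

The button handler would then simply `await StartRunAsync();`.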