I have put together a small test harness to diagnose why the throughput of my C# data processing application is much lower than it could be. Its core function selects records in batches of 100 from a remote database server using non-blocking IO and performs simple processing on them. While the app is running, it encounters no bottlenecks in CPU (<3%), network or disk IO, or RAM, and it does not stress the database server (the data set on the database is almost always entirely in RAM). If I run multiple instances of the app in parallel, I can scale up to ~45 instances with only ~10% degradation in latency but a 45x increase in throughput before CPU utilization on the database server becomes a bottleneck (at that point, there are still no resource bottlenecks on the client).

My question is: why doesn't the TPL increase the number of Tasks in flight, or otherwise increase throughput, when the client machine is capable of substantially higher throughput?

Simplified code excerpt:

    public static async Task ProcessRecordsAsync()
    {
        int max = 10000;
        var s = new Stopwatch();
        s.Start();
        Parallel.For(0, max, async x => 
        {
            await ProcessFunc();
        });
        s.Stop();
        Console.WriteLine("{2} Selects completed in {0} ms ({1} ms per select).", s.ElapsedMilliseconds, ((float)s.ElapsedMilliseconds) / max, max);
    }

    public static async Task ProcessFunc()
    {
        string sql = "select top 100 MyTestColumn from MyTestTable order by MyTestColumn desc;";
        string connStr = "<blah>...";

        using (SqlConnection conn = new SqlConnection(connStr))
        {
            try
            {
                conn.Open();
                SqlCommand cmd = new SqlCommand(sql, conn);
                DbDataReader rdr = await cmd.ExecuteReaderAsync();

                while (rdr.Read())
                {
                    // do simple processing here
                }
                rdr.Close();
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
            }
        }
    }
Dan Hermann
  • Run profiler against the code to see which code is using the most time and then determine whether you should optimize that code. – Rudy The Hunter Dec 30 '16 at 16:57
  • @RudyTheHunter, it's the non-blocking database calls that take the most time, but those require almost no resources on the client and should parallelize easily, as demonstrated by the fact that I can run 45 instances of the app without hitting any resource bottlenecks on the client. What I'm trying to understand is why I can't get that same degree of parallelization in a single process. – Dan Hermann Dec 30 '16 at 17:01
  • It looks like the code is selecting the same top 100 rows from the database each time. I am wondering if you are hitting the same 100 rows over and over. You said that you are doing non-blocking database calls, but simple updates to them. Do you need to select a different 100 rows as a better example? I think you need to share more detail of what is happening in the process. – Dec 30 '16 at 17:06
  • The rows in the database are slowly changed by a completely separate process running on a different machine, and my app does only SELECTs, no INSERTs, UPDATEs, or DELETEs. I don't think any of those details matter, though, to the core question of why I can get ~45x the throughput by running multiple processes but cannot get any more throughput in a single process. – Dan Hermann Dec 30 '16 at 17:27

2 Answers

`Parallel.For` doesn't try to choke the life out of your processor by maximizing the number of concurrent threads doing work for you. It uses the number of cores as a starting point and may ramp up depending on the nature of the workload. See this question.
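
For reference, that ceiling can also be set explicitly via `ParallelOptions`; a minimal sketch, where the limit of 45 is purely illustrative:

    // MaxDegreeOfParallelism caps the number of concurrent iterations;
    // the thread pool still decides how many threads it actually supplies.
    var options = new ParallelOptions { MaxDegreeOfParallelism = 45 };
    Parallel.For(0, max, options, x =>
    {
        // synchronous work per iteration
    });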

As it happens, you actually do have blocking IO: both when opening the connection and when reading rows. You might try this instead:

    //....
    using (var conn = new SqlConnection(connStr))
    {
        await conn.OpenAsync();                        // non-blocking connection open
        using (var cmd = new SqlCommand(sql, conn))    // dispose the command as well
        {
            try
            {
                using (var rdr = await cmd.ExecuteReaderAsync())
                {
                    while (await rdr.ReadAsync())      // non-blocking row reads
                    {
                        // do simple processing here
                    }
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
            }
        }
    }
    //...
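
Separately, note that `Parallel.For` never awaits an `async` lambda (it is converted to an `Action<int>`, i.e. `async void`), so your Stopwatch stops before the selects actually finish. A minimal sketch of a task-based harness that keeps many calls in flight and times them correctly, assuming a purely illustrative cap of 45 concurrent calls via `SemaphoreSlim`:

    // Sketch only: fan the async calls out with Task.WhenAll so every await is observed.
    // Needs System.Linq, System.Threading, System.Threading.Tasks, System.Diagnostics.
    public static async Task ProcessRecordsAsync()
    {
        int max = 10000;
        var throttle = new SemaphoreSlim(45);    // assumed cap on in-flight calls
        var sw = Stopwatch.StartNew();

        var tasks = Enumerable.Range(0, max).Select(async _ =>
        {
            await throttle.WaitAsync();
            try { await ProcessFunc(); }         // reuses ProcessFunc from the question
            finally { throttle.Release(); }
        });
        await Task.WhenAll(tasks);

        sw.Stop();
        Console.WriteLine("{0} selects completed in {1} ms.", max, sw.ElapsedMilliseconds);
    }
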
Clay
  • Thanks for the link to the other question. I converted the additional calls you noted to async, and it resulted in a mere 1000x improvement in throughput, so that was clearly the principal problem. I sloppily assumed that `conn.OpenAsync()` wouldn't matter much due to connection pooling and that `rdr.ReadAsync()` wouldn't matter much due to the buffering that occurs when reading result sets. Clearly I was wrong. – Dan Hermann Dec 30 '16 at 20:29
  • Yeah - I've stumbled upon this very same assumption - easy to make - glad you got the perf gain you were looking for. – Clay Dec 30 '16 at 20:32

Your example may be limited by the maximum number of pooled SQL connections available to the application, which is 100 by default. That would explain why you get more throughput when running multiple instances of the app: each process gets its own connection pool. You could monitor the number of connections on the SQL Server side to see whether this is the case.
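
If pooling does turn out to be the limit, it can be raised per connection string; a minimal sketch (the server, database, and pool size values are illustrative):

    // "Max Pool Size" defaults to 100 for SqlConnection; raise it in the connection string.
    string connStr = "Server=myServer;Database=myDb;Integrated Security=true;Max Pool Size=200;";

    // To monitor from the server side, you can count connections, e.g.:
    //   SELECT COUNT(*) FROM sys.dm_exec_connections;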

Matt Cole
  • After converting all the calls to async as @Clay suggested, I am occasionally hitting the connection pool limit, so increasing it as you suggest will allow me some additional throughput. – Dan Hermann Dec 30 '16 at 20:32