
An application I am working on needs to read data from about 5 to 10 tables in the same SQL Server database, load them into multiple List<> objects, and process the data afterwards. The number of rows returned from each table varies: a single row for some, and about 1,000 rows for others.

The application is written in C# 4.0, so I was thinking of using the Task Parallel Library and starting multiple tasks to retrieve data from these tables simultaneously. I thought it would be faster than a single query that returns the data from these tables in sequence, but I don't know for sure. So my questions are:

  1. Considering the overheads such as opening multiple connections to the database, would it really be faster to retrieve data using multiple tasks?
  2. If the answer is Yes in some cases, what are the cases where the answer is No?
  3. If the answer is No, is there any alternative approach?
  4. Any other things to consider when using multiple tasks to retrieve data?

To narrow down the scope, here are some assumptions you can make:

  • Data processing is not part of the question. Just the data retrieval alone.
  • Some of these tables are related by foreign keys, e.g. Parent <-- Child <-- Grandchild.
  • These tables only contain varchar and numeric columns, and the row size is less than 400 bytes.
  • Plain ADO.NET (e.g. SqlConnection/SqlCommand/SqlDataReader) is used to retrieve the data from each table.
  • Each task will call a self-contained method that reads the data from a SqlDataReader and returns a List of objects (see the sketch after this list).
  • The database server is powerful enough to handle all concurrent connections and queries.
  • The client computer is powerful enough to handle the data operation if a single query is used to retrieve all data in sequence, and it is capable of multi-threading.
  • The network is fast enough to transmit the data if a single query is used.
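
To make this concrete, here is roughly the shape of code I have in mind. It is only a minimal sketch: Customer, Order, LoadOrders, and the query text are placeholders, and each task opens its own connection.

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Threading.Tasks;

    // One self-contained loader method per table. Customer and the
    // query text are placeholders for illustration.
    private static List<Customer> LoadCustomers(string connectionString)
    {
        var result = new List<Customer>();
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT Id, Name FROM dbo.Customer", connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    result.Add(new Customer
                    {
                        Id = reader.GetInt32(0),
                        Name = reader.GetString(1)
                    });
                }
            }
        }
        return result;
    }

    // One task per table (Task.Factory.StartNew, since this is .NET 4.0),
    // then wait for all of them before processing begins.
    var customersTask = Task.Factory.StartNew(() => LoadCustomers(connectionString));
    var ordersTask = Task.Factory.StartNew(() => LoadOrders(connectionString));
    Task.WaitAll(customersTask, ordersTask);

    List<Customer> customers = customersTask.Result;
    List<Order> orders = ordersTask.Result;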

Thank you in advance for your time and input!

svick
G.S.
  • If you have issues with such a _tiny_ amount of data, you should be looking at your mapping code, i.e. how you convert the results to lists of objects, your queries, and your data access code. _Where is the code_? – Panagiotis Kanavos Apr 30 '14 at 07:24
  • For your question: we currently don't have issues loading data. I was just thinking that using parallelism would make it even faster. If it doesn't hurt anything, why wouldn't we do it? – G.S. Apr 30 '14 at 12:53
  • It _does_ hurt - multiple connections result in larger loads to the database and concurrency issues, multiple roundtrips add delays and latency, the code becomes more complex. Unless you are talking about several hundreds of thousands of rows, parallel loading makes no sense. – Panagiotis Kanavos Apr 30 '14 at 13:29

3 Answers


Let me start off by saying I don't think this is what you need to focus on for this scenario. The amount of data does not seem to be an issue here, so I would focus elsewhere.

To answer some of your questions: yes, parallel loading can improve performance, but typically in cases where the data (row count) is much larger. You do need to be aware of the memory footprint, though, as you don't want to kill the server with all the data in memory.

If you were running this multiple times and quite frequently, then I would say you need to focus on this.

As always, don't optimize until it becomes an issue.

It is more important to write code that is understandable and maintainable. Some months down the line you are going to thank your lucky stars when you need to revisit this code.

Also, what have you tried?

Have you done any benchmark testing? Write a small app, loop both cases many times (hundreds or even thousands of iterations), and measure the time each approach takes. Use a Stopwatch and see what the time differences are.
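
Something along these lines, for example. LoadAllSequential and LoadAllParallel are hypothetical stand-ins for the two code paths being compared:

    using System;
    using System.Diagnostics;

    // Run both strategies many times and compare the totals.
    const int iterations = 100;

    var stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        LoadAllSequential();
    }
    stopwatch.Stop();
    Console.WriteLine("Sequential: {0} ms total", stopwatch.ElapsedMilliseconds);

    stopwatch.Restart();
    for (int i = 0; i < iterations; i++)
    {
        LoadAllParallel();
    }
    stopwatch.Stop();
    Console.WriteLine("Parallel:   {0} ms total", stopwatch.ElapsedMilliseconds);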

Adriaan Stander
  • Thanks for your quick response. I wanted to design the application so that it would scale better for larger volumes of data. My concern is whether parallel loading would have a negative impact in the case specified, or whether it's worth doing. I think getting some opinions from others would be good. For your question, I did parallel loading in a couple of web applications before, and it dramatically shortened the loading time, but I didn't do any benchmark testing. – G.S. Apr 29 '14 at 05:51
  • You have a tiny amount of data. If you have issues, the problem is in your code, not the database. Had you used a single SQL batch that returned multiple results, you would probably find that it worked even _faster_ than parallel loading because of fewer roundtrips and parsing – Panagiotis Kanavos Apr 30 '14 at 07:27
  • Thanks for your input @Panagiotis Kanavos. We don't currently have issues loading data, and using a single stored proc to return multiple results was the original plan. However, as the amount of data may grow in the future, we want to be proactive so it will scale better. What I am getting is that it's overkill for now, and maybe we should only think about it when it becomes a problem? – G.S. Apr 30 '14 at 12:50
  • I will mark this as the answer, because I think astander made a good point: if it doesn't cause a problem, don't optimize it. It saves time and makes things easier. – G.S. May 01 '14 at 02:27

10 tables with 1,000 rows at most sounds like a small volume of data. Instead of opening multiple connections and performing parallel processing, I would rather have a single stored procedure return all the different result sets and read them one by one using dataReader.NextResult().
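
A minimal sketch of that pattern, assuming a stored procedure (dbo.GetAllData, Parent, and Child are placeholder names) that returns one result set per table:

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    var parents = new List<Parent>();
    var children = new List<Child>();

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.GetAllData", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            // First result set.
            while (reader.Read())
            {
                parents.Add(new Parent { Id = reader.GetInt32(0) });
            }

            // Advance to the next result set returned by the procedure.
            reader.NextResult();
            while (reader.Read())
            {
                children.Add(new Child { Id = reader.GetInt32(0) });
            }

            // ...and so on for each remaining table.
        }
    }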

See the accepted answer to the following Stack Overflow question: Multiples Table in DataReader

Romeo
  • Thanks for your input @Romeo. Part of me thinks the same way, but I also feel it's not too complicated to start multiple tasks. So if there are benefits to doing it, why wouldn't we? – G.S. Apr 29 '14 at 11:13

I am going to say "it depends" on the scenario. Let me explain: spawning several parallel retrieval tasks and processing the results afterwards will be "faster" if no task depends on the query or processing result of another parallel task. If you go this route, check out the slightly dated but still relevant "Concurrency & Coordination With Futures in C#" and the C# async/await feature.
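
For illustration, the futures style looks roughly like this. It is only a sketch; LoadParents, LoadChildren, and Process are hypothetical methods:

    // Both futures start immediately and run independently;
    // accessing .Result blocks only until that task has finished.
    Task<List<Parent>> parentsFuture = Task.Factory.StartNew(() => LoadParents());
    Task<List<Child>> childrenFuture = Task.Factory.StartNew(() => LoadChildren());

    // Safe because neither retrieval depends on the other's result.
    Process(parentsFuture.Result, childrenFuture.Result);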

Part of me thinks you are introducing complexity that can be explored/investigated when the need arises; engineer only what is needed for now.

Before going the parallel route, please explore the possibility of lazy loading different parts of the web page (I am assuming a web app) using jQuery/Ajax to increase the perceived responsiveness of the page.

No matter which approach you choose, it is pertinent that you profile the web page using YSlow or similar tools.

Sorry for the rant!

  • async/await is for asynchronous work, not parallelism – adrianm Apr 29 '14 at 06:45
  • Thanks @Param Pavar. The results from these tasks are related, but I don't think they are dependent on each other during the data retrieval process. The results will be processed rather than shown in a web page, so I personally think lazy loading won't "speed things up" in this case. I somehow feel that using parallelism in data retrieval is not going to add too much complexity, but using it in data processing will. So I wanted to focus on data retrieval at this point. – G.S. Apr 29 '14 at 11:31