43

I need to load one column of strings from a table on SQL Server into an array in memory using C#. Is there a faster way than opening a SqlDataReader and looping through it? The table is large and time is critical.

EDIT I am trying to build a .dll and use it on the server for some operations on the database. But it is too slow for now. If this is the fastest way, then I have to redesign the database. I thought there might be some solution for how to speed things up.

John Saunders
watbywbarif

11 Answers

57

Data Reader

About the fastest access you will get to SQL is with the SqlDataReader.
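For illustration, a minimal sketch of that approach (the connection string, table and column names are placeholders):

    using System.Collections.Generic;
    using System.Data.SqlClient;

    static string[] LoadColumn(string connectionString)
    {
        var values = new List<string>();
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT StringColumn FROM dbo.BigTable", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    values.Add(reader.IsDBNull(0) ? null : reader.GetString(0)); // ordinal access
            }
        }
        return values.ToArray();
    }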

Profile it

It's worth actually profiling to find where your performance issue is. The place you assume the bottleneck to be usually proves to be wrong once you've profiled.

For example, it could be:

  1. The time the query takes to run
  2. The time the data takes to copy across the network/process boundary
  3. The time .NET takes to load the data into memory
  4. The time your code takes to do something with it

Profiling each of these in isolation will give you a better idea of where your bottleneck is. For profiling your code, there is a great article from Microsoft.
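A rough sketch of separating the "execute" stage from the "stream and load" stage with a Stopwatch (query and connection string are placeholders):

    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Diagnostics;

    static void ProfileStages(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT StringColumn FROM dbo.BigTable", conn))
        {
            conn.Open();
            var sw = Stopwatch.StartNew();
            using (var reader = cmd.ExecuteReader())
            {
                Console.WriteLine("Execute:     {0} ms", sw.ElapsedMilliseconds);
                sw.Restart();
                var values = new List<string>();
                while (reader.Read())
                    values.Add(reader.GetString(0)); // network + materialisation time
                Console.WriteLine("Read + load: {0} ms ({1} rows)",
                                  sw.ElapsedMilliseconds, values.Count);
            }
        }
    }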

Cache it

The thing to look at to improve performance is to work out if you need to load all that data every time. Can the list (or part of it) be cached? Take a look at the new System.Runtime.Caching namespace.
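A minimal sketch using MemoryCache from that namespace (.NET 4; the cache key, the 5-minute expiry and the loader method are placeholders):

    using System;
    using System.Runtime.Caching; // reference the System.Runtime.Caching assembly

    static string[] GetColumnCached()
    {
        var cache = MemoryCache.Default;
        var values = (string[])cache.Get("BigTable.StringColumn");
        if (values == null)
        {
            values = LoadColumn(); // the SqlDataReader loop shown earlier
            cache.Set("BigTable.StringColumn", values,
                new CacheItemPolicy { AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(5) });
        }
        return values;
    }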

Rewrite as T-SQL

If you're doing purely data operations (as your question suggests), you could rewrite the code that uses the data as T-SQL and run it natively on SQL Server. This has the potential to be much faster, as you will be working with the data directly and not shifting it about.

If your code has a lot of necessary procedural logic, you can try mixing T-SQL with CLR integration, giving you the benefits of both worlds.

This very much comes down to the complexity (or more procedural nature) of your logic.
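For example, a scalar CLR function is just a static method with an attribute; a minimal sketch (the method name and the logic inside are hypothetical), deployed to SQL Server via CREATE ASSEMBLY / CREATE FUNCTION:

    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;

    public class StringOps
    {
        // Runs inside SQL Server, so the data never crosses the process boundary.
        [SqlFunction(IsDeterministic = true, IsPrecise = true)]
        public static SqlString CleanValue(SqlString input)
        {
            if (input.IsNull)
                return SqlString.Null;
            return new SqlString(input.Value.Trim()); // placeholder logic
        }
    }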

If all else fails

If all areas are optimal (or as near as possible) and your design is without fault, I wouldn't even get into micro-optimisation; I'd just throw hardware at it.

What hardware? Use the Reliability and Performance Monitor to find out where the bottleneck is. The most likely places for the problem you describe are the HDD or RAM.

badbod99
  • I have tested some things; SqlDataReader is obviously faster than DataSet ;) Yes, loading time is hitting performance worst. – watbywbarif Sep 16 '10 at 12:06
  • And I am not sending to a client; the .dll is used on the same machine as the server for some internal usage. – watbywbarif Sep 16 '10 at 12:12
  • +1 for "rewrite as T-SQL". The ideal query is one that only retrieves absolutely necessary data. If you're retrieving 100k rows to the client app, then processing there, then perhaps you should reconsider your logic. – BradC Sep 16 '10 at 17:09
  • Throwing hardware at it doesn't really help. SqlClient uses an internal buffer based on the packet size. The biggest packet size is 32768 bytes, and that WILL impact the throughput. No sane amount of hardware will help you there. – Jörgen Sigvardsson Feb 06 '20 at 12:41
  • @JörgenSigvardsson - That's assuming your performance issue is related to the network speed and volume of data. It could be, but there are plenty of other places I would look first. – badbod99 Feb 25 '20 at 09:05
  • To put this into perspective: I have a similar problem where ExecuteReader() plus an empty rdr.Read() loop takes **50 ms** for 100k records but actually accessing the fields in the loop takes **200 ms** (that's after optimising field access from things like `(int)rdr["foo"]` to `rdr.GetInt32(0)`; before that it was more like 350 ms). For comparison: DataTable.Load() takes 750 ms in this situation. Anyway, the SqlDataReader interface is clearly the limiting factor here. Pulling the values from a CSV takes about 15 ms including object creation (with hand-wrought code). – DarthGizka Aug 29 '21 at 10:12
21

If SqlDataReader isn't fast enough, perhaps you should store your stuff somewhere else, such as an (in-memory) cache.

Steven
20

No. It is actually not only the fastest way - it is the ONLY (!) way. All other mechanisms INTERNALLY use a DataReader anyway.

TomTom
8

I suspect that SqlDataReader is about as good as you're going to get.

LukeH
  • Ha! Would either of the downvoters care to elaborate on what's wrong with this answer? – LukeH Sep 16 '10 at 13:40
6

SqlDataReader is the fastest way. Make sure you use the get-by-ordinal methods rather than get-by-name, e.g. GetString(1).
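For instance, if you only know the column name, resolve it to an ordinal once, outside the loop (the column name here is a placeholder):

    using System.Collections.Generic;
    using System.Data.SqlClient;

    static List<string> ReadColumn(SqlDataReader reader)
    {
        int ord = reader.GetOrdinal("StringColumn"); // resolve once, before the loop
        var values = new List<string>();
        while (reader.Read())
        {
            // values.Add((string)reader["StringColumn"]); // name lookup on every row
            values.Add(reader.GetString(ord));             // direct ordinal access
        }
        return values;
    }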

Also worthwhile is experimenting with MinPoolSize in the connection string so that there are always some connections in the pool.
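The connection-string keyword is "Min Pool Size"; a sketch (server and database names are placeholders):

    var conn = new System.Data.SqlClient.SqlConnection(
        "Data Source=myServer;Initial Catalog=myDb;Integrated Security=true;Min Pool Size=5");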

softveda
  • Can you explain more about MinPoolSize? I don't see how this should help. – watbywbarif Sep 16 '10 at 12:13
  • In .NET, DB connections are returned to a connection pool after being closed, and the underlying SQL Server connection is eventually closed after a period of inactivity. This generates the login and logout events. In certain scenarios (infrequent web service calls) it may be beneficial to always have some ready connections in the pool to handle the first request quickly, rather than having to open a new connection to the SQL Server. – softveda Sep 16 '10 at 21:28
3

What about transforming one column of rows into one row of columns, and having only one row to read? SqlDataReader has an optimization for reading a single row (the System.Data.CommandBehavior.SingleRow argument of ExecuteReader), so maybe it can improve the speed a bit.

I see several advantages:

  • Single row improvement,
  • No need to access an array on each iteration (reader[0]),
  • Cloning an array (reader) to another one may be faster than looping through elements and adding each one to a new array.

On the other hand, it has the disadvantage of forcing the SQL database to do more work.
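A sketch of the idea, assuming the separator never occurs in the data (STRING_AGG needs SQL Server 2017+; older versions can use FOR XML PATH instead; table and column names are placeholders):

    using System.Data;
    using System.Data.SqlClient;

    static string[] LoadAsSingleRow(SqlConnection conn) // conn is assumed open
    {
        const string sql =
            "SELECT STRING_AGG(CAST(StringColumn AS nvarchar(max)), CHAR(10)) FROM dbo.BigTable";
        using (var cmd = new SqlCommand(sql, conn))
        using (var reader = cmd.ExecuteReader(CommandBehavior.SingleRow))
        {
            if (!reader.Read() || reader.IsDBNull(0))
                return new string[0];               // empty table
            return reader.GetString(0).Split('\n'); // one Read(), one Split
        }
    }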

Arseni Mourzenko
  • It sounds strange, but as this .dll is used on the server, it seems that I get data faster via SqlDataReader than by building one row in SQL. – watbywbarif Sep 16 '10 at 12:10
3

The SqlDataReader will be the fastest way. Optimize the use of it by using the appropriate GetXxx method, which takes an ordinal as a parameter.

If it is not fast enough, see if you can tweak your query. Put a covering index on the column(s) that you want to retrieve. That way, SQL Server only has to read the index and does not have to go to the table itself to retrieve all the required info.
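A one-off DDL sketch (index, table and column names are placeholders); with an index on just that column, a full-column SELECT can scan the narrow index instead of the whole table:

    using (var cmd = new SqlCommand(
        "CREATE NONCLUSTERED INDEX IX_BigTable_StringColumn ON dbo.BigTable (StringColumn)",
        conn))
    {
        cmd.ExecuteNonQuery();
    }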

Frederik Gheysels
1

"Provides a way of reading a forward-only stream of rows from a SQL Server database" This is the use of SqlDataReader from MSDN . The Data structure behind SqlDataReder only allow read forward, it's optimized for reading data in one direction. In my opinion, I want to use SqlDataReader than DataSet for simple data reading.

coolkid
1

You have 4 sets of overheads:

  • Disk access
  • .NET code (CPU)
  • SQL Server code (CPU)
  • Time to switch between managed and unmanaged code (CPU)

Firstly: is

select * from your_table where your_column = 'junk'

fast enough for you? If not, the only solution is to make the disk faster. (You can pull data out of SQL Server faster than SQL Server can read it off the disk.)

You may be able to define a SQL Server function in C# and then run the function over the column; sorry, I don't know how to do it. This may be faster than a data reader.

If you have more than one CPU, and you know a value in the middle of the table, you could try using more than one thread.

You may be able to write some T-SQL that combines all the strings into a single string, using a separator you know is safe, then split the string up again in C#. This will reduce the number of round trips between managed and unmanaged code.

Ian Ringrose
1

Some surface-level things to consider that may affect speed (besides a data-reader):

  1. Database Query Optimization
    • OrderBy is expensive
    • Distinct is expensive
    • RowCount is expensive
    • GroupBy is expensive
    • etc. Sometimes you can't live without these operations, but if you can handle some of them in your C# code instead, it may be faster.
  2. Database Table indexing (for starters, are the fields in your WHERE clause indexed?)
  3. Database Table DataTypes (are you using the smallest possible, given the data?)
  4. Why are you converting the datareader to an array?
    • e.g., would it serve just as well to create an adapter/datatable that you then would not need to convert to an array?
  5. Have you looked into Entity Framework? (might be slower...but if you're out of options, might be worthwhile to look into just to make sure)

Just random thoughts. Not sure what might help in your situation.

Greg
おたく
  • Ordering, Grouping, etc. are faster to handle in C# than in SQL? That doesn't seem likely, but maybe you know something I don't. – Brandon Moore Feb 10 '12 at 01:10
0

If responsiveness is an issue when loading a great deal of data, look at using the asynchronous methods - BeginExecuteReader.

I use this all the time for populating large GUI elements in the background while the app continues to be responsive.
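A sketch of the classic Begin/End pattern (pre-async/await; note that older .NET versions also need "Asynchronous Processing=true" in the connection string; table and column names are placeholders):

    using System.Data.SqlClient;

    static void LoadInBackground(SqlConnection conn) // conn is assumed open
    {
        var cmd = new SqlCommand("SELECT StringColumn FROM dbo.BigTable", conn);
        cmd.BeginExecuteReader(ar =>
        {
            using (var reader = cmd.EndExecuteReader(ar))
            {
                while (reader.Read())
                {
                    // process each value; marshal back to the UI thread as needed
                }
            }
        }, null);
    }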

You haven't said exactly how large this data is, or why you are loading it all into an array.

Oftentimes, for large amounts of data, you may want to leave it in the database or let the database do the heavy lifting. But we'd need to know what kind of processing you are doing that needs it all in an array at one time.

Cade Roux