4

DataSets can be 10+ times slower than DataReader at retrieving data from DB. I assume this is due to overhead of DataSets having to deal with relations etc. But is the speed difference between DataSets and DataReader due to DataSets having to retrieve more data (information about relations ...) from DB, or due to application having to do more processing, or both?

I assume DataAdapter uses DataReader under the hood and thus the number of commands application needs to execute in order to retrieve 100 rows with DataAdapter is equal to or greater than number of commands app needs to execute if these 100 rows are retrieved directly by DataReader. Does DataReader retrieve one row at the time or one field (of a particular row) at a time?

Jacob
  • 77,566
  • 24
  • 149
  • 228
SourceC
  • 3,829
  • 8
  • 50
  • 75
  • How do you know DataSets are 10 times slower than DataReaders ? I've not measured any difference even close to that number, more like +/- 5% difference. – nos Jul 18 '09 at 19:24
  • Do you have a reference or some proof for that "10x" ? – H H Jul 18 '09 at 19:25
  • 1
    Regardless of the details, in general, DataReaders are often quite a bit faster (in my experience almost always much more than just 5% if your query actually returns many rows - otherwise it just doesn't matter anyhow). I'm not sure why DataSets are that much slower, and I suppose it's a reasonable question. – Eamon Nerbonne Jul 18 '09 at 19:59
  • I've read it at the following link: http://www.devx.com/vb2themax/Article/19887/1954?pf=true – SourceC Jul 18 '09 at 21:22

6 Answers6

11

There are some different types of overhead that can occur when using a DataSet over a DataReader:

A DatSet contains DataTable objects, which contains DataRow object, that contain the data. There is a small overhead creating all the objects. Each DataRow treats all it's values as objects, so any value types are boxed which adds a bit of overhead for each field.

When you use a DataAdapter to populate a DataSet, it's easy to get a lot of data that you won't use. If you don't specify what fields you want, you get all the fields even if you won't use them all. If you don't filter the query, you get all the rows from the table. Even if you filter them later with a DataView on the DataTable, you still have fetched them from the database. With a DataReader you are closer to query that gets the data, so the connection to what you get in the result is more obvious.

If you fetch data into several DataTable objects in a DataSet and use relations to let the DataSet combine the data, you make the DataSet do work that you could have let the database do, which is more optimised for it.

If you use a DataSet well, the overhead is not that bad, rather 30% than 1000%.

You are correct to assume that a DataAdapter uses a DataReader. If you are careful how you use the DataAdapter, the database operations itself is the same as if you use the DataReader yourself.

A DataReader will fetch a record at a time from the underlying database driver, which in turn will fetch a buffer full of records at a time from the database. If the records are very large only one at a time might fit in the buffer, but usually there are tens of records in the buffer or even hundreds if they are really small.

Guffa
  • 687,336
  • 108
  • 737
  • 1,005
5

A few pointers on MSDN:

Darin Dimitrov
  • 1,023,142
  • 271
  • 3,287
  • 2,928
4

A DataReader is akin to the old ADO notion of a forward-only recordset. A forward-only recordset does not need to maintain any kind of pointer information and as such is always faster than traditional DataSet instances. A DataReader, therefore, does not cache any data and needs an open connection to the database to obtain the underlying data.

This explains why you need to write the following code for a DataReader:

DataReader reader = GetANewDataReaderInstance();
while (reader.Read())
{
    ...
}

Now versus a DataSet, things change quite a lot. In essence you can pull in a subset of the database schema into a DataSet, things like table relations, delete and update rules and of course, automatically repopulating rows inside a dataset when you submit data to the data store. DataSets can also operate in a "disconnected" fashion, which means that you can obtain the data you need and the close the database connection.

Summary
All this functionality however does come at a cost: if you are looking to populate custom business objects as quickly as possible, go with DataReaders otherwise go with DataSets.

Mike J
  • 3,049
  • 22
  • 21
2

Be very careful about what you find in Internet searches. The article you posted says

A VB.NET application written with Release Candidate one of .NET

That's about seven years ago!

Be sure to see something a little more current:

John Saunders
  • 160,644
  • 26
  • 247
  • 397
1

Have a read of Why I Don't Use DataSets in My ASP.NET Applications by Scott Mitchell. It covers the salient points you are interested in, I think.

Dan Diplo
  • 25,076
  • 4
  • 67
  • 89
  • That article shows a highly unbelievable graph demonstrating a dataset climbing from 1 to 10 seconds as the number of results climbed from 100 to 1000. Frankly unlikely. – mrmillsy Dec 24 '14 at 10:03
1

I prefer to use DataReader to populate Lists.

You can check out my Question I recently asekd about this with some good comments/answers:

Datasets

Community
  • 1
  • 1
Jack Marchetti
  • 15,536
  • 14
  • 81
  • 117