10

The DataReader is more efficient than a DataTable if you only need to show data but not manipulate it. However, to get a DataReader out of the data access layer, should I leave the connection object open? That seems like a big efficiency problem in itself. Is there another approach that lets me take full advantage of the DataReader?

Ernest
  • 1,370
  • 13
  • 23
erasmus
  • 925
  • 2
  • 9
  • 16

9 Answers

12

Yes, the data reader is definitely the most efficient - but you do not want to keep a connection open for a long period of time!

  • use the DataReader to read your data into an entity object; open the connection, read the data, close the connection
  • do whatever you need to do with your business object
  • store the changes back, e.g. by using an ad-hoc SQL query, a stored procedure, or whatever else you want; again: open the connection, write back the changes, close the connection

This is probably the most efficient you can get - it's a bit of work, some boring code, and all, but it's about as fast as it can be.

If you're more interested in developer productivity than raw speed, why not use some kind of an ORM to do all this boring, annoying mapping back and forth? Saves you a lot of coding and messy stuff to maintain!
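A minimal sketch of the open/read/close pattern described above (the `Customer` class, connection string, and query are placeholders for illustration, not part of the original answer):

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

public class Customer   // hypothetical entity object
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CustomerDal
{
    public static List<Customer> LoadCustomers(string connectionString)
    {
        var customers = new List<Customer>();

        // Open the connection, read the data, close the connection -
        // the using blocks guarantee the close even if an exception occurs.
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT Id, Name FROM Customers", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    customers.Add(new Customer
                    {
                        Id = reader.GetInt32(0),
                        Name = reader.GetString(1)
                    });
                }
            }
        }   // connection closed here

        return customers;
    }
}
```

The connection is held open only for the duration of the read; the caller works with plain `Customer` objects afterwards.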

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • Thank you for the answer. I normally use ORM or create my custom business logic layer with my entity objects. However, for a small project I tried just to use data reader. Is there anything I can do except orm or entity objects? – erasmus Mar 10 '10 at 19:11
  • 2
    @erasmus If it's truly a small project, don't worry about efficiency. Just use whatever is easiest for you. There's no rule that every application needs to perform as well as it possibly can. The only expectation is that it performs as well as it needs to in order to meet the expected needs. – Mike Mooney Mar 10 '10 at 19:14
  • @erasmus: agree with Mike - the smaller the project, the less crucial is the last bit of performance. Use Linq-to-SQL, NHibernate or something - it's just *much* easier to get things done which is definitely more important in a small project – marc_s Mar 10 '10 at 19:40
3

I never let a DataReader out into the wild (out of the DAL). Only a matter of time before you are leaving connections open somewhere. Also, I’m almost never dealing with so much data on a single call where passing around a DataTable or DataSet presents a problem.

We are using an object-oriented language and the DAL can really take advantage of this. There should be only one line of code in your project that gets the connection string, and only one object that actually touches the database (calls ExecuteNonQuery, DA.Fill(), etc.).

This also lets you get pretty involved with logging exceptions, etc., because you are only doing it in one place. So in the one DAL base class that I use for all of my DAL objects in all of my projects, I have logic that logs any exception the DAL throws to a table in my database. The logging fails over to a text file if the database logging fails.

So the code I see a lot looks like:
-   Start a try block
-   Make a SQLCommand
-   Get connection string.
-   Make Connection object
-   Open the connection
-   Get the data
-   Bind the data
-   Close the connection
-   Log error if exception

Since I encapsulate all of this, my code to get the data now looks like:

GridView1.DataSource = cProgram.DB.getMyData();

(or more likely a BAL object sits between the two). The DB is a normal object (not static), but it is only instantiated once per application.
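A hedged sketch of what such an encapsulating DAL base class might look like (the names `DalBase`, `GetTable`, and `LogError` are illustrative assumptions, not the answerer's actual code):

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

// Illustrative base class: the one place in the project that knows the
// connection string and actually touches the database.
public abstract class DalBase
{
    // Read the connection string once, in one place.
    private static readonly string ConnectionString = "...";

    protected DataTable GetTable(string sql)
    {
        try
        {
            using (var conn = new SqlConnection(ConnectionString))
            using (var da = new SqlDataAdapter(sql, conn))
            {
                var table = new DataTable();
                da.Fill(table);   // Fill opens and closes the connection itself
                return table;
            }
        }
        catch (Exception ex)
        {
            LogError(ex);  // log to a database table, fail over to a text file
            throw;
        }
    }

    private void LogError(Exception ex)
    {
        /* write to the log table; on failure, append to a text file */
    }
}
```

Every concrete DAL class inherits the connection handling and error logging, so none of the try/open/close/log boilerplate leaks into calling code.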

JBrooks
  • 9,901
  • 2
  • 28
  • 32
2

Let your data layer return objects, not datatables or data readers. I would suggest you populate your objects using a data reader.

Robert
  • 1,466
  • 10
  • 25
  • Is it ok for my view layer to know about my domain objects? – jim Mar 10 '10 at 19:11
  • there is no domain object here if you mean entities. – erasmus Mar 10 '10 at 19:13
  • 2
    @jim - that is a completely separate question, but the answer is typically no. – Keith Rousseau Mar 10 '10 at 19:20
  • @jim - your View layer should not rely on the specific structure of the tables upon which your Data layer is built. However, if by "domain objects" you mean the business objects (or "models") then, yes, these are the lingua franca for communications between the UI and the BLL. Now, you'll find many cases where the *mapping* from data objects to models (domain objects) is pretty trivial and you'll be tempted to just skip the formalities but don't - you'll be glad you kept them separate down the road. – Mark Brittingham Mar 10 '10 at 19:39
  • @Keith, @erasmus - what is your definition of a domain object? In the AI community and software in general as far as I've read, the term "domain object" is pretty specifically the objects of the world - those with which we think, compute or display. These are *mapped* to low-level representations in a database but they are not the same thing. By your answer, you seem to be thinking of "domain object" as something akin to a database record. – Mark Brittingham Mar 10 '10 at 19:42
  • @Mark - By the time you get to your View or UI you should be mapping to ViewModels, made specifically for displaying to the UI. – Keith Rousseau Mar 10 '10 at 21:08
  • @Keith - I see what you are after. It may be the kind of projects that I work on or the fact that I am a solo developer but the abstractions that I create (the Models) are typically used in both the BLL *and* the UI. I don't typically abstract away from the database layer for the BLL and then abstract away a second time for the UI (they'd end up being too similar anyway). As I'm typing this, I can actually think of a case where I *have* done this but, certainly, it isn't the usual case for me. Thanks for your thoughts, though. – Mark Brittingham Mar 11 '10 at 00:43
1

Let's run a silly benchmark to check how much faster the DataReader (.NET 4) actually is. I fetched one record from the database (SQL Server 2000) and read all of its fields, repeating the process 1000 times. The DataReader took 17.3327585 seconds and the DataTable took 18.37320156 seconds, so the DataReader was ~1.04 seconds faster than the DataTable over 1000 reads.

So one would gain roughly 0.00104 seconds per read by preferring the DataReader over the DataTable.

Look at Is DataSet slower than DataReader due to…? as well

bjan
  • 2,000
  • 7
  • 32
  • 64
1

What I usually do is open the reader with CommandBehavior.CloseConnection. Then I run through the reader and read the data into my own object model, or a list, or whatever in-memory structure I need, and then close the reader. It does much of the same stuff as a data table, but I just hate dealing with bloated and loosely-typed data structures.
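A small sketch of that approach (the query and column are placeholders): with `CommandBehavior.CloseConnection`, the connection's lifetime is tied to the reader, so disposing the reader also closes the underlying connection.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static List<string> ReadNames(string connectionString)
{
    var names = new List<string>();
    var conn = new SqlConnection(connectionString);
    var cmd = new SqlCommand("SELECT Name FROM Widgets", conn);
    conn.Open();

    // CloseConnection: closing/disposing the reader closes the connection too,
    // so no separate conn.Close() call is needed.
    using (var reader = cmd.ExecuteReader(CommandBehavior.CloseConnection))
    {
        while (reader.Read())
            names.Add(reader.GetString(0));
    }   // reader disposed here -> connection closed

    return names;
}
```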

Mike Mooney
  • 11,729
  • 3
  • 36
  • 42
0

Straight from the documentation:

When you need to retrieve multiple rows of data so that you can display or process the data in some other way, you have two basic choices. You can use a DataSet object or a DataReader object.

The DataReader approach is generally quicker because it avoids the overhead that is associated with creating a DataSet object. The overhead that is associated with a DataSet object includes creating DataSet subobjects such as DataTables, DataRows, and DataColumns. However, the DataReader provides less flexibility, and is less suited to situations where you have to cache data and pass the data to components in an application that has multiple tiers.

Note: The DataAdapter used to fill the DataSet uses a DataReader internally.

Use a DataReader when the following conditions are true:

• You need forward-only, read-only access to data (the fire hose scenario), and you want to access the data as quickly as possible, and you do not need to cache it.

• You have a data container such as a business component that you can put the data in.

Use a DataSet when the following conditions are true:

• You have to cache or pass the data between layers.

• You require an in-memory relational view of the data for XML or non-XML manipulation.

• You want to update some or all the retrieved rows, and you want to use the batch update facilities of the SqlDataAdapter class.

• You have to bind data to a control type that the DataReader cannot be bound to. Many Windows Forms controls capable of data binding require a data source that implements the IList interface. The DataSet implements IList, but the DataReader implements IEnumerable. IEnumerable supports data binding to most Web Form controls but not to certain Windows Forms controls. Check the data source requirements for the particular control type that you want to bind.

• You have to access multiple sets of data at the same time, and you do not want to hold open server resources.

Though this talks about the DataSet, most of it holds for the DataTable too. From an efficiency standpoint, here is a rare benchmark from MSDN itself. The bottom line: the DataReader is marginally faster, if that even matters.
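As a sketch of the "batch update" scenario from the list above (the table and column names are made up for illustration): fill a DataTable, edit the rows in memory while disconnected, then push all the changes back in one call.

```csharp
using System.Data;
using System.Data.SqlClient;

public static void UpdatePrices(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var da = new SqlDataAdapter("SELECT Id, Price FROM Products", conn))
    using (var builder = new SqlCommandBuilder(da))  // generates UPDATE/INSERT/DELETE commands
    {
        var table = new DataTable();
        da.Fill(table);                       // Fill opens and closes the connection itself

        foreach (DataRow row in table.Rows)   // disconnected, in-memory edits
            row["Price"] = (decimal)row["Price"] * 1.1m;

        da.Update(table);                     // batch the changed rows back to the server
    }
}
```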

See this related question too which suggests some cool ORMs and benchmarking.

nawfal
  • 70,104
  • 56
  • 326
  • 368
0

This is the same as I posted here.

I did some benchmarking myself with various approaches:

public DataTable Read1(string query)
{
    using (var cmd = conn.CreateCommand())
    {
        cmd.CommandText = query;
        cmd.Connection.Open();
        var table = new DataTable();
        using (var r = cmd.ExecuteReader())
            table.Load(r);
        return table;
    }
}

public DataTable Read2<S>(string query) where S : IDbDataAdapter, IDisposable, new()
{
    using (var da = new S())
    {
        using (da.SelectCommand = conn.CreateCommand())
        {
            da.SelectCommand.CommandText = query;
            da.SelectCommand.Connection.Open();
            DataSet ds = new DataSet();
            da.Fill(ds);
            return ds.Tables[0];
        }
    }
}

public IEnumerable<S> Read3<S>(string query, Func<IDataRecord, S> selector)
{
    using (var cmd = conn.CreateCommand())
    {
        cmd.CommandText = query;
        cmd.Connection.Open();
        using (var r = cmd.ExecuteReader())
            while (r.Read())
                yield return selector(r);
    }
}

public S[] Read4<S>(string query, Func<IDataRecord, S> selector)
{
    using (var cmd = conn.CreateCommand())
    {
        cmd.CommandText = query;
        cmd.Connection.Open();
        using (var r = cmd.ExecuteReader())
            return ((DbDataReader)r).Cast<IDataRecord>().Select(selector).ToArray();
    }
}

public List<S> Read5<S>(string query, Func<IDataRecord, S> selector)
{
    using (var cmd = conn.CreateCommand())
    {
        cmd.CommandText = query;
        cmd.Connection.Open(); 
        using (var r = cmd.ExecuteReader())
        {
            var items = new List<S>();
            while (r.Read())
                items.Add(selector(r));
            return items;
        }
    }
}

1 and 2 return a DataTable while the rest return strongly typed result sets, so it's not exactly apples to apples, but I timed them accordingly.

Just the essentials:

Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 100; i++)
{
    Read1(query); // ~8900 - 9200ms

    Read1(query).Rows.Cast<DataRow>().Select(selector).ToArray(); // ~9000 - 9400ms

    Read2<MySqlDataAdapter>(query); // ~1750 - 2000ms

    Read2<MySqlDataAdapter>(query).Rows.Cast<DataRow>().Select(selector).ToArray(); // ~1850 - 2000ms

    Read3(query, selector).ToArray(); // ~1550 - 1750ms

    Read4(query, selector); // ~1550 - 1700ms

    Read5(query, selector); // ~1550 - 1650ms
}

sw.Stop();
MessageBox.Show(sw.Elapsed.TotalMilliseconds.ToString());

The query returned about 1200 rows with 5 fields each (and was run 100 times). Apart from Read1, all performed well. Of them all I prefer Read3, which returns the data lazily, as it is enumerated; that's great for memory if you only need to enumerate the results once. To keep a copy of the collection in memory, you're better off with Read4 or Read5, as you please.

nawfal
  • 70,104
  • 56
  • 326
  • 368
0

If you want to completely abstract the connections and ceremony of ADO.NET, the DataReader is a small challenge. I really don't like my data tool having an open connection on the loose, hoping the DataReader is disposed (assuming you used option CommandBehavior.CloseConnection). Also, when using a lot of DataReaders, it's difficult to pool connections, since you can't do anything with the connection until the previous DataReader is closed. They can't be passed around easily. Your Data-tool is not a true abstraction.

DataTables, on the other hand, are extremely flexible and can make for some very efficient, clear code. LINQ-to-DataTable is great. Luckily the DataTable is actually pretty efficient: for non-huge result sets it is almost as fast as the DataReader (it depends, of course, on exactly what you're doing). More and more I am getting DataTables from my data-tool rather than readers. It really makes life simple. I can continue to use the same open connection, and there's no "state" in the data-tool.

The code for getting a DataReader is very simple. So when I really do need a DataReader (not often), I just let my DAL hand me my connection and I get my DataReader myself.
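As a sketch of the LINQ-to-DataTable flexibility mentioned above (the columns and data are made up; `AsEnumerable` and `Field<T>` come from System.Data.DataSetExtensions):

```csharp
using System.Data;
using System.Linq;

// Build a tiny DataTable and query it with LINQ.
var dt = new DataTable();
dt.Columns.Add("Name", typeof(string));
dt.Columns.Add("Age", typeof(int));
dt.Rows.Add("Ann", 34);
dt.Rows.Add("Bob", 28);

var over30 = dt.AsEnumerable()
               .Where(r => r.Field<int>("Age") > 30)
               .Select(r => r.Field<string>("Name"))
               .ToList();   // -> ["Ann"]
```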

Patrick Karcher
  • 22,995
  • 5
  • 52
  • 66
  • 2
    DataTables are a lazy way of returning data and lead to unmaintainable code. You should use Readers and return objects. – Keith Rousseau Mar 10 '10 at 19:21
  • 1
@Keith Rousseau, it looks like we're going to disagree. For instantiating one or a small number of objects, datatables give simplicity and flexibility. Maybe you'll want your data as XML? From a DataTable, no problem. The efficiency you get from DataReaders is often small and not worth it. – Patrick Karcher Mar 10 '10 at 19:32
I can come up with a lot of examples. With DataRows you can have fantastic flexibility in constructing your objects; you can construct a ton of objects from one db-query and a single LINQ statement. The efficiency you get from DataReaders is often small and not worth it. If by *lazy* you mean constructing many objects with a **small** amount of code that's very clear and with no possibility of junior programmers leaving connections hanging open, then yes, I am lazy. I want it done quickly, clearly, and with no problems. – Patrick Karcher Mar 10 '10 at 19:41
  • 1
    My issue with DataTables has nothing to do with efficiency. It is about code maintainability and the use of magic strings. You lose strong typing, which you get from returning objects from the Data Access Layer. This makes for code that is much harder to maintain, extend and debug. – Keith Rousseau Mar 10 '10 at 21:11
  • 1
    I don't quite understand where you're coming from. Of course you *work with* strongly typed objects in your business logic and UI. My ADO.NET work serves only to put data into objects, and it sounds like you're the same way. But at some point, either in factory method or inside your objects or wherever, you are getting data from the db. This question was about, if you use a tool to abstract out your ASP.NET work, whether to go with DataReaders or DataTables. When I'm shovelling data into my objects for higher tiers to work with, I'm finding DataTables are more often better. – Patrick Karcher Mar 10 '10 at 21:51
  • How about we compromise; return a data table and create your strongly typed objects from the data tables? lol, this way the connection will be closed immediately and there are no "magic strings." – Justin T. Watts Feb 05 '14 at 22:12
0

When I researched this before, I believe I discovered that the performance difference between DataReader and DataTable was trivial, except for perhaps very large amounts of data. Since then, I have typically used DataTable as it's more full-featured, can be worked with disconnected, etc.

alchemical
  • 13,559
  • 23
  • 83
  • 110