10

We have a lot of data layer code that follows this very general pattern:

public DataTable GetSomeData(string filter)
{
    string sql = "SELECT * FROM [SomeTable] WHERE SomeColumn = @Filter";

    DataTable result = new DataTable();
    using (SqlConnection cn = new SqlConnection(GetConnectionString()))
    using (SqlCommand cmd = new SqlCommand(sql, cn))
    {
        cmd.Parameters.Add("@Filter", SqlDbType.NVarChar, 255).Value = filter;

        result.Load(cmd.ExecuteReader());
    }
    return result;
}

I think we can do a little better. My main complaint right now is that it forces all the records to be loaded into memory, even for large sets. I'd like to be able to take advantage of a DataReader's ability to only keep one record in RAM at a time, but if I return the DataReader directly, the connection is cut off when leaving the using block.

How can I improve this to allow returning one row at a time?

Joel Coehoorn
  • But isn't loading all the records into memory generally better than keeping a connection open to the database with a DataReader? – codeulike May 11 '09 at 21:07
  • It depends. For a winforms app, yes. For a web app where memory is scarce and queries need to finish quickly anyway, probably not. – Joel Coehoorn May 11 '09 at 21:12
  • Can you specify more closely what kind of "real advantage of DataReader" you would like to have? – Theo Lenndorff May 11 '09 at 21:14
  • So you want to use the DataReader because it returns the first result quicker? – codeulike May 11 '09 at 21:15
  • Yes, and it saves RAM on the server. This is predominantly code for intranet apps. – Joel Coehoorn May 11 '09 at 21:18
  • rather than focusing on an "elegant" way to utilize the datareader, I'd spend more time with the sql query itself. Why are you returning so many rows that memory consumption becomes an issue? Looking at your implementation makes me think that there are a bunch of rows returned from the query that don't ever get used. – Al W May 11 '09 at 21:25
  • On any given project, the sql will be 1000x more important. But I might use this same plumbing code on many many projects, so I think it's worthwhile to spend some time with this. – Joel Coehoorn Jul 31 '15 at 15:32

4 Answers

13

Once again, the act of composing my thoughts for the question reveals the answer. Specifically, the last sentence, where I wrote "one row at a time". I realized I don't really care that it's a DataReader, as long as I can enumerate it row by row. That led me to this:

public IEnumerable<IDataRecord> GetSomeData(string filter)
{
    string sql = "SELECT * FROM [SomeTable] WHERE SomeColumn = @Filter";

    using (SqlConnection cn = new SqlConnection(GetConnectionString()))
    using (SqlCommand cmd = new SqlCommand(sql, cn))
    {
        cmd.Parameters.Add("@Filter", SqlDbType.NVarChar, 255).Value = filter;
        cn.Open();

        using (IDataReader rdr = cmd.ExecuteReader())
        {
            while (rdr.Read())
            {
                yield return (IDataRecord)rdr;
            }
        }
    }
}

This will work even better once we move to 3.5 and can start using other LINQ operators on the results, and I like it because it sets us up to start thinking in terms of a "pipeline" between each layer for queries that return a lot of results.
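That "pipeline" idea can be sketched without a database at all. In the hypothetical sketch below, a plain `GetRows` iterator stands in for the reader loop, and the LINQ operators pull rows from it one at a time, on demand, so rows that are never needed are never read:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PipelineDemo
{
    public static int RowsRead; // counts how many "rows" the source has actually produced

    // Stand-in for the reader loop: each MoveNext() pulls one row on demand,
    // just as the iterator above pulls one record per rdr.Read().
    public static IEnumerable<int> GetRows()
    {
        for (int i = 1; i <= 1000; i++)
        {
            RowsRead++;
            yield return i;
        }
    }

    public static void Main()
    {
        // Where/Take compose lazily, so the pipeline stops pulling as soon
        // as three matches are found: only rows 1..6 are ever read.
        var firstThreeEven = GetRows().Where(n => n % 2 == 0).Take(3).ToList();
        Console.WriteLine(string.Join(",", firstThreeEven)); // 2,4,6
        Console.WriteLine(RowsRead);                         // 6, not 1000
    }
}
```

Swap `GetRows` for the `IEnumerable<IDataRecord>` method above and the same short-circuiting applies to the database reader, with the caveat LukeH raises below: the connection stays open until the pipeline stops pulling.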

The downside is that it will be awkward for readers holding more than one result set, but that is exceedingly rare.

Update
Since I first started playing with this pattern in 2009, I have learned that it's best to give this a generic IEnumerable<T> return type and add a Func<IDataRecord, T> parameter that converts the reader's current row into a business object inside the loop. Otherwise the lazy iteration can bite you: the reader yields the same underlying object for every row, so materializing the results (with ToList(), for example) leaves you with N references to the last row of the query.
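To illustrate the pitfall the update fixes, here is a hypothetical sketch: `FakeRecord`, `ReadRaw`, and `Read` are stand-ins for the reader's reused row state, not real ADO.NET types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LastRowPitfall
{
    // Stand-in for a reader: one mutable object whose state is overwritten
    // on every "Read()", just like an IDataReader's current row.
    public class FakeRecord { public int Value; }

    // The original pattern: yields the same buffer object each time.
    public static IEnumerable<FakeRecord> ReadRaw(int[] rows)
    {
        var buffer = new FakeRecord();
        foreach (var v in rows) { buffer.Value = v; yield return buffer; }
    }

    // The fix from the update: a Func<,> copies the state out per row.
    public static IEnumerable<T> Read<T>(int[] rows, Func<FakeRecord, T> selector)
    {
        var buffer = new FakeRecord();
        foreach (var v in rows) { buffer.Value = v; yield return selector(buffer); }
    }

    public static void Main()
    {
        // ToList() on the raw version stores three references to one object,
        // so every entry shows the last value written.
        var raw = ReadRaw(new[] { 1, 2, 3 }).ToList();
        Console.WriteLine(string.Join(",", raw.Select(r => r.Value))); // 3,3,3

        // With a selector, each row is materialized before the buffer moves on.
        var safe = Read(new[] { 1, 2, 3 }, r => r.Value).ToList();
        Console.WriteLine(string.Join(",", safe)); // 1,2,3
    }
}
```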

Joel Coehoorn
  • I am using a similar implementation for .NET 3.5. It works fine, but for some reason I feel it's a misuse of the iterator pattern, though that's very subjective. It really gets messy when you want to wrap some exception handling around it to prevent this block from exiting because the cast fails for some reason. ;-) – Theo Lenndorff May 11 '09 at 21:21
  • @RunningMonkey: since you've run this pattern live, have you seen the (IDataRecord) cast fail often? How worried about that should I be? – Joel Coehoorn May 11 '09 at 21:23
  • Maybe not much, because my implementation is more similar to http://stackoverflow.com/questions/47521/using-yield-to-iterate-over-a-datareader-might-not-close-the-connection. There I convert to base types with Convert.ChangeType before doing a yield return. That was almost crying out for trouble. :-) – Theo Lenndorff May 11 '09 at 21:39
  • To be honest: That cast to IDataRecord has much more appeal to me. – Theo Lenndorff May 11 '09 at 21:41
  • What happens if the calling code is doing something weird with the returned IDataRecord and throws an exception? That iterator would instantly end and exit all using blocks, wouldn't it? – Theo Lenndorff May 11 '09 at 21:45
  • Exceptions are one thing, I can also see that the reader loop is now subject to execute slower since it depends on the speed of the enumerating code to move through the reader. Perhaps a bit speculative, but what happens when you pass this enumerator up to the presentation layer, an aspx page that wants to do some data binding on the collection. Instead of doing a fast scan-and-extract you're now doing a much slower databind operation. – Peter Lillevold May 11 '09 at 21:57
  • This solution will hold your connection open until you've iterated the entire resultset, compared to the original code, which fills the DataTable using a firehose cursor and closes the connection asap. In effect you're trading one resource (memory) for another (database connections); not sure if this is an issue for you or not. – LukeH May 11 '09 at 22:00
  • You don't need the cast though. Just `yield return rdr` would do, since `rdr` is already an `IDataRecord` (strangely). This is one area where I believe MS got inheritance wrong (it should have been composition). – nawfal Feb 09 '13 at 22:49
  • Are you sure `yield return` from `while` works? Not for me. I made it a question here: http://stackoverflow.com/questions/31747123/while-on-idatareader-read-doesnt-work-with-yield-return-but-foreach-on-reader-does – nawfal Jul 31 '15 at 13:21
  • @nawfal It won't work if you do something like call `.ToList()`, because you're yielding the same object on each iteration. At the end of the `.ToList()` call, each entry in the list will be the same object with the same values, and moreover the connection will be closed and the DataReader disposed. It **will** work if you act on each record inside a `foreach` loop, use it with other IEnumerable extensions like `Select()` or `Where()`, or use it as a datasource to bind to a grid control. Addressing the `.ToList()` problem is the reason for the `Update` at the bottom of the answer. – Joel Coehoorn Jul 31 '15 at 15:27
  • @JoelCoehoorn I do not get you, tbh. Can you make it an answer to my question with a little elaboration? As it is, I'm not sure what the difference is between doing ToList() and foreach. – nawfal Jul 31 '15 at 15:31
  • @JoelCoehoorn can you please explain the _"make it a generic IEnumerable return type and add a Func parameter"_ part in a bit more detail? Where must the Func<> parameter be added, and why does it help prevent seeing the last object in the query every time? – soulflyman Oct 23 '18 at 15:35
3

In times like these I find that lambdas can be of great use. Consider this, instead of the data layer giving us the data, let us give the data layer our data processing method:

public void GetSomeData(string filter, Action<IDataReader> processor)
{
    ...

    using (IDataReader reader = cmd.ExecuteReader())
    {
        processor(reader);
    }
}

Then the business layer would call it:

GetSomeData("my filter", (IDataReader reader) => 
    {
        while (reader.Read())
        {
            ...
        }
    });
Peter Lillevold
  • .Net 2.0 for now, but I'll keep that in mind for the next refresh. – Joel Coehoorn May 11 '09 at 21:24
  • Actually, I don't think this will work, because I want these to be passed up the chain towards the presentation tier. I could still do that by repeating this pattern inside the lambda, but that feels ugly. – Joel Coehoorn May 11 '09 at 21:33
  • I agree. Then again, I tend to want to work through and close the data reader as quickly as possible, and rather build up the required data at the data layer, leaving it up to the layers above to dictate what that data should look like. With the Action processor, the business layer gets to dictate the format while keeping the reader loop as tight as possible (assuming the business layer doesn't do something time-consuming in the processor method, that is). – Peter Lillevold May 11 '09 at 21:49
2

The key is the yield keyword.

Similar to Joel's original answer, a little more fleshed out:

public IEnumerable<S> Get<S>(string query, Action<IDbCommand> parameterizer, 
                             Func<IDataRecord, S> selector)
{
    using (var conn = new T()) // your connection type; T is a class-level type parameter with "where T : IDbConnection, new()"
    {
        using (var cmd = conn.CreateCommand())
        {
            if (parameterizer != null)
                parameterizer(cmd);
            cmd.CommandText = query;
            cmd.Connection.ConnectionString = _connectionString;
            cmd.Connection.Open();
            using (var r = cmd.ExecuteReader())
                while (r.Read())
                    yield return selector(r);
        }
    }
}

And I have this extension method:

public static void Parameterize(this IDbCommand command, string name, object value)
{
    var parameter = command.CreateParameter();
    parameter.ParameterName = name;
    parameter.Value = value;
    command.Parameters.Add(parameter);
}

So I call:

foreach(var user in Get(query, cmd => cmd.Parameterize("saved", 1), userSelector))
{

}

This is fully generic and fits any provider that complies with the ADO.NET interfaces. The connection and reader objects are disposed after the collection is enumerated. As an aside, filling a DataTable via IDataAdapter's Fill method can be faster than DataTable.Load.

nawfal
  • I'm a bit confused by your note that "The connection object and reader is disposed only after the collection is enumerated once." Everything I'm reading, including the SO question linked by your sentence, suggests that even if the collection is not fully enumerated (such as by breaking out of a `foreach` that is using it), it will be disposed. Are you saying that it's not, or does breaking out of the iteration count as the collection being "enumerated once?" Thanks for fleshing the Func version out, by the way. – bubbleking Jan 09 '15 at 17:43
  • @bubbleking I made a misleading statement there. I was trying to convey the point that the connection and reader objects will be disposed after you enumerate the list - if at all you do enumerate - regardless of whether you break out of the loop or not. If you don't enumerate, there is no question of the connection object ever being used. I will edit the answer. Thanks for pointing it out. – nawfal Jan 09 '15 at 20:44
0

I was never a big fan of having the data layer return a generic data object, since that pretty much defeats the whole point of having the code separated into its own layer (how can you switch out data layers if the interface isn't defined?).

I think your best bet is for all functions like this to return a list of custom objects you create yourself; in your data layer, you run your procedure/query through a DataReader and iterate through it, building the list.

This will make things easier to deal with in general (despite the initial time to create the custom classes), makes it easier to handle your connection (since you won't be returning any objects associated with it), and should be quicker. The only downside is that everything will be loaded into memory, like you mentioned, but I wouldn't think this would be a cause for concern (if it were, I would think the query would need to be adjusted).

John
  • We have a 4-tier architecture. The data layer returns _generic_ data objects (DataTable, DataRow, DataReader) to a translation layer that converts them into strongly-typed business objects. Minimizes the damage. – Joel Coehoorn May 11 '09 at 21:21
  • I see. I suppose the only downside to that would be doing the looping in both layers to get the list created, but I would think that impact would be minimal. – John May 11 '09 at 21:25
  • The great thing about IEnumerable is that you still only loop through the data once. So yes, the impact should be non-existent. – Joel Coehoorn May 11 '09 at 21:28