1

I'd like to write a function, that retrieves data from a database table page by page. The goal here is to save memory. This is a part of a validation program that we will be running on our database occasionally to make sure that we have consistent data. Tables could be quite big, so I would not want to load the whole table to memory for doing the validation.

With that in mind I wrote this function:

static IEnumerable<T> RetreivePages<T>(IQueryable<T> query, int count, int pageSize)
{
    int pages = count / pageSize;
    if (count % pageSize > 0)
    {
        pages++;
    }
    for (int i = 0; i < pages; i++)
    {
        foreach (T item in query.Skip(i * pageSize).Take(pageSize))
        {
            yield return item;
        }
    }
} 

The idea here is that we are only retrieving pageSize rows at a time, so we won't fill memory with all the rows from a table.

Unfortunately that does not work. query.Skip line throws the following exception:

The method 'Skip' is only supported for sorted input in LINQ to Entities. The method 'OrderBy' must be called before the method 'Skip'.

Is there any other way to accomplish what I want?

Update

The answers to the question linked as duplicate suggest ordering by a column. .OrderBy would not work here because properties on T are not known inside the function.

Andrew Savinykh
  • 25,351
  • 17
  • 103
  • 158
  • 2
    Ordering is important because running the same SELECT on a database more than once an result in different orders of data. – DavidG Jul 31 '15 at 00:50
  • @DavidG I see. Well then, it looks like there is no answer to that question then. Thank you! – Andrew Savinykh Jul 31 '15 at 00:51
  • possible duplicate of [Paging Error :The method 'Skip' is only supported for sorted input in LINQ to Entities. The method 'OrderBy' must be called before the method 'Skip'](http://stackoverflow.com/questions/22802729/paging-error-the-method-skip-is-only-supported-for-sorted-input-in-linq-to-en) – Alex Netkachov Jul 31 '15 at 00:58
  • @AlexAtNet yep. So how would you solve this problem? The solutions in the question you linked do not apply. – Andrew Savinykh Jul 31 '15 at 01:06
  • if you can't sort the result in this method, any chance you can do so prior to calling it? – ESG Jul 31 '15 at 01:10
  • I agree with DavidG's solution - it is nice to have orderBy parameter in such method to guarantee that the rowset will be sorted. The sorting can be applied before calling this method but this will open the way for runtime exceptions. – Alex Netkachov Jul 31 '15 at 01:26

2 Answers2

2

You could pass in an already ordered query into your method and change the input type to IOrderedEnumerable<T> or pass in the selector to order by within your method, something like this:

static IEnumerable<T> RetreivePages<T, U>(
    IQueryable<T> query, 
    Func<T, U> orderBy, //<--- Additional parameter
    int count, int pageSize)
{
    //Apply the ordering
    var orderedQuery = query.OrderBy(orderBy);

    int pages = count / pageSize;
    if (count % pageSize > 0)
    {
        pages++;
    }
    for (int i = 0; i < pages; i++)
    {
        //Use the new ordered version
        foreach (T item in orderedQuery.Skip(i * pageSize).Take(pageSize))
        {
            yield return item;
        }
    }
} 

And call it like this:

var query = ...;

//Assuming your query object have a property called "ID":
var pagedQuery = RetrievePages(query, x => x.ID, 10, 100;
DavidG
  • 113,891
  • 12
  • 217
  • 223
  • I like it. Both solutions are practical. In my case ordering before passing to the function works best (duh!) and the simplest. I came across another approach while searching for the solution which I will post soon for the posterity, but I'll use yours. I'll accept your answer in a few days providing no one comes up with a better idea, which I find doubtful. Thanks! – Andrew Savinykh Jul 31 '15 at 01:40
0

DavidG is right, you have to sort somehow, so let's see what can be done.

This answer provides a nice generic function that can sort by string name:

public static class QueryHelper
{
    public static IQueryable<T> OrderByField<T>(this IQueryable<T> q, string sortField, bool ascending = true)
    {
        var param = Expression.Parameter(typeof(T), "p");
        var prop = Expression.Property(param, sortField);
        var exp = Expression.Lambda(prop, param);
        string method = ascending ? "OrderBy" : "OrderByDescending";
        Type[] types = { q.ElementType, exp.Body.Type };
        var mce = Expression.Call(typeof(Queryable), method, types, q.Expression, exp);
        return q.Provider.CreateQuery<T>(mce);
    }

}

Now what field are you sorting on? More often than not you will have a primary key on your tables. I'm assuming you do. In this case you can retrieve the keys and sort on them. You code then will become:

public class DbHelper<T> where T : class
{
    private readonly string[] _keyNames;

    public DbHelper(DbContext context)
    {
        ObjectSet<T> objectSet = ((IObjectContextAdapter)context).ObjectContext.CreateObjectSet<T>();
        _keyNames = objectSet.EntitySet.ElementType.KeyMembers.Select(k => k.Name).ToArray();

    }
    public IEnumerable<T> RetreivePages(IQueryable<T> query, int count, int pageSize)
    {
        int pages = count / pageSize;
        if (count % pageSize > 0)
        {
            pages++;
        }
        for (int i = 0; i < pages; i++)
        {
            IQueryable<T> queryToRun = _keyNames.Aggregate(query, (current, keyName) => current.OrderByField(keyName));
            foreach (T item in queryToRun.Skip(i * pageSize).Take(pageSize))
            {
                yield return item;
            }
        }
    }
}

Now there are quite a few caveats with this approach. Getting the keys is quite expensive, so you definitely do not want to create multiple instances of the DbHelper for the same type parameter value. Also dynamically constructing a query like this is slower than manual ordering.

So I would recommend using David's solution over mine, (which is frankly as simple that it should be obvious) but I still wanted to document it here in case the alternate is useful in a different scenario.

Community
  • 1
  • 1
Andrew Savinykh
  • 25,351
  • 17
  • 103
  • 158