2

I have a table that is rather large at about 10,000,000 rows. I need to page through this table from my C# application. I'm using NHibernate. I have tried to use this code example:

return session.CreateCriteria(typeof(T))
                .SetFirstResult(startId)
                .SetMaxResults(pageSize)
                .List<T>();

When I execute it the operation eventually times out if my startId is greater than 7,000,000. The pageSize I'm using is 200. I have used this method on much smaller tables, of less than 1000 rows, and it works and performs quickly.

Question is, on such a large table is there a better way to accomplish this using NHibernate?

Dave
  • 275
  • 2
  • 14
  • Which rdbms? or what does the SQL look like? Maybe try throwing an OrderBy in there... – dotjoe Feb 15 '12 at 22:33
  • It's MSSQL2005. Trying the order now. – Dave Feb 15 '12 at 22:43
  • The order didn't actually make any difference, but I did find that changing the startId to a lower value allowed the operation to complete. It seems to choke with a startId value of about 7,000,000 or higher. Anything lower is slow, but at least finishes. – Dave Feb 15 '12 at 22:55
  • Can you fire up SQL Server Profiler and see which query gets fired? And then fire it manually from studio. This is not necessarily problem with NHibernate. Your NHibernate query may be such that limit can't be applied in SQL, but that result is trimmed later. – Nikola Radosavljević Feb 15 '12 at 23:17
  • You're absolutely right, it is SQL Server that is slow. I pulled the query from SQL Server Profiler and ran it from Management Studio. It took close to a minute. So, my question still stands, is there a more efficient way of doing this with NHibernate? We're trying to get away from stored procedures because they're difficult to manage. – Dave Feb 16 '12 at 15:09

3 Answers3

2

You're trying to page through 10 million rows 200 at a time? Why? No human being is going to page through that much data.

You need to filter the dataset first and then apply TSQL style paging to the smaller data set. Here are some methods that will work. Just modify them so that you're getting to less than 10million rows through some kind of filtering (a WHERE clause, CTE, or derived table).

Grant Fritchey
  • 2,645
  • 19
  • 21
  • Nice link. I've also seen that comment "..why would you want to.." a lot with regards to paging. In many instances, it is simply a business requirement. I am working on a system that imports flat text files (some containing hundreds of thousands of rows). We provide a feature to view the files once they are imported, and I'm not going to load ALL rows for them to see, so paging seems like an ideal solution. It's compounded by the fact that the users want to be able to sort by various columns, so while they may not want to page through the whole file, they need some mechanism to view the data. – Mr Moose Feb 16 '12 at 02:13
  • I've had that requirement before. No one will page through the data. Assuming 1 second per page, it'll take them 138 hours to get through it. What they want is to look at a sample set and see that there are 10 million available. Provide them good indexing and a way to search on the important fields and they should be happy. You're working hard to solve something that just won't ever be used the way they're saying it's needed. I'd drill down on those requirements get them refined. Show them how crazy they are. Explain and demonstrate alternatives. You're the person with knowledge in the room. – Grant Fritchey Feb 16 '12 at 12:11
  • 2
    It's not that we're trying to read through 10M records, but we might only want to see the last few hundred. Maybe you can suggest an alternative strategy for this. – Dave Feb 16 '12 at 15:11
  • Nothing beyond restructuring the queries so your working off of filtered lists instead of trying to page that much data. Filtering and sorting TSQL is really good at. Paging, less so, although, using queries like those in the link above will give you a leg up. – Grant Fritchey Feb 16 '12 at 17:08
  • @GrantFritchey you are right, it would take a long time to page through every row. I also agree with what you say regarding filtering and sorting, but nonetheless, if your filtered and sorted query still returns a huge number of rows, paging will be required. Whether the user chooses to view them all pages or not is up to them. All I'm doing is providing the functionality to do so with the best performance I can. – Mr Moose Feb 17 '12 at 02:04
1

Funny you should bring this up, as I am having the same issue. My issue isn't related to paging using NHibernate, but more with just using straight T-SQL.

It seems as though there are a few options. The one I found quite useful in my instance was this answer to a question regarding paging. It discusses using a "..keyset driven solution" rather than return ranked results through the use of ROW_NUMBER(). I'm not sure what NHibernate would use in this instance or if it's possible to see the SQL it generates based on the query you issue (I know you could in Hibernate, but I've not used NHibernate).

If you aren't aware of the using SQL SERVER to returned ranked results based on ROW_NUMBER, then it's well worth looking into. A lot of people seem to refer to this article as to how to go about paging. I've seen some subsequent posts discourage the use of SET ROWCOUNT though in favour of using TOP with a dynamic parameter - SELECT TOP(@NumOfResults).

There are lots of posts here on SO regarding this, but no definitive answer on the best way to go about it as far as I can see. I'll be keeping an eye on this post to see what others suggest also.

Community
  • 1
  • 1
Mr Moose
  • 5,946
  • 7
  • 34
  • 69
1

It could by Isolation Layer problem.

I had a similar issues. If the table your reading from is constantly updated, the updater locks parts of the table, causing timeout then reading from the table. Add SetIsolationLayer(ReadUncommitted) you must note that the data might be a little dirty.

Mr Mush
  • 1,538
  • 3
  • 25
  • 38