LINQ Skip() Problem

Question

The C# statement below blocks the process and cannot retrieve data if itemToSkip is greater than 0.

 int itemToSkip = 100;
 int itemToTake = 1000;

 var itemList = db.MYTABLEs.Skip(itemToSkip).Take(itemToTake).ToList();

How can I fix it? what is the problem?

The table contains 14 gb data. So I can't get them all. I have to skip and take — Ahmet Altun, Mar 10 '11 at 15:21
I am using Sql Server 2008. And Provider: System.Data.SqlClient — Ahmet Altun, Mar 10 '11 at 15:23
@Ahmet: what about executing the equivalent SQL directly? How long does that take? — Jon, Mar 10 '11 at 15:25
Are you using LINQ to SQL or Entity Framework? Which version of .NET are you using? .NET 3.5 or .NET 4.0? — Steven, Mar 10 '11 at 15:28
Check the generated SQL... it's probably using RowNumber() if it's SQL Server, and that's far from efficient on large datasets (thus, the blocking... if you leave it enough time, it'll probably timeout). You should use compiled queries for such a large dataset. — Jcl, Mar 10 '11 at 15:30
Maybe you can use SqlProfiler to see what sql statement is executed. — Cheng Chen, Mar 10 '11 at 15:30
Of what type is `MYTABLEs` exactly. Is it something that implements `IQueryable` or does it just implement `IEnumerable`? — Steven, Mar 10 '11 at 15:30
@Peter: Without the take, it will just return 10,999,9990 records :-). — Steven, Mar 10 '11 at 15:33
I had this problem, and after struggling with either running out of memory or having pages load more and more slowly, I found a much better solution that streams through all the data without memory pressure. It involves turning off the `ObjectTrackingEnabled` property in your `DataContext`: http://stackoverflow.com/questions/12468219/read-huge-table-with-linq-to-sql-running-out-of-memory-vs-slow-paging/ — Andrew Mao, Sep 19 '12 at 19:31

Stephen Chung · Accepted Answer · 2011-03-11T03:04:48.247

Not sure what provider you have that provides db.MYTABLEs. It is really not possible to answer your question unless we know how db.MYTABLEs behaves.

In normal LINQ, skip does not just skip ahead; it has to iterate through the amount of data in order to skip. Therefore, for your 14gb data table, it will be iterating through the first "skip" number of records. If this iteration is slow, you are not saving any cpu/time by skipping.

For some providers, e.g. an SQL source, skip may be implemented using cursors, which can again be slow. If it is SQL Server, it may be optimized with a keyword which may be faster.

If it is LINQ-to-SQL, it translates the query into SQL using a "NOT EXISTS" clause, which will be extremely slow because it has to go through the entire table if the NOT EXISTS clause does not hit an index. See the following (link):

LINQ to SQL translates Skip by using a subquery with the SQL NOT EXISTS clause. This translation has the following limitations:

The argument must be a set. Multisets are not supported, even if ordered.

The generated query can be much more complex than the query generated for the base query on which Skip is applied. This complexity can cause decrease in performance or even a time-out.

In other words, the docs says "don't do it."

Only for providers with random-access features, e.g. an in-memory array, will skip be really fast because the provider can just jump ahead.

The worst case will be if you are running on a provider that automatically sorts the entire data set if you use Skip/Take. If you have 14gb of data, then this sort is going to be really slow.

You need to experiment some more to see if your program is hanging on skip, or just hogging all the cpu trying to iterate through.

If you are only trying to divide your data into manageable chunks, you probably should not be using skip/take, which requeries the data source every time.

score 3 · Answer 2 · answered Mar 10 '11 at 15:20

3

Skip usually insists on having an explicit sort ordering. Try

var itemList = db.MYTABLEs.OrderBy(r => r.Id).Skip(itemToSkip)

or similar.

answered Mar 10 '11 at 15:20

Rup

33,765
9
83
112

@Rup I don't think that is the case – msarchet Mar 10 '11 at 15:22
Ordering 11 million rows is quite difficult :-( – Ahmet Altun Mar 10 '11 at 15:23
1

@Rup: why on earth would it need an ordering?? – Jon Mar 10 '11 at 15:23
@Jon because it's a database query: the skip and take are performed on the database, and I know the LINQ-to-Entities provider (at least) does insist on this. – Rup Mar 10 '11 at 15:25
@Ahmet but you're not doing this part in memory, it's done at the database. If you ask it to order by the clustered key that should be a no-op as far as the database is concerned. – Rup Mar 10 '11 at 15:26
1

@Ahmet: Ordering 11 milion rows is easy when you've got an database index on that column. – Steven Mar 10 '11 at 15:29
@Jon conceptually you need ordering because rows in RDBMS doesn't have certain predefined order. RDBMS is allowed to return rows in different order for same query. – Andrey Mar 10 '11 at 15:33
@Andrey: That would equally apply to any SELECT query. Rup said that `Skip` specifically insists on having an ordering. – Jon Mar 10 '11 at 15:37
@Jon OK, the LINQ-to-SQL engine will error out when building a SELECT query including a Skip if the Skip input is unordered data. No, the Skip by itself won't error out until you trigger the SELECT (e.g. as the `ToList` in Ahmet's code will). – Rup Mar 10 '11 at 15:52
Skip results are well defined only for ordered sets. For unordered sets, the results are not well defined (which means just skipping random rows) -- which is different from saying there will be no results. – Stephen Chung Mar 11 '11 at 02:57

score 0 · Answer 3 · answered Mar 10 '11 at 15:24

0

I assume, that your DBMS isn't supporting skip directly and therefore it might request all data, depending on the provider.

answered Mar 10 '11 at 15:24

Daniel Hilgarth

171,043
40
335
443

1

Do you think Sql Server 2008 & System.Data.SqlClient provider and .NET 3.5 Framework may have this problem? – Ahmet Altun Mar 10 '11 at 15:26
He's talking about the LINQ provider, not the ADO.NET provider. So you are using ADO.NET Entity Framework on .NET 3.5 on a MS SQL database than? – Steven Mar 10 '11 at 15:32
@Ahmet Altun: Yes, I think so. MSSQL Server has a method named ROW_NUMBER() which numbers the returned rows starting with one. So if you want to return rows 100000 to 100010, you need to first request all 100010 rows. See for example here: http://stackoverflow.com/questions/187998/row-offset-in-ms-sql-server – Daniel Hilgarth Mar 10 '11 at 15:32

score 0 · Answer 4 · answered Mar 10 '11 at 15:30

0

with such large table is possible that is just taking a long time to return, and it's taking that long due to lack of indexes, you need to do some profiling to see the underlying query, you can't diagnose the problem just by looking at the LINQ code

answered Mar 10 '11 at 15:30

BlackTigerX

6,006
7
38
48

LINQ Skip() Problem

4 Answers4

Linked