
I have a table with 300 million rows in Microsoft SQL Server 2008 R2. There is a clustered index on the date column [DataDate], which means the entire table is physically ordered by date.

How do I stream out the data from this table, into my .NET application, in 10000 row chunks?

Environment:

  • Using C#.
  • Have to be able to pause the data stream at any point, to allow the client to process the rows.
  • Unfortunately, cannot use a select * from as this will select the entire table (it's 50 GB - it won't fit into memory).
marc_s
Contango
  • "select * from" has no effect on the number of rows, only the columns. Your WHERE condition will determine how many rows, and if you needed to you could logically partition the table with multiple select statements. – Andrew Lewis Jun 27 '11 at 00:18
  • Why do you need to read 300 million rows? This is nonsense or poor design. If your client needs to stop to process rows, then you know the conditions to restrict data. if the condition is complex (eg RegEx) then there are other ways to do this in SQL Server. – gbn Jun 27 '11 at 04:35
  • Isn't this just a rehash of your [previous question](http://stackoverflow.com/questions/6468925/using-cursors-to-read-time-series-data-from-sql-server-using-c)? – Damien_The_Unbeliever Jun 27 '11 at 07:15
  • @gbn For the record, I'm reading time series data which is then fed into Microsoft StreamInsight. I wouldn't necessarily need to read the entire 300 million rows every time I do a test. Normally, I would test on 4 months of data, and occasionally, I might test on 10 years of data. – Contango Jul 06 '11 at 22:37

2 Answers


You can run SELECT * FROM, then loop through the results with a SqlDataReader.

This never loads more than one row into memory at a time; each row is fetched from the server as you call Read().
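A minimal sketch of this approach, assuming the [DataDate] column from the question; the connection string, the table name `dbo.BigTable`, and the `[Value]` column are placeholders, not details from the original post:

```csharp
using System;
using System.Data.SqlClient;

class StreamRows
{
    static void Main()
    {
        // Placeholder connection string - adjust for your environment.
        const string connectionString =
            "Server=.;Database=MyDb;Integrated Security=true";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT [DataDate], [Value] FROM dbo.BigTable ORDER BY [DataDate]",
            connection))
        {
            connection.Open();

            using (var reader = command.ExecuteReader())
            {
                int rowsInChunk = 0;
                while (reader.Read())   // pulls one row at a time from the server
                {
                    DateTime dataDate = reader.GetDateTime(0);
                    // ... process the row ...

                    if (++rowsInChunk == 10000)
                    {
                        rowsInChunk = 0;
                        // Pause here as long as needed; the reader simply
                        // stops pulling rows until the next Read() call.
                    }
                }
            }
        }
    }
}
```

Note that the connection stays open for the duration of the stream, so a very long pause can hit command timeouts or hold locks; for wide rows, `CommandBehavior.SequentialAccess` can reduce per-row buffering further.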

SLaks
  • @SLaks, just went through some of the questions you've answered - amazing, I like the one where you found the bug in the .NET framework. – Contango Jun 27 '11 at 00:07
  • It's still 300 million operations. Why? – gbn Jun 27 '11 at 06:53
  • @SLaks: You did that last time I challenged you too. You are still advocating reading 300 million rows. Why? – gbn Jun 27 '11 at 12:17
  • 2
    @gbn: Because we don't have enough information to advocate anything better. – SLaks Jun 27 '11 at 12:20
  • @SLaks: I asked OP in a comment for more info. OP asked a similar question as mentioned by Damien_The_Unbeliever. This answer adds no value. – gbn Jun 27 '11 at 12:22
  • @gbn, I've answered your question, above. – Contango Jul 06 '11 at 22:38
  • @gbn, sometimes a customers demand list 'We want an export of all data'. Yes, you can try talking them out of it, but some people are stubborn. – ErikHeemskerk Sep 19 '12 at 14:30

You can use a combination of the following with LINQ to SQL (on an IQueryable, the Queryable overloads of these operators are translated to SQL):

Enumerable.Take returns a specified number of contiguous elements from the start of a sequence.

Enumerable.Skip bypasses a specified number of elements in a sequence and then returns the remaining elements.


Refer to:

Accessing Rows In A LINQ Result Without A Foreach Loop?

The LINQ Enumerable Class, Part 2 - Positioning within Sequences

LINQ Enabled Paging
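The Skip/Take paging pattern can be sketched as below. This demo runs over an in-memory IQueryable with a small page size so it is self-contained; in the real scenario the source would be the LINQ to SQL table, ordered by [DataDate], with a page size of 10000:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    const int PageSize = 3;  // 10000 in the real scenario

    // Reads a stably-ordered source in fixed-size pages. Against a LINQ to
    // SQL table, OrderBy/Skip/Take are translated to SQL on the server.
    static IEnumerable<List<int>> ReadInPages(IQueryable<int> source)
    {
        for (int pageStart = 0; ; pageStart += PageSize)
        {
            var page = source.OrderBy(x => x)   // a stable order is required
                             .Skip(pageStart)
                             .Take(PageSize)
                             .ToList();
            if (page.Count == 0) yield break;
            yield return page;                  // caller can pause between pages
        }
    }

    static void Main()
    {
        var data = Enumerable.Range(1, 8).AsQueryable();
        foreach (var page in ReadInPages(data))
            Console.WriteLine(string.Join(",", page));
        // Prints:
        // 1,2,3
        // 4,5,6
        // 7,8
    }
}
```

Each page is a separate query, so the client can pause indefinitely between pages without holding a connection open; the trade-off, raised in the comment below, is that Skip gets slower as the offset grows.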

Akram Shahda
  • I think you edited the question to make it much better. I've voted you up again. – Contango Jul 06 '11 at 22:35
  • 1
    Downvoted, because you failed to take into account the fact that, unless you use no predicate whatsoever--and maybe even then, queries using `Take` and `Skip` are going to get progressively slower because the RDBMS needs to time to seek within the data set. – ErikHeemskerk Sep 19 '12 at 14:29