
I have a huge table with rows in the millions. I need to process all of them in a program.

I am interested in only one column, studentId. How do I write a query that loads 2000 of those items, feeds them to the program for processing, then gets the next batch of 2000, and so on?

The problem is how to keep track of which items I have already processed, since the list isn't sorted.

EDIT: The process I need to do is this: I have a table in the form of [studentId, name, DepartmentId]

The number of students here could easily be 30 to 40 million. The departments they belong to typically number in the thousands, so DepartmentIds could be in the range of 5000 to 6000. The rows may be interspersed. Now I need to get all students belonging to a specific department and process them in a program. I cannot load a million student IDs at once, so I need some sort of batching. Thanks.

user1202434

3 Answers


If that is MySQL you can do something like this:

SELECT studentId FROM table_name LIMIT 0, 2000

You just need to loop in your code (or stored procedure) and advance the LIMIT offset on each iteration.
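A minimal sketch of that loop, using Python with sqlite3 purely for illustration (the LIMIT/OFFSET syntax is the same in MySQL; the table and column names are taken from the question). An ORDER BY is added because LIMIT/OFFSET without a stable order can skip or repeat rows between batches:

```python
import sqlite3

def process_in_batches(conn, batch_size=2000):
    """Page through studentId values with LIMIT/OFFSET."""
    offset = 0
    processed = []
    while True:
        rows = conn.execute(
            "SELECT studentId FROM students "
            "ORDER BY studentId LIMIT ? OFFSET ?",
            (batch_size, offset),
        ).fetchall()
        if not rows:
            break  # no more rows: all batches done
        for (student_id,) in rows:
            processed.append(student_id)  # stand-in for real processing
        offset += batch_size
    return processed
```

Note that large OFFSETs get slower as you page deeper, since the database must scan and discard all skipped rows each time.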

AbcAeffchen
  • 14,400
  • 15
  • 47
  • 66
0

If you run the same SELECT multiple times, SQL Server does not guarantee the same row order in the results.

So you have two choices:

  1. Order the results and use the OFFSET ... FETCH clause (or the TOP clause, but TOP isn't a standard feature). You could also use ROW_NUMBER with OVER, but OFFSET is simpler here and better for performance.
  2. Use your ID key in the WHERE clause (e.g. WHERE studentId BETWEEN ...), so each batch picks up where the previous one ended.
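The second option (keyset paging on the ID) can be sketched like this, again using Python with sqlite3 for illustration; the table and column names come from the question, and last_id is a hypothetical tracking variable:

```python
import sqlite3

def fetch_department_in_batches(conn, department_id, batch_size=2000):
    """Keyset pagination: each batch resumes after the last seen studentId."""
    last_id = -1  # assumes IDs are non-negative
    batches = []
    while True:
        rows = conn.execute(
            "SELECT studentId FROM students "
            "WHERE DepartmentId = ? AND studentId > ? "
            "ORDER BY studentId LIMIT ?",
            (department_id, last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        batches.append([r[0] for r in rows])
        last_id = rows[-1][0]  # next batch starts after this ID
    return batches
```

Unlike large OFFSETs, keyset paging uses an index seek to the last seen ID, so it stays fast even deep into a 40-million-row table.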
marc_s
Chehre

You can use a SqlDataReader: read each batch, process it, and keep reading until you finish. Close the reader only when all the batches are done.

You can change the isolation level if locking is a problem.
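The original suggestion is .NET's SqlDataReader; the same streaming idea is sketched here in Python using a DB-API cursor with fetchmany (sqlite3 stands in for the real database; the batch processing step is a placeholder):

```python
import sqlite3

def stream_students(conn, department_id, batch_size=2000):
    """Stream rows through one open cursor instead of re-querying per batch."""
    cur = conn.execute(
        "SELECT studentId FROM students WHERE DepartmentId = ?",
        (department_id,),
    )
    seen = 0
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        seen += len(batch)  # stand-in for real per-batch processing
    cur.close()  # close only after all batches are consumed
    return seen
```

The query runs once and the cursor holds its position, so there is no risk of rows shifting between batches; the trade-off is that the connection (and any locks, depending on isolation level) is held for the whole run.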

JotaBe