Selecting id's from a huge database

Question

I have a database with over 3,000,000 rows, each has an id and xml field with varchar(6000).

If I do SELECT id FROM bigtable it takes +- 2 minutes to complete. Is there any way to get this in 30 seconds?

3.000.000 rows is not many (and it's certainly not huge). Create appropriate indexes. — Mitch Wheat, Mar 15 '13 at 10:25
Where are you retrieving the numbers to? It does take time to print text on the screen and transmit them over the netwrok? The engine will be able to iterate the `Id` column significantly faster than the numbers can be transfered to and displayed on the client. Even is it is not a clutered index, which it obviously is, or should be. — Jodrell, Mar 15 '13 at 10:29

score 3 · Answer 1 · answered Mar 15 '13 at 10:25

3

Build clustered index on id column

See http://msdn.microsoft.com/en-us/library/ms186342.aspx

answered Mar 15 '13 at 10:25

Alex

8,827
3
42
58

@Saju Man, I was going to suggest two answers on linq topic today, but not enought fast :) Thanks – Alex Mar 15 '13 at 10:28
I think you are missing the point, this is not a performance problem caused by a lack of index. – Jodrell Mar 15 '13 at 10:37
@Jodrell Why do you think it's not a point? A column without index with 3000000 records is a piece of works for SQL engine, even for the numeric field – Alex Mar 15 '13 at 10:42
1

@voo, if you are retruning them all a table scan is hardly different to an index scan. – Jodrell Mar 15 '13 at 10:46
@Jodrell What do you mean - if you return all. SELECT id FROM bigtable he selects by id, so index will be used be engine. Try to simulate such table, fill it with something and build the maintance plan. Im sure the you will see Index scan on the first step. – Alex Mar 15 '13 at 10:56
I'm referring to the "Actual Execution Plan" that will be generated by the query engine to satisfy the query. If there is no index to help it will perform a table scan, so will incur the cost of jumping the xml field etc. If there is a clustered index, it would satisfy the query with an index scan. Both of these operations will be relatively fast. All the delay is involved with getting the results to the client. – Jodrell Mar 15 '13 at 11:13
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/26235/discussion-between-voo-and-jodrell) – Alex Mar 15 '13 at 11:29
@Jodrell I don't usndestand, why you think, that table scan will cost the same time as index scan. In table scan it would make the scan of the whole 3000000 id records (how xml column influences that?), while in index blocks it will scan 1 index block. He selects 1 record, with the index the scan will be much faster then without it. – Alex Mar 15 '13 at 11:35

Darren · Answer 2 · 2013-03-15T10:31:18.143

2

You could apply indexes to your tables. In your case a clustered index.

Clustered indexes:

http://msdn.microsoft.com/en-gb/library/aa933131(v=sql.80).aspx

I would also suggest filtering your query so it doesn't return all 3 million rows each time, this can be done by using TOP or WHERE.

TOP:

SELECT TOP 1000 ID
FROM bigtable

WHERE:

SELECT ID FROM
bigtable
WHERE id IN (1,2,3,4,5)

edited Mar 15 '13 at 10:31

answered Mar 15 '13 at 10:25

Darren

68,902
24
138
144

The `TOP` is fairly meaningless without an `order by` but, paging the data makes a lot of sense. – Jodrell Mar 15 '13 at 11:01

score 0 · Answer 3 · edited May 23 '17 at 12:28

0

First of all, 3 milion records dont make a table 'Huge'.

To optimize your query, you should do the following.

Filter your query, why do you need to get ALL your IDs?

Create clustered index for the ID column to get a smaller lookup table to search first before pointing to the selected row.

Helpful threads, here and here

edited May 23 '17 at 12:28

Community

1
1

answered Mar 15 '13 at 10:29

Muhammad Hani

8,476
5
29
44

Jodrell · Answer 4 · 2013-03-15T10:57:15.957

Okay, why are you retuning all the Ids to the client?

Even if your table has no clustered index (which I doubt), the vast majority of you processing time will be client-side, transferring the Id values over the network and displaying them on the screen.

Querying for all values rather defeats the point of having a query engine.

The only reason I can think of (perhaps I lack imagination) for getting all the Ids is some sort of misguided caching.

If you want to know many you have do

SELECT count(*) FROM [bigtable]

If you want to know if an Id exists do

SELECT count([Id[) FROM [bigtable] WHERE [Id] = 1 /* or some other Id */

This will return 1 row with a 1 or 0 indicating existence of the specified Id.

Both these queries will benefit massively from a clustered index on Id and will return minimal data with maximal information.

Both of these queries will return in less than 30 seconds, and in less than 30 milliseconds if you have a clustered index on Id

Selecting all the Ids will provide no more useful information than these queries and all it will achieve is a workout for you network and client.

score 0 · Answer 5 · answered Mar 15 '13 at 14:58

0

You could index your table for better performance.

There are additional options as well which you could use to imrpove performance like partion feature.

answered Mar 15 '13 at 14:58

FIre Panda

6,537
2
25
38

Selecting id's from a huge database

5 Answers5