Issue with big tables (no primary key available)

Question

Table1 has around 10 lakh records (1 million) and does not have a primary key. Retrieving data with a SELECT statement (with a specific WHERE condition) takes a long time. Can we reduce the retrieval time by adding a primary key to the table, or is there another approach we should follow? Kindly help me.

Can you please post more details - table structure, the query that takes a long time, etc... — Aaron Bertrand, Feb 15 '12 at 06:55

score 4 · Accepted Answer · answered Feb 15 '12 at 07:02

A primary key does not have a direct effect on performance, but it does have an indirect one. When you add a primary key to a table, SQL Server creates a unique index (clustered by default) to enforce entity integrity. You can also create your own unique indexes on a table. So, strictly speaking, a primary key does not affect performance, but the index used by the primary key does.
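The point that the speed comes from the index, not from the primary key itself, can be checked with a small experiment. This sketch uses SQLite through Python's built-in `sqlite3` module rather than SQL Server (an assumption made so the example runs anywhere). SQLite does not cluster a table on a TEXT primary key the way SQL Server does by default, so only the "a key constraint is backed by an index" part carries over. The table and column names (`with_pk`, `no_pk`, `code`, `name`) are hypothetical.

```python
import sqlite3


def index_names(conn, table):
    # PRAGMA index_list yields one row per index: (seq, name, unique, origin, partial)
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table})")]


def plan(conn, sql):
    # Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail); keep the detail text
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")

# Hypothetical table with a TEXT primary key: SQLite builds a unique index to enforce it
conn.execute("CREATE TABLE with_pk (code TEXT PRIMARY KEY, name TEXT)")

# Same shape, no primary key, but with a unique index we create ourselves
conn.execute("CREATE TABLE no_pk (code TEXT, name TEXT)")
conn.execute("CREATE UNIQUE INDEX ux_no_pk_code ON no_pk (code)")

pk_plan = plan(conn, "SELECT name FROM with_pk WHERE code = 'A1'")
unique_plan = plan(conn, "SELECT name FROM no_pk WHERE code = 'A1'")

print(index_names(conn, "with_pk"))  # the automatic index behind the primary key
print(index_names(conn, "no_pk"))    # the index we created by hand
print(pk_plan)      # lookup goes through the primary key's index
print(unique_plan)  # lookup goes through our own unique index
```

Both lookups are resolved through an index; the exact wording of the plan text varies between SQLite versions.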

When should a primary key be used?

score 3 · Answer 2 · answered Feb 15 '12 at 06:56


A primary key is needed to refer to a specific record.

To make your SELECTs run fast, you should consider adding an index on the appropriate columns you use in your WHERE clause.

E.g. to speed up SELECT * FROM "Customers" WHERE "State" = 'CA', create an index on the State column.
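A minimal sketch of the Customers/State example, assuming SQLite via Python's `sqlite3` instead of SQL Server so it can run standalone. The generated data and the index name `IX_Customers_State` are hypothetical. `EXPLAIN QUERY PLAN` shows the engine switching from reading every row to searching the new index.

```python
import sqlite3


def plan(conn, sql):
    # Keep only the human-readable "detail" column of EXPLAIN QUERY PLAN
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
# No primary key, matching the situation in the question
conn.execute('CREATE TABLE "Customers" ("Id" INTEGER, "Name" TEXT, "State" TEXT)')
conn.executemany(
    'INSERT INTO "Customers" VALUES (?, ?, ?)',
    # Hypothetical data: 1 in 50 customers is in CA
    [(i, f"name{i}", "CA" if i % 50 == 0 else "NY") for i in range(10_000)],
)

query = """SELECT * FROM "Customers" WHERE "State" = 'CA'"""

before = plan(conn, query)  # no index on State: a full scan of the table
conn.execute('CREATE INDEX "IX_Customers_State" ON "Customers" ("State")')
after = plan(conn, query)   # the index lets the engine go straight to the CA rows

rows = conn.execute(query).fetchall()
print(before)
print(after)
print(len(rows))
```

The result set is the same either way; only the access path changes.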

penartur

While in general I agree that every table *should* have a primary key, they are not the only way to refer to a specific row. Unique constraints can also do that. You can also find specific rows using unique data even if the uniqueness is not enforced via a unique constraint or primary key constraint. So strictly speaking a primary key is NOT needed for referring to a specific row. – Aaron Bertrand Feb 15 '12 at 07:14

score 1 · Answer 3 · Jonathan Leffler · 2012-02-15T07:27:24.610

It depends on the SELECT statement, and the size of each row in the table, the number of rows in the table, and whether you are retrieving all the data in each row or only a small subset of the data (and if a subset, whether the data columns that are needed are all present in a single index), and on whether the rows must be sorted.

If all the columns of all the rows in the table must be returned, then you can't speed things up by adding an index. If, on the other hand, you are only trying to retrieve a tiny fraction of the rows, then providing appropriate indexes on the columns involved in the filter conditions will greatly improve the performance of the query. If you are selecting all, or most, of the rows but only selecting a few of the columns, then if all those columns are present in a single index and there are no conditions on columns not in the index, an index can help.
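The three cases above can be observed in a query plan. This is a sketch using SQLite through Python's `sqlite3` (an assumption; SQL Server's plans look different but follow the same reasoning). The `orders` table, its columns, and the index name are hypothetical.

```python
import sqlite3


def plan(conn, sql):
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, customer TEXT, status TEXT, notes TEXT, total REAL)"
)
conn.execute("CREATE INDEX ix_orders_customer_status ON orders (customer, status)")

# Case 1: every column of every row; no index can avoid reading the whole table
every_column = plan(conn, "SELECT * FROM orders")

# Case 2: a small fraction of the rows, filtered on an indexed column
few_rows = plan(conn, "SELECT * FROM orders WHERE customer = 'c42'")

# Case 3: all rows, but only columns that are all present in one index;
# the narrower index can be scanned instead of the table
few_columns = plan(conn, "SELECT customer, status FROM orders")

print(every_column)
print(few_rows)
print(few_columns)
```

Case 3 is the "narrow index" scenario raised in the comments: the index holds every requested value, so the wider table rows never need to be read.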

Without a lot more information, it is hard to be more specific. There are whole books written on the subject, including:

Relational Database Index Design and the Optimizers

edited Feb 15 '12 at 07:27

answered Feb 15 '12 at 07:08

Jonathan Leffler


I don't know if I agree with your second sentence. If all the rows need to be returned but not all of the columns, a non-clustered index on only the required columns can certainly be beneficial and a scan of that index will be more efficient than a scan of the clustered index (or heap). – Aaron Bertrand Feb 15 '12 at 07:16
@Aaron: yes, it quickly gets complex, and I decided not to go into that level of detail. But you're right; if the SELECT is on a small projection over the columns of an index, then an index can save time compared to a full table scan. OTOH, DBMS do full table scans rather fast. I didn't discuss primary keys either. They should always be present. I don't usually count auto-incrementing columns as PKs; they should normally be a surrogate for some other identifiable subset of the columns that represents a unique (candidate) key too. – Jonathan Leffler Feb 15 '12 at 07:19
To make that less ambiguous, I would probably say, "If all of the *data* in the table must be returned..." Probably just my pedantic side but a narrow index was the first thing that came to mind when I read how you worded it. – Aaron Bertrand Feb 15 '12 at 07:22
@AaronBertrand: Try that revision for size... :D – Jonathan Leffler Feb 15 '12 at 07:27

score 1 · Answer 4 · answered Feb 15 '12 at 07:17

A primary key will not help if the primary key column is not in your WHERE clause.

If you want to make your query faster, you can create a non-clustered index on the columns in your WHERE clause. You may also want to add included columns to the index (it depends on your SELECT list).
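A sketch of a non-clustered index on the WHERE column plus an "included" column. SQL Server would express this with `INCLUDE`; SQLite, used here through Python's `sqlite3` so the example runs standalone, has no `INCLUDE` clause, so this sketch appends the extra column to the index key instead, which gives a similar covering effect for this query. The column names and index name are hypothetical.

```python
import sqlite3


def plan(conn, sql):
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (CustomerId INTEGER, OrderDate TEXT, Amount REAL, Comments TEXT)"
)

# SQL Server equivalent (for comparison only, not run here):
#   CREATE NONCLUSTERED INDEX IX_Table1_CustomerId ON dbo.Table1 (CustomerId) INCLUDE (Amount);
conn.execute("CREATE INDEX ix_table1_customer_amount ON Table1 (CustomerId, Amount)")

# Every selected column is in the index: a seek with no trip back to the table
covered = plan(conn, "SELECT Amount FROM Table1 WHERE CustomerId = 7")

# Comments is not in the index: the seek must also fetch each matching table row
not_covered = plan(conn, "SELECT Amount, Comments FROM Table1 WHERE CustomerId = 7")

print(covered)
print(not_covered)
```

Whether an included column is worth it depends on the SELECT list, as the answer says: it only helps queries whose selected columns are all in the index.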

The SQL optimizer will then seek on your indexes, which makes the query faster. (But think about what happens when data is added to the table: inserts may take longer if you create indexes on many columns.)
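The insert-cost caveat can be measured roughly. This sketch, assuming SQLite via Python's `sqlite3` rather than SQL Server, times the same batch insert into a table with no secondary indexes and into one with an index on every column. The table and index names are hypothetical, and the timings depend heavily on the machine and engine, so no particular numbers should be expected.

```python
import sqlite3
import time


def timed_insert(index_count, n=20_000):
    """Insert n rows into a fresh table with `index_count` single-column indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT, c TEXT, d REAL)")
    for col in ["a", "b", "c", "d"][:index_count]:
        conn.execute(f"CREATE INDEX ix_t_{col} ON t ({col})")

    rows = [(i, f"b{i}", f"c{i % 97}", i * 0.5) for i in range(n)]
    start = time.perf_counter()
    with conn:  # commit the whole batch as one transaction
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    elapsed = time.perf_counter() - start

    count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, count


no_index = timed_insert(0)
four_indexes = timed_insert(4)
print(f"no indexes:   {no_index[0]:.4f} s for {no_index[1]} rows")
print(f"four indexes: {four_indexes[0]:.4f} s for {four_indexes[1]} rows")
```

Every insert has to add an entry to the table and to each index, so the indexed run typically takes longer; the trade-off is faster reads for slower writes.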

score 0 · Answer 5 · edited Feb 15 '12 at 07:04

One way to do it is to create indexes on your table. It's always better to create a primary key, which creates a unique index that, by default, will reduce retrieval time.

The optimizer chooses an index scan if the index columns are referenced in the SELECT statement and if the optimizer estimates that an index scan will be faster than a table scan. Index files generally are smaller and require less time to read than an entire table, particularly as tables grow larger. In addition, the entire index may not need to be scanned. The predicates that are applied to the index reduce the number of rows to be read from the data pages.

Read more: Advantages of using indexes in database?
