
I want to build an SQL Server database with one table that will contain 200,000,000 records.

The table has two columns, Id and Value, where Id is the primary key and is indexed.

My question is about SQL Server performance: how quickly can I get a value by primary key?

user2818430
  • Sounds simple enough - why don't you try to build and see for yourself..? Obviously the spec of the machine the server is running on will make a difference, but it should be pretty fast I would think. – Mat Richardson Feb 29 '16 at 22:25
  • 1
    With a properly equipped server, properly indexed table (i.e., clustered so as to avoid overhead of looking things up in the generated index), you should find records nearly instantly. That said, if your PK is a string or something, you're probably going to be unhappy with the performance. – David T. Macknet Feb 29 '16 at 22:37
  • @DavidT.Macknet: The primary key is string, generated something like this one: A2B3C4D5X – user2818430 Feb 29 '16 at 22:42
  • Ouch. Well, you *could* cluster on it still ... but your performance would be much better with an INT as your primary key, particularly since you're talking millions of records. Also, is this data changing over time? If not, pack those indexes with a fill factor of 100, so there are no empty leaf nodes in the binary tree. – David T. Macknet Feb 29 '16 at 22:45
  • a database with one table with 200,000,000 records, with two columns and an alphanumeric key. A relational database may not be the right solution for you. But basically if you have enough RAM to cache the entire table in it your performance will be just fine. – Nick.Mc Feb 29 '16 at 22:57
  • As written in my answer, *read access* time (getting values) is typically not the problem. However, if your cluster key (which can be the primary key, but doesn't have to be) is "random", you'll end up with quite bad insert performance as they will cause much "housekeeping" of the tree data structure to be performed, e.g. page splitting. – Lucero Mar 01 '16 at 00:04

2 Answers


In general very quickly: SQL Server stores data in B-tree structures and can locate a row by key with only a handful of page reads (I assume that by "indexed" you mean that this is the clustered key).

However, there are many other aspects such as hardware (memory, disk speed, etc.) and usage pattern which play a role, so that there is no clear answer to your question.
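To make this concrete, here is a minimal sketch of the scenario from the question (table and column names are taken from the question; sizes and types are assumptions). A point lookup on the clustered primary key is a single B-tree seek, regardless of whether the table holds a thousand rows or 200,000,000:

```sql
-- Hypothetical schema for the question's table; the string key type
-- is assumed from the comments (keys like 'A2B3C4D5X').
CREATE TABLE dbo.KeyValue (
    Id    varchar(20)   NOT NULL,
    Value nvarchar(400) NOT NULL,
    CONSTRAINT PK_KeyValue PRIMARY KEY CLUSTERED (Id)
);

-- This lookup resolves to one clustered index seek:
SELECT Value
FROM dbo.KeyValue
WHERE Id = 'A2B3C4D5X';
```

You can confirm the seek by inspecting the actual execution plan; if you see a scan instead, the index is not being used as intended.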

Lucero

An indexed column and in particular a primary key can be accessed very fast. If you intend to often access records in their sort order, a clustered primary key can improve the access time further. With a clustered index the rows are stored in a physical order that corresponds to the index order.
See: What do Clustered and Non clustered index actually mean?

Records should be inserted sequentially (with respect to the index column(s)) when using a clustered index; otherwise page splits and index fragmentation will occur. Clustered indexes work best with identity columns. If you use GUIDs as the index column, use the NEWSEQUENTIALID() function as the column default. (Per @Lucero's clarifications in the comments.)
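A short sketch of both options (table and constraint names here are illustrative, not from the question):

```sql
-- Identity key: values are generated in ascending order, so new rows
-- always append at the end of the clustered index (no page splits).
CREATE TABLE dbo.SequentialInt (
    Id    int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    Value nvarchar(400) NOT NULL
);

-- GUID key: NEWSEQUENTIALID() as a column default generates
-- mostly-ascending GUIDs, avoiding the random insert positions
-- (and resulting page splits) that NEWID() would cause.
CREATE TABLE dbo.SequentialGuid (
    Id    uniqueidentifier NOT NULL
              CONSTRAINT DF_SequentialGuid_Id DEFAULT NEWSEQUENTIALID()
              PRIMARY KEY CLUSTERED,
    Value nvarchar(400) NOT NULL
);
```

Note that NEWSEQUENTIALID() can only be used as a column default, not called directly in a SELECT.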


Another optimization is a covering index: an index that includes all the columns referenced by a query. With a covering index, SQL Server only needs to access the index; the underlying rows need not be read separately, which reduces the number of disk accesses.
See: Using Covering Indexes to Improve Query Performance
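In the two-column table from the question the clustered primary key already covers every query, so this matters mostly for wider tables. A sketch with a hypothetical wider table:

```sql
-- Hypothetical table to illustrate a covering index.
CREATE TABLE dbo.Orders (
    OrderId    int          NOT NULL PRIMARY KEY CLUSTERED,
    CustomerId int          NOT NULL,
    OrderDate  date         NOT NULL,
    Total      decimal(9,2) NOT NULL
);

-- The INCLUDE clause carries OrderDate and Total in the index leaf
-- level, so the query below is answered from the index alone,
-- without a lookup into the base table:
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
ON dbo.Orders (CustomerId)
INCLUDE (OrderDate, Total);

-- Covered query: every referenced column lives in the index.
SELECT OrderDate, Total
FROM dbo.Orders
WHERE CustomerId = 42;
```

INCLUDE columns add leaf-level storage but not key width, so they are cheaper than adding the columns to the index key itself.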

Olivier Jacot-Descombes
  • The primary key is generated in a form of letters and numbers: something like this one: `A2B3C4D5X`. The order doesn't actually matter. I need to get the value of that primary key – user2818430 Feb 29 '16 at 22:40
  • Then a clustered index will probably not improve queries but will slow down inserts if the primary keys are inserted in an unordered way. – Olivier Jacot-Descombes Feb 29 '16 at 22:45
  • 1
    Inserting at random positions will cause page inserts and splits (that is, fragmentation), it will however not physically move around many records (typically only a page worth of them, that is, some 8kb). So while the records should be best inserted sequentially, your description of the problem is not accurate. Also, GUIDs are not bad per se, they are merely problematic if used as clustered key and generated randomly. To address this issue, SQL Server has the `newsequentialid()` function which can be used as column default, with performance pretty much on par with identity columns. – Lucero Feb 29 '16 at 23:58
  • Good to know about `newsequentialid()`. This is new to me. – Olivier Jacot-Descombes Mar 01 '16 at 12:46