1

I have an SQL table with the following structure:

Code1 - int
Code2 - int
Val1  - real
Val2  - real
Val3  - real

There is only one index (clustered) were Code1 is the first indexed column and Code2 is the second. The size of a single record is 20bytes.

I need to be able to store around 150,000,000 records and the biggest select operation would be on 500,000 records. I assume that the size of the table will be around 3GB

I would like to know if this design will work or there might be 'unexplained' problems or slowdowns when dealing with such a big table.

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Gilad
  • 53
  • 3
  • 2
    I don't see an id field. Will you be using (code1, code2) as the primary key too? Is (code1, code2) unique? – Mark Byers Dec 29 '09 at 08:21
  • 4
    what kind of queries will you have on the table? That really determines what needs to be indexed. – marc_s Dec 29 '09 at 08:21

2 Answers2

1

Basically, a table with 150 million rows is nothing for SQL Server - won't even break a real sweat :-)

The point really is: how do you access the data? What kind of queries will you have? E.g. if you have queries that have a WHERE clause with just the "col2" column, then you don't have a good setup with a clustered index on (col1,col2).

Also: how is the data distributed in your fields? Which are selective, which are more uniform? If col1 or col2 are highly selective (e.g. a single value selects significantly less than 2% of the data), then use that field for your selects, if ever possible. Indexing something like a "gender" field which might have two, three different values won't really help, since any select using that field as a WHERE clause will always return way too much data to be effective.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • Each query would have a WHERE on both code1 and code2. code1 and code2 are the primary key. I'm looking at ~200,000 records added daily. There is only one type of query preformed on this table with a range on code1 and a range on code2. – Gilad Dec 29 '09 at 08:36
  • and how selective are code1 and code2 ?? E.g. given a value for code1, how many percent of the total data gets selected? – marc_s Dec 29 '09 at 08:39
  • 1
    Gilad - the reason marc_s is asking about selectivity on col1 and col2 is because it'll make a difference on which field you should place first. If col2 is more selective (meaning, there's more unique data in it) then it should be placed first in the clustered index. – Brent Ozar Dec 29 '09 at 11:18
0

There is a very complete question about SQL performance and large tables: Very large tables in SQL Server

Community
  • 1
  • 1
Rubens Farias
  • 57,174
  • 8
  • 131
  • 162