Sql Design Question

Question

I have a table with 25 columns where 20 columns can have null values for some (30-40%) rows. Now what is the cost of having rows with 20 null columns? Is this OK?

Or

is it a good design to have another table to store those 20 columns and add a ref to the first table? This way I will only write to the second table only when there is are values.

I am using SQL server 2005. Will migrate to 2008 in future.

Only 20 columns are varchar, rest smallint, smalldate

What I am storing: These columns store different attributes of the row it belongs to. These attributes can be null sometimes.

The table will hold ~billion of rows

Please comment.

Which database server are you using? SQL Server 2008 supports the concept of sparse columns which supports your scenario with minimum impact. — Michael Brown, Jan 19 '11 at 00:20
Take a look at this answer: http://stackoverflow.com/questions/229179/null-in-mysql-performance-storage/230923#230923 And http://dev.mysql.com/doc/refman/5.0/en/working-with-null.html — RobertPitt, Jan 19 '11 at 00:21
Just a sidenote: If your table isn't all that large (lets say < 1 mio rows), it wont matter much in terms of storage, and the single-table solution will be slightly faster. — Martin, Jan 19 '11 at 01:15

score 2 · Answer 1 · answered Jan 19 '11 at 00:26

2

You should describe the type of data you are storing. It sounds like some of those columns should be moved to another table.

For example, if you have several columns that represent multiple columns for the same type of data, then I would say move it to another table On the other hand, if you need this many columns to describe different types of data, then you may need to keep it as it is.

So it kind of depends on what you are modelling.

answered Jan 19 '11 at 00:26

Jonathan Wood

65,341
71
269
466

+1 for describing the data... If we know the columns in question we can give better advice on restructuring (if necs.) – scunliffe Jan 19 '11 at 00:59
1

Added more info about data and type – kheya Jan 19 '11 at 04:01
@Projapati: You seem reluctant to offer much detail. Saying the columns "store different attributes of the row it belongs to" tells me nothing about the type of data you are storing. If you want to offer some examples, I can tell you what I think. But I'd need more to go on. – Jonathan Wood Jan 19 '11 at 04:20

score 2 · Answer 2 · answered Jan 19 '11 at 01:00

Are there some circumstances where some of those columns are required? If so, then perhaps you should use some form of inheritance. For instance, if this were information about patients in a hospital, and there was some data that only made sense for female patients, then you could create a FemalePatients table with those columns. Those columns that must always be collected for female patients could then be declared NOT NULL in that separate table.

Cade Roux · Answer 3 · 2011-01-19T01:15:10.077

It depends on the data types (40 nullable ints is going to basically take the same space as 40 non-nullable ints, regardless of the values). In SQL Server, the space is fairly efficient with ordinary techniques. In 2008, you do have the SPARSE feature.

If you do split the table vertically with an optional 1:1 relationship, there is a possibility of wrapping the two tables with a view and adding triggers on the view to make it updatable and hide the underlying implementation.

So there are plenty of options, many of which can be implemented after you see the data load and behavior.

score 1 · Answer 4 · answered Jan 19 '11 at 12:08

Create tables based on the distinct sets of attributes you have. So if you have some data where some of your columns do not apply then it would make sense to have that data in a table which doesn't have those columns. As far as possible, avoid repeating the same attribute in multiple tables. Make sure your data is in at least Boyce-Codd / 5th Normal Form and you won't go far wrong.

Sql Design Question

4 Answers4