How to design the database when you need too many columns?

Question

I have a table called cars but each car has hundreds of attributes and they keep on increasing over time (horsepower, torque, a/c, electric windows, etc...) My table has each attribute as a column. Is that the right way to do it when I have thousands of rows and hundreds of columns? Also, I made each attribute a column so I facilitate advanced searching / filtering.

Using MySQL database.

Thanks

And what about using NoSQL DB, like MongoDB. It will allow your schema to change (adding new attributes) easily. — danieln, May 17 '13 at 19:58

score 4 · Answer 1 · answered May 17 '13 at 18:01

4

I guess the obvious question is, then: why not have a table car_attrs(car, attr, value)? Each attribute is a row. Most queries can be re-written to use this form.

answered May 17 '13 at 18:01

Nitzan Shaked

13,460
5
45
54

I second that. I use this scheme in multiple dbs where not even the products' type can be given, like in your example. Is a good and most times fast solution, and scalable as well. – kms May 17 '13 at 18:04

OnoSendai · Accepted Answer · 2013-05-17T19:02:24.110

This is an interesting question IMHO, and the answer may depend on your specific data model and implementation. The most important factor in this case is data density.

How much of each row is actually filled up, in average?

If most of your fields are always present, then data scope partition may be the way to go.
If most of your fields are empty, then a metadata-like structure (like @JayC suggested) may be more attractive.

Let's use the case you mentioned, and do some simulations.

On the first case, scope partition, the idea is to implement partitions based on scope or usage. As an example of partitioning by usage, let's say that the most retrieved fields are Model, Year, Maker and Color. These fields may compose your main [CAR] table, the owner of the ID field which will exclusively identify the vehicle. Now let's say that Engine, Horsepower, Torque and Cylinders are also used for searches from time to time, but not so frequently. These may exist on a secondary table [CAR_INFO_1], which is tied to the first table by the presence of the CAR_ID field, a foreign key. Proceed by creating as many partitions you need.

Advantage: Simpler queries. You may coalesce all information about a vehicle if you do a joint query (for example inside a VIEW).

Downside: Maintenance. Each new field must be implemented in the model itself, and you need an updated data model to locate where the field you need is actually stored (or abstract it inside a view.)

Metadata format is much more elegant, but demands more of your database engine. Check @JayC's and @Nitzan Shaked's answers for details.

Advantages: 100% data density. You'll never have empty Data values. Also maintenance - a new attribute is created by adding it as a row to the metadata identifier table. Data structure is less complex as well.

Downside: Complex queries, together with more complex execution plans. Let's say you need all Ford cars made in 2010 that are blue. It would be very trivial on the first case:

SELECT * FROM CAR WHERE Model='Ford' AND Year='2010' AND Color='Blue'

Now the same query on a metadata-structured model:

Assume the existence of this two tables,

CAR_METADATA_TYPE
ID  DESC
1   'Model'
2   'Year'
3   'Color'

and

CAR_METADATA [CAR_ID], [METADATA_TYPE_ID], [VALUE]

The query itself would like something like this:

SELECT * FROM CAR, CAR_METADATA [MP1], CAR_METADATA [MP2], CAR_METADATA [MP3]
WHERE MP1.CAR_ID = CAR.ID AND MP1.METADATA_TYPE_ID = 1 AND MP1.Value='Ford'
AND MP2.CAR_ID = CAR.ID AND MP2.METADATA_TYPE_ID = 2 AND MP2.Value='2010'
AND MP3.CAR_ID = CAR.ID AND MP3.METADATA_TYPE_ID = 3 AND MP3.Value='Blue'

So, it all depends on you needs. But given your case, my suggestion would be the Metadata format.

(But do a model cleanup first - no repeated fields, 1:N data on their own table instead of inline fields like Color1, Color2, Color3, this kind of stuff ;) )

score 2 · Answer 3 · answered May 17 '13 at 18:04

If it is all about features, create a features table, list all your features as rows and give them some sort of automatic id, and create a car_features that with foreign keys to both your cars table and your features table that associates cars with features, maybe along with any values associated with the relationship (one passenger electric seat, etc.).

score -2 · Answer 4 · answered May 17 '13 at 18:02

If you have ever changing attributes, then consider storing them in an XML blob or text structure in one column. This structure is not relational. The most important attributes will then be duplicated in additional columns so you can craft queries to search on them as the Blob will not be searchable from SQL queries. This will cut down on the amount of columns in that table and allow for expansion without changing the database schema.

As others as suggested, if you want all the attributes in a table, then use an attribute table to define them. Then will depend on your requirements and needs of the application.

How to design the database when you need too many columns?

4 Answers4

Linked

Related