0

Let's say I want to create a table like this:

id | some_foreign_id | attribute | value
_________________________________________
1           1            Weight     100
2           1            Reps       5
3           2            Reps       40
4           3            Time       10
5           4            Weight     50
6           4            Reps       60

Versus the same data represented this way

id | some_foreign_id | weight | reps | time
____________________________________________
1           1           100      5     NULL
2           2           NULL     40    NULL
3           3           NULL     NULL  10
4           4           50       60    NULL

And since in this case the id = foreign_id I think we can just append these columns to whatever table foreign_id is referring to.

I would assume most people would overwhelmingly say the latter approach is the accepted practice.

Is the former approach considered a bad idea, even though it doesn't result in any NULLs? What are the tradeoffs between these two approaches exactly? It seems like the former might be more versatile, at the expense of not really having a clear defined structure, but I don't know if this would actually result in other ramifications. I can imagine a situation where you have tons of columns in the latter example, most of which are NULL, and maybe only like three distinct values filled in.

  • The first one is using whats known as entity value attribute model. http://en.wikipedia.org/wiki/Entity-attribute-value_model Some people hate it, some know how to use it. Research your options here. – Twelfth Aug 26 '16 at 15:27
  • @Twelfth Would you say both approaches are justified depending on context? Would you use EAV model if there were many columns? What about queries and joins? – user6762057 Aug 26 '16 at 15:31
  • @Twelfth I know it as the entity attribute value model, hence the EAV acronym ;-) – Strawberry Aug 26 '16 at 15:43
  • If you are going to use an EAV, consider separating out the different attributes into separate tables according to their data types. So, you might have a table of DECIMAL type attributes, and another table of DATE type attributes. – Strawberry Aug 26 '16 at 15:44
  • @Strawberry I don't quite understand what you mean. Are you saying making a new table for every single column, basically? – user6762057 Aug 26 '16 at 15:47
  • @Strawberry my god I'm dyslexic. – Twelfth Aug 26 '16 at 15:51
  • A new table for every single column type - so that you can take advantage of data types. – Strawberry Aug 26 '16 at 15:51
  • @Strawberry - I included your comment in my answer...normally I create a header table that holds any columns that I want to use in where clauses frequently and store them in the correct type so they can be indexed. Never though of a different table for each data type before. – Twelfth Aug 26 '16 at 15:55
  • @Twelfth It's certainly not compulsory - and it can be a pain to manage, but I think it's a shame to lose some of the benefits that data types can provide. – Strawberry Aug 26 '16 at 15:58

5 Answers5

1

EAV is the model your first example is in. It's got a few advantages, however you are in mysql and mysql doesn't handle this the best. As pointed out in this thread Crosstab View in mySQL? mysql lacks functions that other databases have. Postgres and other databases have some more fun functions PostgreSQL Crosstab Query that make this significantly easier. In the MSSQL world, this gets referred to as sparsely populated columns. I find columnar structures actually lend themselves quite well to this (vertica, or high end oracle)

Advantages:

  • Adding a new column to this is significantly easier than altering a table schema. If you are unsure of what future column names will be, this is the way to go

  • Sparsely populated columns result in tables full of nulls and redundant data. You can setup logic to create a 'default' value for a column...IE if no value is specified for this attribute, then use this value.

Downsides:

  • A bit harder to program with in MySQL in particular as per comments above. Not all SQL dev's are familiar with the model and you might accidentally implement a steeper learning curve for new resources.

  • Not the most scalable. Indexing is a challenge and you need work around (Strawberry's input in the comments is towards this, your value column is basically forced to Varchar and that does not index well, nor does it search easily...welcome to table scan hell) . Though you can get around this with a third table (say you query on dates like create date and close date alot. Create a third 'control' table that contains those frequently queried columns and index that...refer to the EAV tables from there) or creating multiple EAV tables, one for each data type.

Community
  • 1
  • 1
Twelfth
  • 7,070
  • 3
  • 26
  • 34
0

First one is the right one.

  • If later you want change the number of properties, you dont have to change your DB structure.

    • Changing db structure can cause your app to break.
  • If the number of null is too big you are wasting lot of storage.

Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118
  • But does it become harder at all to perform queries? Doesn't it become harder to do things like joins since they are no longer encoded as columns? – user6762057 Aug 26 '16 at 15:30
  • Well maybe depend on how are you going to use it, you can get a slightes benefit. But in terms of storage and scalability firs option is much more efficient. Also harder is a relative term. Once you know how to do it isnt hard at all. And db is design to handle those kind of tables. – Juan Carlos Oropeza Aug 26 '16 at 15:34
0

My take on this The first I would probably use if I have a lot of different attributes and values I would like to add in a more dynamic way, like user tags or user specific information etc,

The second one I would probably use if I just have the three attributes (as in your example) weights, reps, time and have no need for anything dynamic or need to add any more attributes (if this was the case, I would just add another column)

I would say both works, it is as you yourself say, "the former might be more versatile". Both ways needs their own structure around them to extract, process and store data :)

Edit: for the first one to achieve the structure of the second one, you would have to add a join for each attribute you would want to include in the data extract.

Moptan
  • 326
  • 1
  • 11
0

I think the first way contributes better towards normalization. You could even create a new table with attributes:

id  attribute
______________
1   reps
2   weight
3   time

And turn the second last column into a foreign id. This will save space and will save you the risk of mistyping the attribute names. Like this:

id | some_foreign_id | attribute | value
_________________________________________
1           1            2         100
2           1            1         5
3           2            1         40
4           3            3         10
5           4            2         50
6           4            1         60
Iain
  • 387
  • 2
  • 12
0

As others have stated, the first way is the better way. Why? Well, it normalizes the structure. Reference: https://en.wikipedia.org/wiki/Database_normalization

As that article states, normalization reduces database size & allows for easy expansion.

rlillbac
  • 71
  • 8