We want to store a user
table in the database. For each user, there are a number of attributes, such as age, sex, date of birth, etc.
We can of course model these commonly shared attributes as the columns in our user
table, however, it would become inconvenient later when we want to add new attributes, as we have to modify the table to add extra columns, and this user
table may be large. Especially, it would be worse if we have more than one type of users and each user type may have some unique attributes.
Alternatively, we can adopt a more flexible design, where we use a separate table with the following schema:
user_id | attribute_id | attribute_name | attribute_value
So that each user may occupy a few number of rows, each of which contains the data for a specific attribute. This way, we don't need to worry about adding new attributes to users (or user types). However, there is another problem: the data type of each attribute may be different from each other, e.g. some may be int, some may be float, and some others may be string. In response, we can have another table for mapping the data types to attributes, such as:
attribute_id | data_type
Or, we can simply store everything as strings, but somehow I think it would be a bad idea in terms of the amount of space required.
So, there exists a tradeoff here, between the flexibility of the schema and the complexity/fragmentation of the schema, and I wonder which one would be overall more advantageous than the other, or, if there is a third and better way of doing it.
Any comment/suggestion is welcome! Thanks.