6

I have a problem with designing database (SQL/MySQL). Let's assume we have a user, user can have many friends and many posts and filled some data about himself.

It's quite obvious that for friends we need one pivot_table for n:n relation, for posts we need to create one extra table with user_id (1:n) relation.

So we need users, user_friends and posts tables. This is obvious. This is how relations should be handled.

But now let's assume we want for users to have the following data:

name - text
description - text
marital status - select only one from list
favourite colour - select only one from list
hobby - select up to 3 from list

For text fields (name, description) it's really obvious we simply create varchar/text columns in users table and that's it.

The general question is: how the other fields (chosen from lists) should be handled? Should I create relations for them or maybe should I create standard data columns with them?

In my opinion there is no point to create relation tables for that because using lists (select) we only limit user when he can in fact paste into database. In theory we could allow user to manually input as favourite colour his colour (for example red and if he types something wrong for example reds we would compare it will list of allowed colours). The same would be for gender - there is no point in my opinion to create extra table when we hold only woman and man and create relation for it.

First DB design:

I could for example create the following columns for properties:

marital_status - int
fav_colour - int
hobby_1 - int
hobby_2 - int
hobby_3 - int

And have one other table (or even plain array in PHP or other language) where I store that value 1 for fav_colour is for example red, value 2 for hobby is music and so on (it doesn't matter how I store those values here - I could also use enum type for that) .

For me benefits for such attitude is not creating many relations that are in fact rather properties and not relations (as I mentioned above), so less work + easier getting information about user - you don't need to use any joins what would be important if you have for user for example 20 or 100 such properties and I can search in user table very easy. Disadvantages are also quite obvious - data is not normalized, for any multi selection (as for example hobby) I need to create 3 columns and if in future I decide that user can select not 1 colour but 2 or 3, I would need to add 2 extra columns.

Alternative DB design:

I create extra tables: colours, hobbies, marital_statuses and I create 3 pivots tables: user_colours, user_hobbies, user_marital_statuses. Disadvantages: many joins. Advantages - if I created 3 extra pivot tables I I could easily allow user to select up to 10 colours and I don't need redesign database at all. But disadvantages also occur - difficult searching, a lot of work, many joins.

Detailed question

So to sum up - what solution would be better assuming:

  1. I would probably not change the maximum count of one properties (if I decided I allow maximum 3 hobbies, this won't probably ever change)
  2. Lists of choices for many fields would be relative short (for most of them less than 10)
  3. I need to search a lot in such database. Someone for example wants to search user that have fav_colour set to red and have hobby music.

If there are any other solutions or advantages/disadvantages you see I appreciate to share with me.

Marcin Nabiałek
  • 109,655
  • 42
  • 258
  • 291
  • 1
    Here's another option. Create an attributes table with attributeName, attributeType, attributeValue, userId. This will allow you to add as many attributes to a user as you would like. Avoiding having to make breaking schema changes any time you think of a new bit of info that you want. – crthompson Oct 15 '14 at 18:12
  • 4
    @paqogomez: which is an anti-pattern called "entity-attribute-value" but probably the solution that hurts the less here. Another option would be to store the "dynamic attributes" as a JSON or XML document. But that makes handling them in SQL very hard. A third option might be to upgrade to Postgres and use Postgres' NoSQL features such as the key/value data type `hstore` or the built-in JSON support –  Oct 16 '14 at 07:43
  • @a_horse_with_no_name I actually had a comment that suggested a NoSQL option, but removed it. I figured it wasnt the direction the OP wanted to go. It exactly this type of data that NoSQL was made for however. – crthompson Oct 16 '14 at 14:38
  • @a_horse_with_no_name if EAV is an "anti-pattern", then storing JSON or XML in a database is also an antipattern, since it violates 1NF. – wildplasser Oct 22 '14 at 20:32

4 Answers4

1

It sounds like you want to enforce some constraints on some of your users properties. For example, favorite colour must be one of red, green, blue, pink, orange, etc; marital status must be one of single, divorced, married.

You've described one way to do this: lookup tables. This is the best way if the possible values are dynamic and require ongoing maintenance, or if there are many possible values. From your description, that is not your situation. Your possible values will be quite static and short.

I recommend employing a sql CHECK constraint. With it, you can control a field's possible values. For example:

CREATE TABLE users
(
Name varchar(255) NOT NULL,
Description varchar(255),
Marital_Status varchar(10) NOT NULL,
Color varchar(10) NOT NULL,
CONSTRAINT chk_Color CHECK (Color in ('Red', 'Blue', 'Green', 'Orange')),
CONSTRAINT chk_Marriage CHECK (Marital_Status in ('Single', 'Married', 'Divorced'))
)

I haven't syntax checked this DDL statement, so it may contain punctuation errors. Also, the syntax may vary for your particular DBMS. I think this should work for MySQL.

Eric
  • 4,201
  • 5
  • 27
  • 36
  • This could be fine but for a lot of colours it would not be very easy to put them into Constraint (and add another if needed). Also storing varchars would have one big issue. If the site is multilingual storing varchars won't be very good idea here I think. – Marcin Nabiałek Oct 21 '14 at 12:27
1

If users can change favorite colors/hobbies frequently, I would use lookup tables, in my example I'll refer to them as decode tables. All of the relationships between user/hobbies and user/colors will be found in that decode table.

Since you can only have 1 marital status, that is easy to handle it's a 1 to many relationship.

Create a table Marital_Status with 2 fields, Id (pk) and Status(varchar(n)) The decode table won't be required to lookup marital status.

Now I would recommend creating a table to hold colors and a table for hobbies. Same way we did marital status.

Hobbies

HobbyId, Hobby

Colors
ColorId, Color

Whenever you need to add/delete a new hobby/color do it in these decode tables.

It's up to you whether you want to use 1 decode table for each relationship or many ie. Hobby_Decode and Color_Decode etc.

I'll explain the scenario of using 1.

Create your decode table with the following fields...

Decode

Item_Type varchar(n) --We will push either Hobby or Color in this field

UserId int --self explanatory, holds the Id of the User to "lookup"

LookupId --will hold id's of either the Hobby or the Color

Let me create some sample data, and we will work off of that.

Hobbies table data

 | HobbyId | Hobby

      1      Studying 
      2      Doing Drugs
      3      Drinking     

Colors table data

 | ColorId | Color

     1        Red 
     2        Blue

While we're at it, Here's our user table.

Users

 | UserId | Name

      1     Marcin 
      2     CSharper

I like drinking, doing drugs and the color red, You're a nerd so you like to study and the color Blue. In our decode table we'll add the following entries to represent that.

Decode

 | Item_Type| UserId | LookUpId

    'Hobby'      2        2
    'Hobby'      2        3
    'Color'      2        1
    'Hobby'      1        1
    'Color'      1        2      

Looking at that decode table doesn't really tell us anything. Once we join our decode table to colors/hobbies it will be evident.

If you want to look up all of my hobbies and my favorite colors the query will look this

note: this is SQL Server syntax not mysql.

--Pull Hobbies
Select u.Name, dH.Item_Type as 'Favorite', h.Hobby as 'Item'
from User u
inner join decode dH on dH.UserId = u.UserId 
                     and dH.Item_Type = 'Hobby'
inner join Hobby h on h.HobbyId = dH.LookUpId
where u.UserId = 2 

--Union in Colors
Union

Select u.Name, dH.Item_Type as 'Favorite', h.Hobby 'Item'
from User u
inner join decode dC on dH.UserId = u.UserId 
                     and dH.Item_Type = 'Color'
inner join Color c on c.ColorId = dH.LookUpId
where u.UserId = 2 

Your output will look like

|    Name    |    Favorite   |     Item 

   CSharper         Hobby         Drinking
   CSharper         Hobby         Doing Drugs
   CSharper         Color         Red

If it is setup like this, than it is extremely easy to change/update peoples favorite hobbies and colors. The decode table will handle all of that. It just takes a simple entry or deletion of that table. And also this way, a User can have an infinite amount of favorite hobbies and colors since it's the decode table that drives that, and not the Users table definition.

Manipulating your sample query a little bit, if we want to find all Users who like the color blue and drinking the query would look like.

Select u.Name
from User u 
inner join decode d on d.UserId = u.UserId
inner join Hobby h on h.HobbyId = d.LookUpId and d.Item_Type = 'Hobby'
inner join Color c on C.ColorId = d.LookUpId and d.Item_Type = 'Color'
where h.Hobby = 'drinking' and c.Color = 'blue'

Doing joins like this is perfectly acceptable.

CSharper
  • 5,420
  • 6
  • 28
  • 54
  • Hmm, wouldn't it be to complex? What if I want to get all properties for one user. Here we have colour and up to 3 hobbies. But what in case there are more than 20 such `relations`? Wouldn't it be to complex and impact speed too much? – Marcin Nabiałek Oct 24 '14 at 17:13
  • No not at all, I work at a Financial company and trust me, the Queries are Heavy to say the least and this works just fine. You can throw an Index on UserId and Item_type to speed things up even more. You can split it up and use multiple decode tables if you wish, it would probably be easier and more readable in your case. If your saying that a User can have varying amounts of Favorite X I don't know of a simpler solution. Changing the user table constantly to incorporate multiple favorite X would not be optimal. A decode table to hold each relation would work. – CSharper Oct 24 '14 at 17:31
1

You want to avoid extra tables and joins unless really needed. This is exactly what enums are for. enums internally store as integer and in usage look like strings with constrained values.

create table users (
  user_id bigint unsigned not null auto_increment primary key,
  name varchar(255) not null,
  description varchar(255),
  marital_status enum('single', 'married'),
  favorite_color enum('red', 'green', 'blue'),
  hobby1 enum('painter', 'doctor', 'lawyer'),
  hobby2 enum('painter', 'doctor', 'lawyer'),
  hobby3 enum('painter', 'doctor', 'lawyer')
);

To insert a value: insert into table users (name, marital_status) values ('Jack', 'single');

This statement will fail: insert into table users (name, marital_status) values ('Jack', 'abcd');

Modifying the list is a simple and fast operation: alter table users modify marital_status enum('divorced', 'single', 'married');

  • Ok, but what about data validation? How can I check which values are allowed? For example user will fill in color abc. Can I query enum field to get allowed values? And the extra question - how to store translations for them? Assuming we use more than 1 language on page, the names should be then duplicated in other table/data to store there translation – Marcin Nabiałek Oct 24 '14 at 17:08
  • You can use "show create table 'tablename'" or query the [information_schema](http://stackoverflow.com/questions/2350052/how-can-i-get-enum-possible-values-in-a-mysql-database) to get the allowed values in an enum. For translations, you want to have language-independent values that you use in the enum and translations that will be a separate table. – anupam.singhal Oct 25 '14 at 01:24
0

Whichever you pick is good, don't rely to much to Normalization.

But for me, would go with 5 tables users, marital_status, colours, hobbies, user_hobbies

CREATE TABLE users (
  user_id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  description VARCHAR(255),
  marital_status INT,
  fav_colour INT
)

CREATE TABLE marital_status (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL
)

CREATE TABLE colours (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  code VARCHAR(7)
)

CREATE TABLE hobbies (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL
)

CREATE TABLE user_hobbies (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id BIGINT,
  hobby_id INT
)

For the pivot tables, i would suggest to create/populate them separately from application, such as using command line or message queue (either using crontab functionality)

Zein Miftah
  • 97
  • 3
  • 12