0

I have a standard user table. Now each user can add an unknown number of languages to his profile. What is the best way of storing the languages for each user? I want to query only once to fetch a user and all his corresponding languages. What is the best design?

Example:

User
====

id    name    
----------
99    peter


Languages
=========

id    userid    lang
-------------------- 
44    99        en
45    99        fr
Shlomo
  • 3,880
  • 8
  • 50
  • 82

3 Answers3

1

That's a many to many relationship (a user can have many languages, and viceversa) so to normalize the DB you should do a new table containing the two Foreign Keys (call it user_x_languages):

user_x_languages
================
user_id    lang_id
99         44
99         45

you would need to remove the userid column from you languages table

the query should then use a join

select name lang from user u
join user_x_languages x on u.id = x.user_id
join languages l on x.lang_id = l.id
where u.id = 99;

that would output a row for each language the user has, it will however be repeating the username on each row.

You want to look into DB normalization, a very nice non-technical article is this: A Simple Guide to Five Normal Forms in Relational Database Theory

following the advice of this thread about returning one line, the procedure would be something like (I might have changed the table and column names a little bit from your original):

delimiter //

Create function languages (p_user varchar(30))
returns text
begin
    DECLARE ret_val TEXT;
    select GROUP_CONCAT(lang) into ret_val
    from Users u
    join user_x_lang x on u.user_id = x.user_id
    join Lang l on x.lang_id = l.lang_id
    where u.name = p_user;
    return ret_val;
end//

delimiter ; 

and now the select should be (for all users):

select name, languages(name) from users;
Community
  • 1
  • 1
Maor
  • 118
  • 2
  • 10
  • Thanks, so what is best practice to query a user and all his languages then? Repeating the username etc does not seem to be best? – Shlomo Nov 17 '14 at 13:34
  • Since you're querying with a specific username you can omit the name in the select statement. In the case of multiple users you can use a function to query the DB for each user, which will return them as text – Maor Nov 17 '14 at 13:38
  • Yes but I will have other columns of the user like email, photo etc. I want all user data ONCE and the languages added – Shlomo Nov 17 '14 at 13:40
  • I've edited and added the function, follow the link to the question on GROUP_CONCAT(), to see the limit it has, you might have to change it if your table starts having many languages per user. – Maor Nov 17 '14 at 14:21
1

You may create a many-to-many relationship table. Like this:

Table "Users"

user_id | name
99        peter
100       tim

Table "Languages"

language_id | language
44            en
45            fr

Table "User_to_language"

User_id | Language_id
99        45
100       45
100       46

Or, if you need a lot of user attributes in future, you might use something like: Table User_attributes

user_id | attribute | value
46        'lang'      'en'

Also, this might be useful for you: Database normalization

  • can you write a query to get the user data from Users joining his languages? – Shlomo Nov 17 '14 at 13:29
  • select u.name, l.language from Users u join User_to_language utl on utl.User_id = u.User_id join Languages l on l.Language_id = utl.Language_id where u.name = 'tim'; –  Nov 17 '14 at 13:33
  • this will also return a table with each language - correct - however on each row the user data is also added. Is that best practice? – Shlomo Nov 17 '14 at 13:39
  • I'm not sure if it is the best practice, but I saw it multiple times in enterprise level projects. A table like I mentioned is going to be really small even with billion rows in it, so I can no doubt say that it will be good in terms of performance. At least for most OLTP systems. –  Nov 17 '14 at 13:48
1

Normalization is used to save the disk space i.e. a good normalized database takes lesser space than non-normalized but the price you pay is performance. Keep in mind more normalization = more join = more processing overhead. In my opinion, your design (using two tables) is good, you can easily retrieve data with a single join :

select * from Usrs u join Languages l on u.id = l.usrid

now some sample queries which will come in application code 
--select those users who speak 'en' then 

select * from Usrs u join Languages l on u.id = l.usrid
where l.lang = 'en'

--peter speaks which languages 
select * from Usrs u join Languages l on u.id = l.usrid
where u.name = 'peter'

try the same with a normalized data base you will find that you need all three tables to get desired data.

Deep
  • 3,162
  • 1
  • 12
  • 21