3

I'm developing an app and it requires me to design the database. I'm wondering what'd be more optimal design in following scenario:

Approach 1:

Have one user table with all the user fields viz.

id | uid | username | first_name | last_name | profession

OR

Approach 2:

Table I:

id | uid | username 

Table II:

uid | key | value |
1   | 'first_name' | John
2   | 'last_name'  | Donald and so on

The first approach favours more columns to store the user data, while the second approach relies on multiple tables and stores data into several rows for each user.

The second approach would mean that for each user, the user_meta table will have large number of rows while approach #1 will be more compact.

Questions:

  1. Which approach is better in terms of performance and speed of queries?
  2. Is there any rule for designing the database where you've to decide whether to store the data in rows vs columns?
TheBigK
  • 233
  • 4
  • 16
  • 2
    Stick with the first one, as long as you can predict the required attributes. (Which is true for users). If the table would be "products" and you can't predict the attributes a certain product has - then choose 2. (Which is called EAV-Pattern btw.) – dognose Apr 21 '17 at 15:09
  • 1
    DO not under any circumstances use the second option. EAV is a performance killer and only should be used when you cannot define teh columns needed in advance – HLGEM Apr 21 '17 at 15:09
  • To answer your questions: (1) is faster, (2) is more flexible (see my product example), when not every entry HAS such an attribute. (The First approach would require a lot of NULL-columns (i.e. `NumberOfDoors` on a `cat`), the second one would just not have a row for that information) – dognose Apr 21 '17 at 15:16
  • A key value pair store in your database will almost always provide poor performance. Use database normalization and denormalization to determine how your data should be structured. – Ryan Rentfro Apr 21 '17 at 15:48
  • Okay, but even with the first approach, I can always add more columns to accommodate for extra information. AFAIK, WordPress uses the second approach to store data for users in its post_meta and user_meta tables. – TheBigK Apr 22 '17 at 04:57

2 Answers2

4

The first model you propose is a regular relational design. It is widely used, very efficient in terms of speed and storage space, but it requires you to understand the data model before you store the data; adding an additional field would require a schema change.

The second model you propose is commonly known as "Entity-Attribute-Value" or EAV. You'll find a detailed question here.

It's worth thinking this through though - imagine a screen which lists all users who have logged in today. In your first model, you issue a single query - select * from users where last_logged_in >= '1 Jan 2015'

Now imagine that query in model 2 - you'd have something like

select u.*, ln.value, fn.value
from users u
outer join metadata ln on u.user_id = ln.user_id
and ln.key = 'last_name'
outer join metadata fn on u.user_id = fn.user_id
and fn.key = 'first_name'
and u.llast_logged_in >= '1 Jan 2015'

Two outer joins, and a complex query once you go beyond this trivial example.

If you have a lot of additional data, and you don't expect to use it as a major part of the relational model (i.e. use it as a criteria in a join or where statement), you can use MySQL's support for JSON or XML.

This allows you to store data whose schema you may not know at design time, and which is "sparse" (i.e. not all records have all fields populated), but it's slightly more awkward to query and populate into your client language.

Community
  • 1
  • 1
Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52
  • Right! I think the query example clear shows the first approach is better. Let's say I've to store large amount of data per user; but not query it always; would approach #1 be still preferable? – TheBigK Apr 22 '17 at 05:02
  • I've edited the answer. It's much easier if you ask specific questions rather than hypothetical scenarios - in general, it's easier to use "just columns", even if they aren't all populated; if there are lots of attributes, or you don't know their schema, you can use document storage (XML or JSON). – Neville Kuyt Apr 23 '17 at 13:23
3

You can actually use a combination of the two. For the common data which you can define, stick to a table with fixed column names. Then when you add attributes which are (for example) customer defined, then use the second method to supplement the data.

Nigel Ren
  • 56,122
  • 11
  • 43
  • 55
  • Yes, the second approach does allow for flexibility. But can I not just add extra columns in the first database to accommodate for more fields? – TheBigK Apr 22 '17 at 04:58