3

I have two tables, one user table and an items table. In the user table, there is the field "items". The "items" table only consists of a unique id and an item_name.

Now each user can have multiple items. I wanted to avoid creating a third table that would connect the items with the user but rather have a field in the user_table that stores the item ids connected to the user in a "csv" field.

So any given user would have a field "items" that could have a value like "32,3,98,56".

It maybe is worth mentioning that the maximum number of items per user is rather limited (<5).

The question: Is this approach generally a bad idea compared to having a third table that contains user->item pairs?

Wouldn't a third table create quite an overhead when you want to find all items of a user (I would have to iterate through all elements returned by MySQL individually).

Armin Hierstetter
  • 1,078
  • 2
  • 12
  • 27
  • 2
    Yes, it is very bad. Both usability and performance takes a bad hit. Much better to add an extra table for that info. – Linkan Feb 07 '17 at 14:27
  • But performance would be much better as I do not need multiple selects, huh? I would need to iterate through each result individually to get all items rather than simply fetching one row of user data. – Armin Hierstetter Feb 07 '17 at 14:28
  • 1
    Very very bad idea. **NEVER** do it – Gurwinder Singh Feb 07 '17 at 14:28
  • Still, I do not see why in this case. I really would be grateful if somebody could tell me an explanation as to why this is such a bad idea? – Armin Hierstetter Feb 07 '17 at 14:29
  • Consider the case when you want to join this csv fied with some other table or when you want to find distinct items for a given condition? what will you do? – Gurwinder Singh Feb 07 '17 at 14:31
  • I see what you mean, but I really, honestly, positively only need to have the easiest way to get ALL items of a given user as simple as I possibly can. (and fastest) – Armin Hierstetter Feb 07 '17 at 14:33
  • Also: I would not be sure how to do a SELECT that gives me back ONE row per user that contains ALL items? How would I do that? Wouldn't a join give me a several rows for each user depending on how many items he has? – Armin Hierstetter Feb 07 '17 at 14:35
  • Performance would be very bad. You'd need a lot **MORE** operations to split this thing, extract individual values etc. You just can't query inside such a thing, or use any kind of indexing. You'd have to scan all rows – Panagiotis Kanavos Feb 07 '17 at 14:57
  • Besides, if the number of items is small, why not add *more* fields? Not a very good idea, but definitelly better than an opaque string field that sometimes gets parsed as a list of integers – Panagiotis Kanavos Feb 07 '17 at 14:58
  • How would you find the user that corresponds to a specific item? This would be impossible with the CSV field and difficult with multiple fields. Perhaps you think that you don't need this right now, but such requests have a tendency to come up by next Monday. The multiple rows that you fear may cost absolutely nothing due to server-side indexing and caching. A full scan though will cost a *lot* – Panagiotis Kanavos Feb 07 '17 at 15:01
  • @Panagiotis: That would be super simple: SELECT * from users WHERE item LIKE "%foo%" – Armin Hierstetter Feb 07 '17 at 15:08
  • @ArminHierstetter congratulations, you just scanned an entire 1M row table. You can't use any index with such a wildcard search, which means the server would have to read everything. Normal forms exist for a reason which has nothing to do with the modern myth about saving space. Without them querying becomes a lot *harder* – Panagiotis Kanavos Feb 07 '17 at 15:11
  • @ArminHierstetter And '%3%' will give you 3,13 and 32. This is no longer a matter of getting performance hits but getting wrong data. – Linkan Feb 07 '17 at 16:16
  • Ok that was a shot in the foot. – Armin Hierstetter Feb 07 '17 at 16:54
  • Possible duplicate of [Is storing a delimited list in a database column really that bad?](http://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad) – Don't Panic Apr 26 '17 at 19:00

2 Answers2

2

You don't want to store the value in the comma separated form.

Consider the case when you decide to join this column with some other table.

Consider you have,

x  items
1  1, 2, 3
1  1, 4
2  1

and you want to find distinct values for each x i.e.:

x  items
1  1, 2, 3, 4
2  1

or may be want to check if it has 3 in it

or may be want to convert them into separate rows:

x  items
1  1
1  2
1  3
1  1
1  4
2  1

It will be a HUGE PAIN.

Use atleast normalization 1st principle - have separate row for each value.

Now, say originally you had this as you table:

x  item
1  1
1  2
1  3
1  1
1  4
2  1

You can easily convert it into csv values:

select x, group_concat(item order by item) items
from t
group by x

If you want to search if x = 1 has item 3. Easy.

select * from t where x = 1 and item = 3

which in earlier case would use horrible find_in_set:

select * from t where x = 1 and find_in_set(3, items);

If you think you can use like with CSV values to search, then first like %x% can't use indexes. Second, it will produce wrong results.
Say you want check if item ab is present and you do %ab% it will return rows with abc abcd abcde .... .

If you have many users and items, then I'd suggest create separate table users with an PK userid, another items with PK itemid and lastly a mapping table user_item having userid, itemid columns.

If you know you'll just need to store and retrieve these values and not do any operation on it such as join, search, distinct, conversion to separate rows etc. etc. - may be just may be, you can (I still wouldn't).

Gurwinder Singh
  • 38,557
  • 6
  • 51
  • 76
0

Storing complex data directly in a relational database is a nonstandard use of a relational database. Normally they are designed for normalized data.

There are extensions which vary according to the brand of software which may help. Or you can normalize your CSV file into properly designed table(s). It depends on lots of things. Talk to your enterprise data architect in this case.

Whether it's a bad idea depends on your business needs. I can't assess your business needs from way out here on the internet. Talk to your product manager in this case.

Geoffrey Anderson
  • 1,534
  • 17
  • 25