3

Say I offer user to check off languages she speaks and store it in a db. Important side note, I will not search db for any of those values, as I will have some separate search engine for search. Now, the obvious way of storing these values is to create a table like

UserLanguages
(
 UserID nvarchar(50),
 LookupLanguageID int
)

but the site will be high load and we are trying to eliminate any overhead where possible, so in order to avoid joins with main member table when showing results on UI, I was thinking of storing languages for a user in the main table, having them comma separated, like "12,34,65"

Again, I don't search for them so I don't worry about having to do fulltext index on that column.

I don't really see any problems with this solution, but am I overlooking anything?

Thanks, Andrey

Vishwanath Dalvi
  • 35,388
  • 41
  • 123
  • 155
Andrey
  • 20,487
  • 26
  • 108
  • 176
  • 3
    even better, store them all combined within a single xml string and put that into one column! ;-) just joking, don't do that! – KM. Sep 28 '09 at 19:32
  • 2
    i talked to someone a while ago who did exactly that. he was pretty proud of what he had accomplished. _then, they asked him for reports_ – Raj More Sep 28 '09 at 19:34
  • Although I definitely think you should normalize like everyone else, if you don't I would advise you make the field that you store the comma separated values in large enough to accommodate ALL future possible ids that you wish to include. I work in QA and I've encountered a database that used this design before and it worked fine for a long time. But I'm guessing somewhere in the company's 10+ some years, no one realized that the requirements kept growing until the field couldn't store all values. Then you have someone like me go and attempt to pick every possible id and blow up the app. – Michael Cheng Sep 28 '09 at 20:22

9 Answers9

16

Don't.

  • You don't search for them now
  • Data is useless to anything but this one situation
  • No data integrity (eg no FK)
  • You still have to change to "English,German" etc for display
  • "Give me all users who speak x" = FAIL
  • The list is actually a presentation issue

It's your system, though, and I look forward to answering the inevitable "help" questions later...

gbn
  • 422,506
  • 82
  • 585
  • 676
  • Actually, "Give me all users who speak x" can be done: SELECT * FROM table WHERE ',' + languages + ',' LIKE ',' + x + ','. A terrible idea, but possible. – erikkallen Sep 28 '09 at 20:06
  • Of course: SELECT * FROM table WHERE ',' + languages + ',' LIKE '%,' + x + ',%' – erikkallen Sep 28 '09 at 20:07
  • @gbn This is my question over years. I used it in some Scenario and i have a table that store Id/Name so i concatenate id's with comma and store in one column. I wrote a function that get comma separated string and return table with one column and When i want to search something in this string use the IN operator from function result. is this a bad way? Another important issue might be this: If i would to use additional table to hold extra information i should be add more than 40 tables to current about 200 tables in my db. Is your opinion on adding tables yet? please describe if you could. – QMaster Oct 29 '14 at 20:48
12

You might not be missing anything now, but when you're requirements change you might regret that decision. You should store it normalized like your first instinct suggested. That's the correct approach.

What you're suggesting is a classic premature optimization. You don't know yet whether that join will be a bottleneck, and so you don't know whether you're actually buying any performance improvement. Wait until you can profile the thing, and then you'll know whether that piece needs to be optimized.

If it does, I would consider a materialized view, or some other approach that pre-computes the answer using the normalized data to a cache that is not considered the book of record.

More generally, there are a lot of possible optimizations that could be done, if necessary, without compromising your design in the way you suggest.

Jeremy Bourque
  • 3,533
  • 1
  • 21
  • 18
  • Well, the problem that there are about 30 of such multiple choices and all are to be displayed on the same profile page which ultimately means querying 30 tables for one request. It will definitely be a bottleneck, so I'm trying to see what are my options – Andrey Sep 28 '09 at 19:57
  • 1
    30 queries on a profile page? That should never be a problem with load. How often do you think someone will actually load their profile page? Once a minute? Hardly. Again, don't preoptimize – NotMe Sep 28 '09 at 20:06
11

This type of storage has almost ALWAYS come back to haunt me. For one, you are not even in first normal form. For another, some manager or the other will definitely come back and say.. "hey, now that we store this, can you write me a report on... "

I would suggest going with a normalized design. Put it in a separate table.

NotMe
  • 87,343
  • 27
  • 171
  • 245
Raj More
  • 47,048
  • 33
  • 131
  • 198
  • Well, as I stated, I don't care about reporting, as we'll maintain all normalized data in the search engine and reports will be querying it, not the db. The list in the db will be for two things only: UI and reindexing of search engine in case index drops. I see you point, as well as others who responded, but I'm still tempted because of performance – Andrey Sep 28 '09 at 20:05
5

Problems:

  1. You lose join capability (obviously).
  2. You have to reparse the list on each page load / post back. Which results in more code client side.
  3. You lose all pretenses of trying to keep database integrity. Just imagine if you decide to REMOVE a language later on... What's the sql going to be to fix all of your user profiles?
  4. Assuming your various profile options are stored in a lookup table in the DB, you still have to run "30 queries" per profile page. If they aren't then you have to code deploy for each little change. bad, very bad.
  5. Basing a design decision on something that "won't happen" is an absolute recipe for failure. Sure, the business people said they won't ever do that... Until they think of a reason they absolutely must do it. Today. Which will be promptly after you finish coding this.
  6. As I stated in a comment, 30 queries for a low use page is nothing. Don't sweat it, and definitely don't optimize unless you know for darn sure it's necessary. Guess how many queries SO does for it's profile page?
NotMe
  • 87,343
  • 27
  • 171
  • 245
4

I generally stay away at the solution you described, you asking for troubles when you store relational data in such fashion.

As alternative solution: You could store as one bitmasked integer, for example: 0 - No selection 1 - English 2 - Spanish 4 - German 8 - French 16 - Russian --and so on powers of 2

So if someone selected English and Russian the value would be 17, and you could easily query the values with Bitwise operators.

WebMatrix
  • 1,561
  • 3
  • 12
  • 17
  • 2
    I've seen Bitwise operations in a database, not a fan. – Gratzy Sep 28 '09 at 19:29
  • 1
    Isn't storing it as a bitmasked integer also trying to store relational data in a non relational fashion? Not trying to be argumentative, I'm wondering if I'm missing something. – Gratzy Sep 28 '09 at 19:40
  • 1
    @Gratzy, you could make such argument, it's not your traditional relational data storage technique, though it's not as bad as comma delimited. bitwize works best in certain situations where you need to test for multiple true conditions (and i think it applies to OP scenario), I could be wrong – WebMatrix Sep 28 '09 at 19:58
4

Premature optimization is the root of all evil.

EDIT: Apparently the context of my observation has been misconstrued by some - and hence the downvotes. So I will clarify.

Denormalizing your model to make things easier and/or 'more performant' - such as creating concatenated columns to represent business information (as in the OP case) - is what I refer to as a "premature optimization".

While there may be some extreme edge cases where there is no other way to get the necessary performance necessary for a particular problem domain - one should rarely assume this is the case. In general, such premature optimizations cause long-term grief because they are hard to undo - changing your data model once it is in production takes a lot more effort than when it initially deployed.

When designing a database, developers (and DBAs) should apply standard practices like normalization to ensure that their data model expresses the business information being collected and managed. I don't believe that proper use of data normalization is an "optimization" - it is a necessary practice. In my opinion, data modelers should always be on the lookout for models that could be restructured to (at least) third normal form (3NF).

LBushkin
  • 129,300
  • 32
  • 216
  • 265
2

If you're not querying against them, you don't lose anything by storing them in a form like your initial plan. If you are, then storing them in the comma-delimited format will come back to haunt you, and I doubt that any speed savings would be significant, especially when you factor in the work required to translate them back.

MartW
  • 12,348
  • 3
  • 44
  • 68
1

You seem to be extremely worried about adding in a few extra lookup table joins. In my experience, the time it takes to actually transmit the HTML response and have the browser render it far exceed a few extra table joins. Especially if you are using indexes for your primary and foreign keys (as you should be). It's like you are planning a multi-day cross-country trip and you are worried about 1 extra 10 minute bathroom stop.

The lack of long-term flexibility and data integrity are not worth it for such a small optimization (which may not be necessary or even noticeable).

Shane
  • 2,629
  • 6
  • 32
  • 39
0

Nooooooooooooooooo!!!!!!!!

As stated very well in the above few posts.

If you want a contrary view to this debate, look at wordpress. Tables are chocked full of delimited data, and it's a great, simple platform.

Jestep
  • 984
  • 2
  • 8
  • 20