15

I have a web application that I am currently working on that uses a MySQL database for the back-end, and I need to know what is better for my situation before I continue any further.

Simply put, in this application users will be able to construct their own forms with any number of fields (they decide), and right now I have it all stored in a couple of tables linked by foreign keys. A friend of mine suggests that, to keep things "easy/fast", I should convert each user's form to a flat table so that querying data from them stays fast (in case of large growth).

Should I keep the database normalized with everything pooled into relational tables with foreign keys (indexes, etc) or should I construct flat tables for every new form that a user creates?

Obviously some positives of creating flat tables are data separation (security) and reduced query times. But seriously, how much gain would I get from this? I really don't want 10,000 tables and to be dropping, altering, and adding all of the time, but if it will be better then I will do it... I just need some input.

Thank you

StratusBase LLC
  • 289
  • 1
  • 5
  • 16
  • 6
    Normalize until it hurts. :) – shamazing Dec 01 '10 at 19:07
  • Not a real answer... but you can always use Wikipedia as guidance. Here is Wikipedia's database schema: http://commons.wikimedia.org/wiki/File:Mediawiki-database-schema.png – Dragontamer5788 Dec 01 '10 at 19:10
  • 4
    @shamazing then denormalize till it works. 80)) – Keng Dec 01 '10 at 19:12
  • I have recently asked a similar question, maybe from a different perspective http://programmers.stackexchange.com/questions/212822/why-many-designs-ignore-normalization-in-rdbms – Yosi Dahari Oct 10 '13 at 14:45

7 Answers

22

Rule of thumb: it's easier to go from normalized to denormalized than the other way around.

Start with a reasonable level of database normalization (by reasonable I mean readable, maintainable, and efficient but not prematurely optimized), then if you hit performance issues as you grow, you have the option of looking into ways in which denormalization may increase performance.
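As a concrete illustration of that "reasonable level of normalization", here is a minimal sketch using Python's `sqlite3` to stand in for MySQL. All table and column names (`form`, `form_field`, etc.) are assumptions for illustration, not anything from the question:

```python
# Minimal normalized layout for user-built forms: one table for forms,
# one for their fields, linked by a foreign key. sqlite3 stands in for
# MySQL here; the names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE form (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        name    TEXT NOT NULL
    );
    CREATE TABLE form_field (
        id      INTEGER PRIMARY KEY,
        form_id INTEGER NOT NULL REFERENCES form(id),
        label   TEXT NOT NULL
    );
    -- One index on the foreign key keeps per-form lookups fast.
    CREATE INDEX idx_form_field_form ON form_field(form_id);
""")
conn.execute("INSERT INTO form (id, user_id, name) VALUES (1, 42, 'Contact')")
conn.executemany(
    "INSERT INTO form_field (form_id, label) VALUES (?, ?)",
    [(1, "email"), (1, "message")],
)
labels = [row[0] for row in conn.execute(
    "SELECT label FROM form_field WHERE form_id = ? ORDER BY id", (1,))]
print(labels)  # ['email', 'message']
```

If this layout later proves too slow, denormalizing (e.g. caching field lists per form) can be layered on top without redesigning the schema.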

Bob Palmer
  • 4,714
  • 2
  • 27
  • 31
6

Keep your data normalized. If you index properly, you will not encounter performance issues for a very long time.

Regarding security: The flat approach will require you to write lots of CREATE TABLE, DROP TABLE, ALTER TABLE etc. statements, i.e. a lot more code and a lot more points of failure.

The only reason to have flat tables would be when your users can connect to the DB directly (though you could still go for row-level security). But in that case, you are really reimplementing a variant of phpMyAdmin.
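The difference in code volume is easy to see in a small sketch (sqlite3 standing in for MySQL; the names are illustrative, not from the question): the flat approach means generating and running DDL at runtime for every new user form, while the normalized approach reuses one parameterized statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Flat approach: every new user form means building and executing
# dynamic DDL at runtime - more code, more points of failure.
form_name, field = "user_7_contact", "email"
conn.execute(f'CREATE TABLE "{form_name}" ("{field}" TEXT)')

# Normalized approach: the same event is a single, parameterized INSERT
# into a fixed schema.
conn.execute("CREATE TABLE form_field (form_id INTEGER, label TEXT)")
conn.execute("INSERT INTO form_field (form_id, label) VALUES (?, ?)",
             (7, "email"))

count = conn.execute("SELECT COUNT(*) FROM form_field").fetchone()[0]
print(count)  # 1
```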

Martin
  • 1,622
  • 4
  • 16
  • 27
  • +1 very good answer. Martin what to do when performance issues begin to appear after that very long time? I have never worked with that much data/traffic so not sure what is the next step – Sandeepan Nath Dec 01 '10 at 19:33
  • a) Indices are *very* fast - looking up values in a 100 million row table is generally no problem as long as the relevant columns are indexed. So there is really a lot of headroom before you hit performance issues. b) You can *partition* tables by range, e.g. create a partition per 1000 user ids. Your queries will mostly affect a single user id and thus a single partition - so this should scale up almost linearly. – Martin Dec 01 '10 at 21:19
3

...in this application users will be able to construct their own forms with any number fields...

Yikes! Then how could you possibly do any sort of normalization when the users are, in essence, making the database decisions for you?

I think you either need to manage it step by step, or let your freak flag fly and just keep buying hardware to keep up with the thrashing you're going to get when the users really start to get into it... Case in point: look what happens when users start to understand how to make new forms and views in SharePoint... CRIKEY! Talk about scope creep!

Keng
  • 52,011
  • 32
  • 81
  • 111
  • 1
    Clearly define what fields/inputs that they can create. Limit the amount of customizations they can do. The scope is defined for the project and shouldn't change unless I make it do so. Thanks for your input. – StratusBase LLC Dec 01 '10 at 19:17
  • 1
  • @Steve B. You might consider a palette of universal fields they can add that are normalized. For instance: employee id, which goes to the emp_table, so that people aren't reinventing the wheel. – Keng Dec 01 '10 at 19:20
  • I have a set list of 15 or so inputs that a user could possibly use on a form. This may grow, but it is enough to do almost anything they would need; they are stored in a static table and linked by id to user forms. – StratusBase LLC Dec 01 '10 at 19:24
2

Altering the schema during runtime is rarely a good idea. What you want to consider is the EAV (Entity-Attribute-Value) model.

Wikipedia has some very good info on the pros and cons, as well as implementation details. EAV is to be avoided when possible, but for situations like yours, with an unknown number of columns for each form, it is worth considering.
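A minimal EAV sketch, with sqlite3 standing in for MySQL (table and column names are illustrative): instead of one column per user-defined field, you store one row per (entity, attribute, value) triple and reassemble records at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE form_value (
    form_id   INTEGER,   -- the Entity
    attribute TEXT,      -- the Attribute
    value     TEXT       -- the Value, stored as text
)""")
rows = [
    (1, "email", "a@example.com"),
    (1, "age",   "30"),
    (2, "email", "b@example.com"),
]
conn.executemany("INSERT INTO form_value VALUES (?, ?, ?)", rows)

# Reassemble one entity's attributes into a dict.
record = dict(conn.execute(
    "SELECT attribute, value FROM form_value WHERE form_id = ?", (1,)))
print(record)  # {'email': 'a@example.com', 'age': '30'}
```

Note that every value comes back as text; typed handling is the main cost of the pattern.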

D'Arcy Rittich
  • 167,292
  • 40
  • 290
  • 283
  • I'd never heard of EAV, but it appears to be similar to the solution I proposed above using a table with key/value pairs. Is my proposed solution above similar to the EAV solution you suggested? I'm just curious because I'd like to learn more about EAV modeling. – Madison Caldwell Dec 01 '10 at 19:26
  • 1
    @Matt: yes, that is exactly right. In your case, E=form_id, A=key, V=value. There are modified versions where you have additional value columns for different data types, so you can gain efficiency with indexes and aggregation, etc., but this also adds complexity to queries. – D'Arcy Rittich Dec 01 '10 at 19:43
1

Keep your data normalized. The system should stay fast provided you have proper indexing.

If you really want to go fast, then switch the schema to one of the key-value databases like bigDB/CouchDB etc. That is totally denormalized and very, very fast.

Byron Whitlock
  • 52,691
  • 28
  • 123
  • 168
1

The way I would handle this is to use a normalized, extensible "Property" table, such as below:

Table: FormProperty
 id: pk
 form_id: fk(Form)
 key: varchar(128)
 value: varchar(2048)

The above is just an example, but I've used this pattern in many cases, and it tends to work out pretty well. The only real "gotcha" is that you need to serialize the value as a string/varchar and then deserialize it to whatever it needs to be, so there is a little added responsibility on the client.
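That serialize/deserialize responsibility can be handled with a small helper on the client side. This is a sketch under the assumption that you store a type tag alongside the string form (the tag scheme here is hypothetical, not part of the answer above):

```python
# Values in FormProperty.value are varchar, so the client must convert
# typed values to strings on write and back on read.
def serialize(value):
    # Store the Python type name alongside the string form.
    return type(value).__name__, str(value)

def deserialize(type_name, text):
    casts = {
        "int": int,
        "float": float,
        "bool": lambda s: s == "True",
        "str": str,
    }
    return casts[type_name](text)

stored = serialize(42)                 # ('int', '42')
print(deserialize(*stored))            # 42
print(deserialize(*serialize(True)))   # True
```

The extra `type_name` could live in an additional column next to `key` and `value`.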

Madison Caldwell
  • 862
  • 1
  • 7
  • 16
  • To create a login form, for example, you could: insert into FormProperty(form_id, key, value) values (1, 'email', ''); insert into FormProperty(form_id, key, value) values (1, 'password', 'password'); – Madison Caldwell Dec 01 '10 at 19:22
  • As an alternative to the json/xml in the above example, you could create an additional table(s) for the field properties and link it/them via foreign keys. – Madison Caldwell Dec 01 '10 at 19:23
0

Normalized == fast searches, easier-to-maintain indexes, slower insert transactions (on multiple rows)

Denormalized == fast inserts; usually this is used when there are a lot of inserts (data warehouses that collect and record chronological data)

dexter
  • 7,063
  • 9
  • 54
  • 71