-1

We are developing a system using MySQL database involve Holiday.

Some holiday may only apply to certain states, Example: Good Friday only apply to Hawaii, Kentucky, Louisiana and etc.

holiday     | state
---------------------
Good Friday | Connecticut, Delaware, Hawaii, Indiana, Kentucky, Louisiana, New Jersey, North Carolina, North Dakota, Tennessee

Our team have some discussion on how should state store in MySql database after normalisation.

At first we come out the design like following (some columns like date or holiday id are skipped):

Design A

holiday     | state code
------------------------
Good Friday | CT
Good Friday | DE
Good Friday | HI
Good Friday | IN
Good Friday | KY
Good Friday | LA
Good Friday | NJ
Good Friday | NC
Good Friday | ND
Good Friday | TN

But one of our team member propose another design:

Design B

holiday     | state
---------------------
Good Friday | CT,DE,HI,IN,KY,LA,NJ,NC,ND,TN

Design B looks like contradict with what I learn about database design in terms of normalisation, but my team member insist this is the correct method, and never see people design table like Design A

May I know which one more proper? Or is there some new method in database design that I didn't know that cause Design B is used in current database design?

Thanks

[Edit]

To make things more specific, the purpose of the holiday table is to calculate due date:

date_begin + N working days = due_date

If N = 3 and date begin is 2020-04-07, (assume Saturday is working day)

2020-04-07 + 3 = 2020-04-10 // Good Friday

If the case register in Connecticut, where 2020-04-10 is holiday, thus the due date will be 2020-04-11, but if the case register in other state, example Washington DC, the due date will be 2020-04-10.

philipxy
  • 14,867
  • 6
  • 39
  • 83
ruby.lee
  • 107
  • 8
  • "my team member insist this is the correct method" Interpersonal workplace issues are off topic on this site, there's a sister site for questions on workplace issues – Joni Jul 13 '20 at 12:55
  • 1
    From my point of view, Design B isn't even a contender – Strawberry Jul 13 '20 at 13:00
  • Design B is wrong it violates first normal form of normalization. – Sparky Jul 13 '20 at 13:01
  • This is a faq. Please before considering posting read your textbook and/or manual & google any error message or many clear, concise & precise phrasings of your question/problem/goal, with & without your particular strings/names & site:stackoverflow.com & tags; read many answers. If you post a question, use one phrasing as title. Reflect your research. See [ask] & the voting arrow mouseover texts. – philipxy Jul 13 '20 at 16:54
  • @philipxy, I did understand it's not good to use delimited list in database column, but my team member **VERY** confident on the **Design B**, in fact, what he said is **He had read a lot of codes and designs from countries like German, Japan, and etc, but he **NEVER** see codes or design like **Design A**. I'm asking here because his confident on **Design B** cause me confuse is it any new method/things had come in to database design that I didn't know.. – ruby.lee Jul 13 '20 at 17:13
  • Every use of a relational DB is A. (Assuming the left column has various holidays.) (Called various things depending on the design method: many to many association; association, join or junction table; etc.) A is the "1NF" form of B. One can replace strings by "codes" or "ids", but that's orthogonal.--Replacing values by other values that are somehow more id-ish is yet another faq. [Decision between storing lookup table id's or pure data](https://stackoverflow.com/q/383026/3404097) Follow a published academic textbook on information modelling, the relational model & DB design & querying. – philipxy Jul 13 '20 at 17:31

1 Answers1

2

The typical design for this would be something like:

Holidays
---------
ID   | Name        | ....
--------------------------
1    |Good Friday  |...
2    |Easter Monday|...
3    |Christmas Day|...
...

States
----------
ID   | Name 
---------------------------
1    | Connecticut
2    | Texas
3    | Vermont
....

State_Holiday
----------------
State_ID  | Holiday_ID
----------------------
1         | 1
2         | 1
3         | 2
.....

By using a table to define which holidays exist, and referring to them via a foreign key (holiday_id), you avoid typos - in design A, someone might enter "good friday" (not capitalization), and your application would be unclear about whether that's the same as "Good Friday".

Ditto for states - the state abbreviation codes may suffice as primary keys, but I'm not certain they are guaranteed unique (is there another country that uses these codes?)

The joining table State_Holiday makes it easy to find which holidays are observed in a given state, or which states observe a given holiday.

[Edit] Now you've outlined your sample use case, imagine executing that using design B.

You'd have to do a whole bunch of string parsing to find out whether a Connecticut has a holiday on a day that falls between date_begin and date_begin + N. You have to handle the delimiter, and the fact you cannot guarantee at a data model level that state codes are 2 characters. The comparison operator would almost certainly be something along the lines of and state like '*CT*', which would perform terribly because indexes don't work. You'd need application-level logic to make sure the state codes are correct - you can't enforce referential integrity in the data model.

Option A is much simpler - you're just comparing a bunch of foreign keys, rather doing string comparison. Your referential integrity is enforced by the data model, rather than application logic.

Option A is preferred by people used to working with relational databases; it's normalized, uses referential integrity and should perform well, no matter how many holidays you have.

Option B is a "NoSQL" approach. That's a whole different conversation, but I would not force a NoSQL design into a MySQL system.

Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52
  • Thanks @Neville, if we follow design A, the final product should be something like this. I just ignore some details and try to focus on the main question. – ruby.lee Jul 13 '20 at 15:28