3

I am designing a relationship between mall and brands available at the mall.

The number of brands is fixed. However, it may increase in future. Now, a mall may feature some/all brands.

My question is:

Should I add the different brands as boolean attributes to the mall table like this:

mall
----------------------------------------------------------------------------
  mall_id  | mall_name | peter_england |   turle   |   adidas   |   puma   |
---------------------------------------------------------------------------- 
      1         xyz         false          true         true        false
      2         abc         true           false        true        true
----------------------------------------------------------------------------

OR

Create a N:M relationship between mall and a new brand table like this:

mall                             mall_brand                  brand
-------------------------        ---------------------       --------------------------
  mall_id  | mall_name |          mall_id  | brand_id         brand_id |   brand_name
-------------------------        ---------------------       --------------------------
      1         xyz                  1          2                 1      peter_england
      2         abc                  1          3                 2      turtle
-------------------------            2          1                 3      adidas
                                     2          3                 4      puma
                                     2          4            --------------------------
                                 ---------------------

I personally find the second one more flexible since it only stores true values. Besides, it also gives me the provision to add extra attributes to individual brands in future. Also, adding new brands is neat.

EDIT:

Also consider a situation where all of these brands exist for every mall and instead of storing boolean values, we're storing float values for average number of garments sold for each brand per day like this:

mall
----------------------------------------------------------------------------
  mall_id  | mall_name | peter_england |   turle   |   adidas   |   puma   |
---------------------------------------------------------------------------- 
      1         xyz          10.4          21.3         13.7        7.6
      2         abc          10.7          25.5         11.4        8.1
----------------------------------------------------------------------------

In this case should I store them as attributes, or create new tables similarly like before?

Nikunj Madhogaria
  • 2,139
  • 2
  • 23
  • 40
  • 2
    I would prefer the second properly normalized model almost always. It's much more flexible and easier to work with. – jpw Jun 27 '15 at 11:03
  • one more query: what if all brands are always `true` but store any other datatype, for example `puma_avg_shoe_price`, `adidas_avg_shoe_price`, `reebok_avg_shoe_price` and so on..still we go with the 2nd design? – Nikunj Madhogaria Jun 27 '15 at 11:26
  • Please ask one question per post, not 3. Please clarify via edits, not comments. – philipxy May 24 '19 at 17:47

2 Answers2

3

Updating answer after some misconception based on the discussion with @Gerrat

Of course the Second way is better.

The first table is in First Normal Form since none of its domains (field values) have any sets as elements (columns with multi-values). But the problem is with the repeated columns (brands that are repeating as columns in the table just to be marked in the mall), such a design is often considered an anti-pattern however because it constrains the table to a predetermined fixed number of values and because it forces queries and other business logic to be repeated for each of the columns. In other words it violates the "DRY" principle of design.

Always try to make your database, easy to maintain and normalized as much as possible.

References :

https://en.wikipedia.org/wiki/Database_normalization

https://en.wikipedia.org/wiki/First_normal_form

https://en.wikipedia.org/wiki/Second_normal_form

https://en.wikipedia.org/wiki/Third_normal_form

https://en.wikipedia.org/wiki/Boyce%E2%80%93Codd_normal_form

Normalization: What does "repeating groups" mean?

Community
  • 1
  • 1
KAD
  • 10,972
  • 4
  • 31
  • 73
  • thanks for mentioning 1st NF (creating separate tables for each group of related data). – Nikunj Madhogaria Jun 27 '15 at 11:12
  • one more query: what if all brands are always `true` but store any other datatype, for example `puma_avg_shoe_price`, `adidas_avg_shoe_price`, `reebok_avg_shoe_price` and so on..still we go with the 2nd design? – Nikunj Madhogaria Jun 27 '15 at 11:26
  • You would replace mall_brand with mall_brand_price, containing the ids of mall and brand, plus the price (or mall_brand_product_price ...) – Jens Schauder Jun 27 '15 at 11:28
  • @Jens, yes, that's fair, but what if all the brands are always existing. I'm considering a situation where all of these brands are always there in the mall. Wouldn't be it better to add them as an attribute in that case? – Nikunj Madhogaria Jun 27 '15 at 11:30
  • You can add a column to the table brand (avg_shoe_price) and set to it the average shoe prices. – KAD Jun 27 '15 at 11:30
  • what I mean to say is, in the last scenario we're having N:M relationship where M is **unknown**, in the new scenario, I'm having a N:M relationship where M is **constant**. – Nikunj Madhogaria Jun 27 '15 at 11:34
  • in other words, a situation where all the brand fields in my 1st design will always be `true`. – Nikunj Madhogaria Jun 27 '15 at 11:38
  • 1
    Well it's the same situation dude, these are still duplicate columns. In this case, how will you list the brand name?? Also, what if you get to have 100 brands in your mall, then your table will be 100 columns assuming you are only setting the float values, what if you want to add the name and location for each brand, this will add another 200 columns to your table. Having a separate table for the brand entity is the best solution and you can add all the attributes you need as columns to this table. – KAD Jun 27 '15 at 12:28
  • @KAD: Hist original table is actually in [First Normal Form](https://en.wikipedia.org/wiki/First_normal_form). It is a bad design, but still in 1NF. The link you provided on normalization rules is inaccurate, and poorly written. See [here](https://www.simple-talk.com/sql/learn-sql-server/facts-and-fallacies-about-first-normal-form/) for a better explanation of 1NF. – Gerrat Dec 13 '16 at 19:22
  • @KAD: The brands are not *related data*. Given the value of any of them tells you nothing about the value of any others. 1NF strictly addresses **multi-valued columns**. Read the 2nd sentence of the Wikipedia link I provided, and look at their example with phone numbers. Give the article a thorough read actually - nowhere does 1NF place any restriction on having multiple columns named similarly or hold similar information. The other article I linked is also a good reference, and specifically mentions this common misconception. **Read them** – Gerrat Dec 14 '16 at 20:16
  • @KAD: [Another good description here](http://stackoverflow.com/a/23202535/429982) – Gerrat Dec 14 '16 at 20:36
1

The answer is obviously: it depends.

In 99% of the cases the second version is the one to choose. It is properly normalized and therefore rather flexible. The way to go for an OLTP-System.

If you are working on a datawarehouse though, the first verion might be a better one, or even a third version, where you basically have the result of the join of the three tables in the second version as a table.

Jens Schauder
  • 77,657
  • 34
  • 181
  • 348
  • thank you for mentioning the cases where the two designs are applicable. – Nikunj Madhogaria Jun 27 '15 at 11:13
  • one more query: what if all brands are always `true` but store any other datatype, for example `puma_avg_shoe_price`, `adidas_avg_shoe_price`, `reebok_avg_shoe_price` and so on..still we go with the 2nd design? – Nikunj Madhogaria Jun 27 '15 at 11:26