5

Here is the problem, I have a sales information table which contains sales information, which has columns like (Primary Key ID, Product Name, Product ID, Store Name, Store ID, Sales Date). I want to do analysis like drill up and drill down on store/product/sales date.

There are two design options I am thinking about,

  1. Create individual index on columns like product name, product ID, Store Name, Store ID, Sales Date;
  2. Using data warehouse snowflake model, treating current sales information table as fact table, and create product, store, and sales date dimension table.

In order to have better analysis performance, I heard snowflake model is better. But why it is better than index on related columns from database design perspective?

thanks in advance, Lin

Lin Ma
  • 9,739
  • 32
  • 105
  • 175

1 Answers1

3

Knowing your app usage patterns and what you want to optimize for are important. Here are a few reasons (among many) to choose one over the other.

Normalized Snowflake PROs:

Faster queries and lower disk and memory requirements. Due to each normalized row having only short keys rather than longer text fields, your primary fact table becomes much smaller. Even when an index is used (unless the query can be answered directly by the index itself), partial table scans are often required, and smaller data means fewer disk reads and faster access.

Easier modifications and better data integrity. Say a store changes its name. In snowflake, you change one row, whereas in a large denormalized table, you have to change it every time it comes up, and you will often end up with spelling errors and multiple variations of the same name.

Denormalized Wide Table PROs:

Faster single record loads. When you most often load just a single record or small number of records, having all your data together in one row will incur only a single cache miss or disk read, whereas in the snowflake the DB might have to read from multiple tables in different disk locations. This is more like how NoSQL databases store their "objects" associated with a key.

dkamins
  • 21,450
  • 7
  • 55
  • 59
  • Thanks dkamins. Why you call the index bases solution (option 1 in my question) denormalized? Appreciate if you could show an example or elaborate a bit more. – Lin Ma Jan 13 '13 at 04:44
  • Another question is, for your comments, "When you most often load just a single record or small number of records", if the number of record to process is not small, why snowflake is better performance than index based solution? – Lin Ma Jan 13 '13 at 04:54
  • 1
    @LinMa, read more about http://en.wikipedia.org/wiki/Database_normalization . Also it's not exactly that snowflake is better performance for many records -- it's that denormalized is faster for single record load. – dkamins Jan 13 '13 at 07:59
  • Hi dkamins, thanks for the good reference. I stil have a confusion which is in what situations, performance of snowflake is better than index based solution? – Lin Ma Jan 13 '13 at 14:58
  • 1
    @LinMa, when searching for data, in snowflake the associated dimension tables can often be loaded into memory entirely (due to normalization / lack of duplication), so as the DB reads the primary table, it can read it off disk quicker and move through rows quicker due to only needing to read keys instead of reading the same data over and over. – dkamins Jan 14 '13 at 19:08
  • Thanks, @dkamins. Two more questions, (1) for your comments, "dimension tables can often be loaded into memory entirely", is it because of dimension tables are often small? (2) for your comments, "instead of reading the same data over and over", do you mean we only need to read data like store ID in primary table in snow-flake mode other than read the whole row when doing drill-up/down using store ID? My confusion is, in option 1 simple index solution, I think we can still only read necessary values like store ID, so what is the saving here? Please feel free to correct me if I am wrong. – Lin Ma Jan 20 '13 at 14:39
  • 1
    @LinMa, Yes and yes -- you seem to understand the issues. However you may be giving an index more credit than it deserves. Typically an index contains only the actual data that is indexed and a pointer to the row. So (depending on your query) you will most often still end up reading actual rows. – dkamins Jan 21 '13 at 01:50
  • Thanks, @ dkamins. If I need to drill up and drill down on store/product/sales date, for snow-flake model, what columns do you think should be in fact table? – Lin Ma Jan 21 '13 at 17:48