
Does anyone know a method for getting a rough size estimate of an OLAP cube based on a star-schema data warehouse? Something based on the number of dimensions, the number of records in the dimension tables, the number of fact records, and the number of aggregations or distinct records, etc.

The database I am looking at has a fact table of over 20 billion rows, and dimension tables of 20 million, 70 million, and 1.3 billion rows.

Thanks, Nicholas

NER1808
  • Do you mean size in terms of records, or size in terms of disk-space? – MatBailie Jun 20 '11 at 15:45
  • With this size, I'm curious to know the software/hardware you'll use (it's just a huge cube, with a dimension of over a billion rows) – ic3 Jun 23 '11 at 21:14
  • At the moment we are using a very fast in-memory database system called WX2 by Kognitio (http://www.kognitio.com/wx2). I am investigating the possibility of pre-processing some data to reduce the load on this very expensive system. Disk is cheaper than memory! – NER1808 Sep 01 '11 at 16:16

3 Answers


I can see some roadblocks to creating this estimate. Knowing the row counts and cardinalities of the dimension tables in isolation isn't nearly as important as the relationships between them.

Imagine two low-cardinality dimensions with n and m unique values respectively. Caching OLAP aggregates over those dimensions produces anywhere from n + m values to n * m values, depending on how closely the relationship resembles a pure bijection. Given only the information you provided, all you can say is that you'll end up with fewer than 3.64 * 10^34 values (the product of the four row counts you listed), which is not very useful.
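To see where these numbers come from, here is a minimal back-of-the-envelope sketch of my own (the helper name is made up, not from the answer). It treats the four row counts from the question as cardinalities: the sum is the bijection-like lower bound, and the product is the cross-product upper bound of 3.64 * 10^34.

```python
from math import prod

def cube_cell_bounds(cardinalities):
    """Bounds on the number of cached aggregate values.

    Lower bound: bijection-like relationships, so counts roughly add
    (the n + m case). Upper bound: a full cross product (the n * m case).
    """
    return sum(cardinalities), prod(cardinalities)

# Row counts from the question: fact table, then the three dimension tables.
low, high = cube_cell_bounds([20e9, 20e6, 70e6, 1.3e9])
print(f"between {low:.3g} and {high:.3g} values")  # upper bound ~3.64e+34
```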

I'm pessimistic that there's an estimation algorithm fast enough that it wouldn't make more sense to just generate the cube and measure it when you're done.

goodside

We wrote a research paper that seems relevant:

Kamel Aouiche and Daniel Lemire, A Comparison of Five Probabilistic View-Size Estimation Techniques in OLAP, DOLAP 2007, pp. 17-24. http://arxiv.org/abs/cs.DB/0703058
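For readers who want a feel for how probabilistic view-size estimation works, here is a minimal sketch of a K-Minimum-Values distinct-count estimator in Python. It is a standard order-statistics technique in the same family as those compared in the paper, not code from the paper itself, and `kmv_estimate` is a made-up name.

```python
import hashlib
import heapq

def kmv_estimate(rows, k=1024):
    """Estimate the number of distinct rows (i.e. the size of a GROUP BY
    view) in one streaming pass, keeping only the k smallest 64-bit hashes."""
    heap = []   # max-heap via negation: -heap[0] is the k-th smallest hash
    kept = set()
    for row in rows:
        h = int.from_bytes(hashlib.sha1(repr(row).encode()).digest()[:8], "big")
        if h in kept:
            continue                 # duplicate of a hash we already keep
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:           # smaller than the current k-th smallest
            kept.discard(-heapq.heappushpop(heap, -h))
            kept.add(h)
    if len(heap) < k:
        return len(heap)             # fewer than k distinct values: exact
    kth = -heap[0] / 2.0**64         # k-th smallest hash, scaled into (0, 1)
    return int((k - 1) / kth)        # standard KMV estimate

# Usage: stream (dim1_key, dim2_key, ...) tuples from the fact table;
# memory stays O(k) no matter how many rows you scan.
```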

Daniel Lemire
  • Very interesting. I shall have a read. Did you create any software tools for pointing at source data to get estimates? – NER1808 Dec 06 '12 at 11:36
  • We have published some C++ software (http://code.google.com/p/viewsizeestimation/), but it can't run over an existing DBMS. – Daniel Lemire Dec 12 '12 at 05:31

Well, you can use the general rule that Analysis Services data is about 1/4 to 1/3 the size of the same data stored in a relational database.

Edward.

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/6b16d2b2-2913-4714-a21d-07ff91688d11/cube-size-estimation-formula
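To make the rule of thumb concrete, a trivial sketch (the function name is mine; the 1/4 to 1/3 ratios come from the quoted post):

```python
def estimate_cube_size(relational_bytes):
    # Rule of thumb from the quoted MSDN thread: the cube comes out at
    # roughly 1/4 to 1/3 the size of the same data stored relationally.
    return relational_bytes / 4, relational_bytes / 3

low, high = estimate_cube_size(5 * 2**40)  # e.g. a 5 TB relational warehouse
print(f"~{low / 2**40:.2f} to {high / 2**40:.2f} TB")  # ~1.25 to 1.67 TB
```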

Pawan Kumar
  • This is close to being a "link-only answer" - could you post an example code snippet or a paragraph describing the solution from the linked page? – bardzusny May 28 '15 at 10:01