I want to build a simple multidimensional data model by using the star schema in a relational database (ROLAP). For that I create a fact table and two tables of dimensions. Firstly I copy the data from the operational source and handle this data (some simplified ETL process).
In my model only two dimensions: date
and status
. Measure: the number of certain statuses (for a time).
The time dimension table:
CREATE TABLE [dbo].[tbl_date_dim] (
[ID][int] IDENTITY(1,1) NOT NULL,
[date_key][int] NOT NULL primary key,
[Year][int] NOT NULL,
[Month][int] NOT NULL,
[Day][int] NOT NULL
);
There is a table - tbl_application
- in which is stored the whole time range (field VersionDate
). Therefore, the time dimension table I'm filling this way:
INSERT INTO [dbo].[tbl_date_dim]
([date_key],
[Year],
[Month],
[Day])
(
SELECT DISTINCT
CAST(YEAR(VersionDate) as VARCHAR(4)) +
RIGHT('00' + CAST(MONTH(VersionDate) as VARCHAR(2)) ,2) +
RIGHT('00' + CAST(DAY(VersionDate) as VARCHAR(2)), 2) as 'date_key',
YEAR(inner_data.VersionDate) as 'Year',
MONTH(inner_data.VersionDate) as 'Month',
DAY(inner_data.VersionDate) as 'Day'
FROM (
SELECT
VersionDate
FROM [dbo].[tbl_application]
) AS inner_data
);
The status dimension table: I use whole existing table tbl_applicationstatus
.
Next, I create a fact table. It contains foreign keys to dimension tables and measures.
CREATE TABLE [dbo].[tbl_olap_fact] (
[ID][int] IDENTITY(1,1) NOT NULL,
[status_id][int] NOT NULL, // FK
[date_dim][int] NOT NULL, // FK
[staus_name] varchar(100) NOT NULL, // Non additive measure
[transaction_id][int] NOT NULL, // Additive measure
CONSTRAINT [PK_tbl_olap_fact] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
transaction_id
- this field, which I will aggregate (number of statuses).
Next, I add the relationship between the fact table and dimension tables:
ALTER TABLE [dbo].[tbl_olap_fact] ADD CONSTRAINT [FK_tbl_olap_fact_tbl_date_dim] FOREIGN KEY([date_dim])
REFERENCES [dbo].[tbl_date_dim] ([date_key]);
ALTER TABLE [dbo].[tbl_olap_fact] ADD CONSTRAINT [FK_tbl_olap_fact_tbl_applicationstatus] FOREIGN KEY([status_id])
REFERENCES [dbo].[tbl_applicationstatus] ([ID]);
Then I fill the fact table:
INSERT INTO [dbo].[tbl_olap_fact]
([transaction_id],
[status_id],
[staus_name],
[date_dim])
(
SELECT DISTINCT
core.id as 'transaction_id',
core_status.ID as 'status_id',
core_status.name as 'status_name',
CAST(YEAR(core.VersionDate) as VARCHAR(4)) +
RIGHT('00' + CAST(MONTH(core.VersionDate) as VARCHAR(2)) ,2) +
RIGHT('00' + CAST(DAY(core.VersionDate) as VARCHAR(2)), 2) as 'date_dim'
FROM
[dbo].[tbl_application] as core
inner join tbl_applicationstatus as core_status
on core.ApplicationStatusID = core_status.ID
WHERE IsRaw = 0
);
As the OLAP server I'm using Mondrian. Mondrian schema that defines the logical model of the multidimensional database:
<Schema name="olap_schema">
<Dimension type="TimeDimension" visible="true" highCardinality="false" name="Date first dim">
<Hierarchy name="date_hierarchy" visible="true" hasAll="true" primaryKey="date_key" description="">
<Table name="tbl_date_dim" schema="dbo">
</Table>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Year"
nameColumn="Year"
type="Numeric"
uniqueMembers="true"
levelType="TimeYears"
hideMemberIf="Never"
description="">
</Level>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Month"
nameColumn="Month"
ordinalColumn="Month"
type="Numeric"
uniqueMembers="false"
levelType="TimeMonths"
hideMemberIf="Never"
description="">
</Level>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Day"
nameColumn="Day"
ordinalColumn="Day"
type="Numeric"
uniqueMembers="false"
levelType="TimeDays"
hideMemberIf="Never"
description="">
</Level>
</Hierarchy>
</Dimension>
<Dimension type="TimeDimension" visible="true" highCardinality="false" name="Date second dim">
<Hierarchy name="date_hierarchy" visible="true" hasAll="true" primaryKey="date_key" description="">
<Table name="tbl_date_dim" schema="dbo">
</Table>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Year"
nameColumn="Year"
type="Numeric"
uniqueMembers="true"
levelType="TimeYears"
hideMemberIf="Never"
description="">
</Level>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Month"
nameColumn="Month"
ordinalColumn="Month"
type="Numeric"
uniqueMembers="false"
levelType="TimeMonths"
hideMemberIf="Never"
description="">
</Level>
<Level name=""
visible="true"
table="tbl_date_dim"
column="Day"
nameColumn="Day"
ordinalColumn="Day"
type="Numeric"
uniqueMembers="false"
levelType="TimeDays"
hideMemberIf="Never"
description="">
</Level>
</Hierarchy>
</Dimension>
<Dimension type="StandardDimension" visible="true" highCardinality="false" name="Status dimension">
<Hierarchy name="status_hierarchy" visible="true" hasAll="true" primaryKey="ID" description="">
<Table name="tbl_applicationstatus" schema="dbo">
</Table>
<Level name=""
visible="true"
table="tbl_applicationstatus"
column="Name"
nameColumn="Name"
type="String"
uniqueMembers="true"
levelType="Regular"
hideMemberIf="Never"
description="">
</Level>
</Hierarchy>
</Dimension>
<Cube name="enrollment_cube" caption="" visible="true" description="" cache="true" enabled="true">
<Table name="tbl_olap_fact" schema="dbo">
</Table>
<DimensionUsage source="Date first dim" name="X axis" caption="" visible="true" foreignKey="date_dim" highCardinality="false">
</DimensionUsage>
<DimensionUsage source="Date second dim" name="Y axis" caption="" visible="true" foreignKey="date_dim" highCardinality="false">
</DimensionUsage>
<DimensionUsage source="Status dimension" name="Z axis" caption="" visible="true" foreignKey="status_id" highCardinality="false">
</DimensionUsage>
<Measure name="TotalCount" column="transaction_id" aggregator="count" caption="Total" visible="true">
</Measure>
</Cube>
</Schema>
As the OLAP client I'm using Saiku Analytics.
Basically, I get the correct data - but not quite sure in it. For example, does the correct that way that I use to populate the fact table? Am I building ETL process properly? This is a test mode and I make some experiments in building data warehouses and multidimensional models.