Or, can dimensions be, in some way nested, in data.cube?
Given the following example (accessed via ?data.cube
on R, having installed last branch of data.cube-oop package, by @jangorecki) for which I post code and image example.
Consider I want to expand the cube adding a new dimension which would turn the schema to snowflake, so for each geography location, I would have another set of data (data.table) which would describe demography properties (i.e. population based on gender, age, etc)
Image
dotted: possible new dimensions.
black: actual facts and dimensions from code example.
green: new dimension which turns the schema into snowflake.
Code
# multidimensional hierarchical data from fact and dimensions
X = populate_star(N = 1e3)
sales = X$fact$sales
time = X$dims$time
geography = X$dims$geography
# define hierarchies
time.hierarchies = list( # 2 hierarchies in time dimension
"monthly" = list(
"time_year" = character(),
"time_quarter" = c("time_quarter_name"),
"time_month" = c("time_month_name"),
"time_date" = c("time_month","time_quarter","time_year")
),
"weekly" = list(
"time_year" = character(),
"time_week" = character(),
"time_date" = c("time_week","time_year")
)
)
geog.hierarchies = list( # 1 hierarchy in geography dimension
list(
"geog_region_name" = character(),
"geog_division_name" = c("geog_region_name"),
"geog_abb" = c("geog_name","geog_division_name","geog_region_name")
)
)
# create dimensions
dims = list(
time = as.dimension(x = time,
id.vars = "time_date",
hierarchies = time.hierarchies),
geography = as.dimension(x = geography,
id.vars = "geog_abb",
hierarchies = geog.hierarchies)
)
# create fact
ff = as.fact(
x = sales,
id.vars = c("geog_abb","time_date"),
measure.vars = c("amount","value"),
fun.aggregate = sum,
na.rm = TRUE
)
# create data.cube
dc = as.data.cube(ff, dims)
str(dc)
other questions related to the example are:
what is the value expected for each element? why
"time_week" = character()
"time_date" = c("time_week","time_year")
instead of
"time_week" = character()
"time_date" = date()
and why this naming as in columns of data.table?
"time_quarter" = c("time_quarter_name"),
"time_month" = c("time_month_name")