I am new to dimensional modeling and have read a lot of material (star-schema, dimension/fact tables, SCD, Ralph Kimball's - The Data Warehouse Toolkit book, etc). So I have a good conceptual understanding of dimensional modeling constructs but finding it hard to apply to a usecase due to lack of experience and need some guidance.
Consider Twitter for example, I want to design a dimensional model to calculate -
- DAU (Daily active users) = number of users who logged in and accessed twitter via website or mobile app on a given day
- MAU (Monthly active users) = number of users who logged in and accessed twitter via website or mobile app in last 30 days including measurement date
- User engagement = total(clicks + favorites + replies + retweets) on a tweet
These metrics over a period (like month) is the summation of these metrics on each day in that period.
I want to write SQLs to calculate these metrics for every quarter by region (eg: US and rest of world) and calculate year-over-year growth (or decline) in these metrics.
Eg:
Here are some details that I thought about -
Factless (transaction) fact table for user login activity with grain of 1 row per user per login : user_login_fact_schema (user_dim_key, date_dim_key, user_location_dim_key, access_method_dim_key)
Factless (transaction) fact table for user activity with grain of 1 row per user per activity : user_activity_fact_schema (user_dim_key, date_dim_key, user_location_dim_key, access_method_dim_key, post_key, activity_type_key)
Does this sounds correct? How should my model look like? What other dimensions/facts can I add here?
Wonder if I should collapse these 2 tables into 1 and have activity_type for logins as 'login', but there can be a huge number of logins without any activity so this will skew the data. Am I missing anything else?