2

I am in the process of designing a new customer fact and dimensions for my warehouse. In my search for good example models I noticed something odd. No one seems to have a customer-centric fact. Every example I found has a transaction event such as a sale or order as the central fact with customers as a dimension. This raises a question for me.

Am I doing something seriously wrong by wanting a customer fact? The goal is to enable analysis of customer behavior such as order frequency, total spend, acquisition cost, distinct, product count,...etc These questions naturally imply a fact to me not a dimension. I already have an order fact that is great for order-centric queries but is not good for customer-centric queries.

To give you a little more detail the Customer Fact will likely have the following measures and dimensions:

Measures:

  • Count of Customers
  • Distinct Product Count
  • Completed Order Count
  • Total Revenue
  • Total Cost
  • Count of Coupons Received
  • Count of Coupons Redeemed
  • Cost of Coupons Redeemed

Dimensions:

  • Order Delivery Date
  • Order Delivery Time
  • Order Delivery Geography
  • Acquisition Source
  • Order Type
  • Coupon Type

The above seems pretty natural to me but I am concerned I am missing an obvious no-no by taking a customer-centric approach in this new cube.

sleske
  • 81,358
  • 34
  • 189
  • 227
Kevin D. White
  • 1,213
  • 11
  • 11
  • One other item that neglected to mention is the number of many-to-many relations required to pull this off. For example, I will have dimensions for order date and order geography. Both of these must be many-to-many since each customer can be place multiple orders and each order can be delivered to a different location. In fact nearly all of my dimensions will be many-to-many. Does this change your assessment of the wisdom of my approach? – Kevin D. White Jul 19 '10 at 04:16
  • @Kevin A fact can only be linked to one dimension. If a customer has multiple orders, the "orders" cannot be a dimension of the customer. It is pointless to have a bridge table from customer to orders - for that one really must link the two stars (customer and orders) on their business keys. Ditto with geography. It is not possible to ask about the "multi-geography" of a customer. Number of locations and numbers of orders and average order size are all things which could be asked about a customer. Looking at your proposed dimensions - most of these are meaningless for a customer. – Cade Roux Jul 19 '10 at 12:10
  • @Kevin For example: What is the Geography of the Product Counts of customer C from 1/1/2010? That makes no sense to ask that question. The customer sill have that fact, but the geography is not a valid dimension of the fact. – Cade Roux Jul 19 '10 at 12:12
  • @Cade You may have a design bias against n:n relationships in a warehouse but they can and do exist. There are well established methods for composing n:n relationships in a warehouse. You should check them out. – Kevin D. White Jul 23 '10 at 13:59
  • @Kevin D. White But that's not really a customer fact table, that's a bridge, link or factless fact table which uses a customer dimension and a geography dimension for multiple instances of the "fact" of linkage. And one still cannot answer the question what is the geography for the customer, one can only answer what are the geographies for the customer. If the product counts and other real facts are added on a per-geography basis, you further expand away from this being a "customer" fact table, since there are multiple rows per customer, so it cannot be any "more" customer-centric. – Cade Roux Jul 23 '10 at 14:57
  • @Kevin you've also written "cube" in your last sentence. A cube isn't part of an RDB schema. Did you mean to write schema? – shmichael Jul 27 '10 at 09:06
  • @KevinD.White, we already have a customer dimension table, which has the `salesman`, `acquisition_channel`, `acquisition_date` field, we will need to build KPI to monitor the performance of each `salesman` and `acquisition_channel`, could we just use the dimension table to accomplish this ? – mingchau Apr 24 '19 at 08:27

4 Answers4

4

We have customer facts. A lot of the time they are factless fact tables which just link several dimensions.

Sounds like a lot of your facts are derived or summary. Grain will still be important. If you say order count is that MTD (and what date) or for all time, etc.

I don't think there is anything wrong with that, but I think because this is derived data, most people would put it in a "data mart" or whatever the best unambiguous term for a subset for analysis.

I agree that modeling it the same way is perfectly valid. The only thing you have to watch out for is the same with all derived data, it needs to remain consistent.

Your customer will have a dimension (conformed, since it is shared between models) and then a CustomerStats fact table or whatever, with every fact at that grain which shares all those dimensions.

Cade Roux
  • 88,164
  • 40
  • 182
  • 265
  • You make some good really good points here. Especially about factless fact tables. To be clear however that is not what I am proposing to build. – Kevin D. White Jul 23 '10 at 14:00
1

The reason so many systems are order-centric instead of customer-centric is that how you identify the customer changes so frequently over time: previously treating a business as the customer evolves into treating individual employees as the customers or vice versa, or a customer will change/split/merge addresses, or a business changes its name and we want to consolidate (or segregate) old and new performance totals, or a shipping address and billing address now have to be expanded to include a support address, or operators forget or mistake one address purpose for another, or a customer wants to use a special shipping address only temporarily, or etc etc.

This is addressed in more detail here.

Community
  • 1
  • 1
joe snyder
  • 3,629
  • 2
  • 21
  • 13
0

I may have misunderstood your question, but let's see what can be learned about customer behaviour from the factOrder only, the old-fashioned way.

Assuming that grain of the factOrder is one line on the order and that there is OrderID as a degenerate dimension.

-- Number of customers who ordered something at least once
select
    count(distinct CustomerKey) as PayingCustomers
from factOrder ;

.

-- Number of orders and sales per customer
select 
      CustomerKey
    , count(distinct OrderID) as NumberOfOrders
    , sum(ExtendedPrice)      as Total
from factOrder
group by CustomerKey ;

.

-- Histogram (x = NumberOfOrders, y = People, Amount)
with
orders_per_customer as (
    select 
      CustomerKey
    , count(distinct OrderID) as cnt
    , sum(ExtendedPrice)      as Total
    from factOrder
    group by CustomerKey
)
select
      cnt        as NumberOfOrders
    , count(1)   as People
    , sum(Total) as Amount
from orders_per_customer
group by cnt
order by cnt asc ;

.

-- Distinct products ordered by customer
select 
      CustomerKey
    , count(distinct ProductKey) as DistinctProductsOrdered
from factOrder
group by CustomerKey ;

.

-- Histogram (x = NumberOfDistinctProducts, y = People)
with
products_per_customer as (
    select 
      CustomerKey
    , count(distinct ProductKey)  as cnt
    from factOrder
    group by CustomerKey
)
select
      cnt       as NumberOfDistinctProducts
    , count(1)  as People
from products_per_customer
group by cnt
order by cnt asc ;
Damir Sudarevic
  • 21,891
  • 3
  • 47
  • 71
  • My business users interact with the cube not the warehouse. The cube interaction is mediated by either Excel or Cognos. While I can perform such queries quite easily in either SQL or MDX my business users cannot. This is another reason I seem to be driving toward a customer-cube as an addition to my order cube. – Kevin D. White Jul 19 '10 at 04:19
0

Measures such as order frequency, total spend, acquisition cost, distinct product count are actually derived from Orders as a fact table, with Customers as a dimension. Aggregating per customer could just as easily been aggregating per product or per geographic location.

As Cade Roux has suggested, you can build a customers aggregated table, which should be detached from the other fact tables, however that would purely be a performance decision. You retain maximum flexibility by constructing Customers as a dimension of Orders.

shmichael
  • 2,985
  • 3
  • 25
  • 32
  • I do already have a order fact with multiple dimensions in use. One of the dimensions is customers. However the problem is many of the questions my business users are trying to answer around behavior are extremely difficult to answer with the existing order fact and dimensions. I could change the design but that would have additional downstream impact on my users. Part of the driver for this new customer fact is so that I can leave my existing fact in place and minimize the impact to my users. – Kevin D. White Jul 23 '10 at 14:04
  • Is it extremely difficult due to performance issues or due to schema limitations? If it is schema limitations - could you provide an example or two? – shmichael Jul 27 '10 at 09:04