0

I have developed a data warehouse that follows a star schema. Within this schema I have multiple dimension tables connected to my fact table. However, I have a bridge table between one of these dimensions and the fact table.

i.e. example of my warehouse schema

Product <- ProductGroup <- Fact

Fact table contains: ProductGroup_SK...
ProductGroup table contains: ProductGroup_SK, Product_SK
Product table contains: Product_SK, product_fields...

I will be using Oracle Data Miner to carry out analysis on my data warehouse, and have been trying to determine how to return one row for each fact, containing the product details in the product table, based on the ProductGroup_SK which connects the Fact table to the ProductGroup bridge table.

So far I've been able to join and return one row successfully, however this only works if there are only two products in the productGroup. Also, if only one product exists in the group, this query will return the same product twice as I specify two joins in the query itself using MIN and MAX.

Select * from Fact f
join Product p ON p.Product_SK = (SELECT MIN(Product_SK) FROM ProductGroup pg WHERE pg.ProductGroup_SK = f.ProductGroup_SK)
join Product p ON p.Product_SK = (SELECT MAX(Product_SK) FROM ProductGroup pg WHERE pg.ProductGroup_SK = f.ProductGroup_SK);

I have read that a PIVOT query should be used here, but I do not need to perform any aggregation, and the number of rows in each productGroup will not be the same for each row in the fact table i.e. some groups may have only 1 product, some may have 6.

Basically I was hoping to figure out a way to get the same result as I get by this hack way I've done above, i.e the result instead of looking like this:

... ProductGroup_SK   Product_SK   Product_Name   Product_Category .. etc
           123            1           Apple           Fruit
           123            2           Banana          Fruit

Should look like this:

... ProductGroup_SK   Product_SK   Product_Name   Product_Category   Product_SK_1   Product_Name_1   Product_Category_1 .. Product_SK_N   Product_Name_N   Product_Category_N
           123            1           Apple           Fruit                 2           Banana          Fruit               ...........      ..........      ..........

The query I demonstrated above produces this desired result, but as you can see it is not good code and is impractical for productGroups that contain less than or more than two products.

Any suggestions would be greatly appreciated, I've been trying to figure out how to do this properly but could not find any good examples of reading data via a bridge table in one row. Is this possible?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
CuriousDog
  • 35
  • 8
  • With this design, you will never to be able to see sales/orders/whatever-the-fact-table-measures at the product level. Is that the intention? Example: Fact has 1 sale against group 1. There are 3 products in the group: apple, banana and strawberry. How many strawberries did you sell? – David Rushton Jan 24 '18 at 11:24
  • I know what you're saying, however within the product table I will have product-related fields like category, producer etc. it will not affect the measures in the fact table, it is purely to obtain info about each product (not sales for each individual one) Is that possible with my design? I do not need specific measures from the fact table for each product, I simply wish to expand the items in each productGroup into the one query result after i join to my other various dimensions for example, and want all products to be displayed in the one query result row. – CuriousDog Jan 24 '18 at 11:39
  • Is there a fixed upper limit? Or could a group contain any number of products? – David Rushton Jan 24 '18 at 11:55

2 Answers2

0

I think you just need different table aliases:

Select f.*, pmin.*, pmax.*
from Fact f join
     Product pmin
     on pmin.Product_SK = (SELECT MIN(Product_SK)
                           FROM ProductGroup pg
                           WHERE pg.ProductGroup_SK = f.ProductGroup_SK
                          ) join
     Product pmax
     ON pmax.Product_SK = (SELECT MAX(Product_SK)
                           FROM ProductGroup pg
                           WHERE pg.ProductGroup_SK = f.ProductGroup_SK
                          );

This is definitely not how I would write the query, though. The subqueries in the FROM clause are just confusing. Instead:

Select f.*, pmin.*, pmax.*
from Fact f join
     (select pg.ProductGroup_SK, max(pg.Product_SK) as maxp, min(pg.Product_SK) as minp
      from ProductGroup pg
      group by pg.ProductGroup_SK
     ) pg
     on pg.ProductGroup_SK = f.ProductGroup_SK join
     Product pmin
     on pmin.Product_SK = pg.minp join 
     Product pmax
     on pmax.Product_SK = pg.maxp;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you for the reply! However the way I was using min and max was just a hack, this will not work for a product group that has for example only 1 product in it, or 6 products as this will only display 2 products, would it not? Sorry, my question was probably very confusing.. – CuriousDog Jan 23 '18 at 22:09
0

Given that the product names or the product description is required to provide context for the product group when joined with the fact then perhaps this method could work.

I used this as reference SQL Query to concatenate column values from multiple rows in Oracle to produce the following query.

/* Step 1: Join Product & Product Group table 
   Step 2: Concatenate Product Names grouped by Product Group on the table retuned from step1.  The Grain of this result table should be same as the product_group dimension.
   Step 3: Joined the result table from step 2 with the fact table */



-- Step 3
Select * 
from fact, 
     -- Step 2:
     (      
    Select  
            product_group_sk,
            LISTAGG(product_name, ', ') WITHIN GROUP (ORDER BY product_group_sk) AS product_name                
        from 
         -- Step 1
         ( 
            Select product_group_sk, 
                   Product_sk, 
                   product_name                
            from product_group pg, product p
            where pg.product_sk = p.product_sk) product_group_list
        group by product_group_list.product_group_sk
    ) product_group_pivoted_list
Where fact.product_group_sk = product_group_pivoted_list.product_group_sk;