I have transaction data table which is in user level, where user buy different products day to day. I want to aggregate the total transaction count for per user by pivoting the columns with same prefix. meaning, per user, total time of transaction for pivoted columns. it should be conditioned on date as well.
reproducible data and attempt
here is reproducible data schema:
CREATE OR REPLACE TABLE transaction_tble (
user_id char(10),
transaction_date, date,
catg_prod1_amt int(10),
catg_prod2_amt int(10),
catg_prod3_amt int(10),
sku_item1_qty int(10),
sku_item2_qty int(10),
sku_item3_qty int(10)
)
here is the value of the table:
here is the sample data:
user_id,transaction_date,sku_sale_prod1,sku_sale_prod2,sku_sale_prod3,catg_purch_prod3,catg_purch_prod4,catg_purch_prod5
A,1/14/2022,10,0,0,0,11,0
B,1/10/2022,0,18,0,5,11,7
A,1/19/2022,6,18,2,5,0,0
A,1/19/2022,10,18,1,5,11,7
B,1/19/2022,10,18,1,5,11,7
C,1/20/2022,10,18,1,5,11,7
C,1/20/2022,10,18,1,5,11,7
B,4/19/2022,10,18,1,5,11,7
A,5/23/2022,10,18,1,5,11,7
A,7/23/2022,10,18,1,5,11,7
my attempt
I think using pivot operation may be right approach but it doesn't work
CREATE OR REPLACE TABLE trans_view (
select
user_id,
( case when transaction_date between '2022-01-01' - INTERVAL 6 MONTH then count(distinct transaction_date) end) as 2022_h1_trans_cnt,
( case when transaction_date between '2022-06-01' - INTERVAL 6 MONTH then count(distinct transaction_date) end) as 2022_h2_trans_cnt,
-- using pivot here
( case
when transaction_date between '2022-01-01' - INTERVAL 6 MONTH
and
PIVOT (
count(distinct transaction_date)
for sku in (
'sku_sale_prod1','sku_sale_prod2','sku_sale_prod3'
)
)
then count(distinct transaction_date)
end
) as 2022_h1_trans_cnt_sku,
--
( case
when transaction_date between '2022-01-01' - INTERVAL 6 MONTH
and
PIVOT (
count(distinct transaction_date)
for catg in (
'catg_purch_prod3','catg_purch_prod4','catg_purch_prod5'
)
)
then count(distinct transaction_date)
end
) as 2022_h1_trans_cnt_catg
from transaction_table
group by user_id
)
I looked into pivot function in big query, not sure what went wrong with my logic. Can anyone suggest possible workaround to do this?
desired output
here is my desired output table schema:
CREATE OR REPLACE VIEW transc_view as(
select
user_id,
2022_h1_trans_cnt,
2022_h2_trans_cnt,
2022_h1_trans_cnt_catg,
2022_h2_trans_cnt_catg,
2022_h1_trans_cnt_sku,
2022_h2_trans_cnt_sku
from transaction_tble
group by user_id
)
here is desired value of the output table:
user_id,2022_1st_6mth_trns_cnt,2022_2nd_6mth_trns_cnt,2022_1st_6mth_trns_cnt_sku,2022_2nd_6mth_trns_cnt_sku,2022_1st_6mth_trns_cnt_catg,2022_2nd_6mth_trns_cnt_catg
A,4,1,3,,3,1
B,3,0,3,0,3,0
C,1,1,1,0,1,0
How can I correctly pivot the table and get distinct transaction date count for this? any thoughts or quick workaround on this?