Essentially I have what I call floating rows that come into my table. These are rows that have a type = mod
and they are associated with a customer_id
, but not another transaction_id
in the table. I need them associated with another transaction_id
and not just hanging out by themselves. So, I want to create a mapping table that gives the floating transaction_id
values and the previous_transaction_id
they are associated with. I'm using LAG to get the previous_transaction_id
and that works for some scenarios, but not all. Essentially, I'd like to tell LAG - "Hey LAG, if the value you found is one of these floating transaction_id
values, keep stepping back until you find one that isn't" But I don't know how to do that.
I'm using Snowflake and have jinja and dbt at my disposal if there is a better way to do this.
Here is some mock data with the various scenarios I have and my current query:
with data as (
select
transaction_id,
customer_id,
transaction_date,
amount,
type
from (values
(1, 'a','03/24/2022'::date, 10, 'cat'),
(1, 'a','03/24/2022'::date, 15, 'dog'),
(1, 'a','03/24/2022'::date, 20, 'mouse'),
(1, 'a','03/24/2022'::date, 30, 'rabbit'),
(1, 'a','03/24/2022'::date, 5, 'squirrel'),
(2, 'a','03/24/2022'::date, 4, 'mod'), -- floater
(3, 'b','05/20/2022'::date, 100, 'cat'),
(3, 'b','05/20/2022'::date, 150, 'dog'),
(3, 'b','05/20/2022'::date, 200, 'mouse'),
(3, 'b','05/20/2022'::date, 300, 'rabbit'),
(3, 'b','05/20/2022'::date, 50, 'squirrel'),
(4, 'b','07/20/2022'::date, 40, 'mod'), -- floater
(5, 'c','02/02/2020'::date, 100, 'cat'),
(5, 'c','02/02/2020'::date, 150, 'dog'),
(5, 'c','02/02/2020'::date, 200, 'mouse'),
(5, 'c','02/02/2020'::date, 300, 'rabbit'),
(6, 'c','08/01/2020'::date, 50, 'mod'), -- floater
(7, 'c','12/25/2020'::date, 40, 'mod'), -- floater
(8, 'd','01/15/2021'::date, 10, 'cat'),
(8, 'd','01/15/2021'::date, 15, 'dog'),
(8, 'd','01/15/2021'::date, 20, 'mouse'),
(8, 'd','01/15/2021'::date, 30, 'rabbit'),
(8, 'd','01/15/2021'::date, 5, 'squirrel'),
(8, 'd','01/15/2021'::date, 4, 'mod'),
(9, 'e','02/10/2020'::date, 100, 'cat'),
(9, 'e','02/10/2020'::date, 150, 'dog'),
(9, 'e','02/10/2020'::date, 200, 'mouse'),
(9, 'e','02/10/2020'::date, 300, 'rabbit'),
(10, 'e','08/17/2020'::date, 50, 'mod'), -- floater
(11, 'e','12/15/2020'::date, 40, 'mod'), -- floater
(12, 'e','02/14/2021'::date, 40, 'mod'), -- floater
(13, 'c','04/09/2022'::date, 0, 'mouse'),
(13, 'c','04/09/2022'::date, 0, 'rabbit'),
(13, 'c','04/09/2022'::date, 50, 'mod') -- floater because other values for transaction_id sum to 0
) as tbl (transaction_id, customer_id, transaction_date, amount, type)
),
previous_transaction_id as (
select
transaction_id,
customer_id,
lag(transaction_id, 1, null) over (partition by customer_id order by transaction_date) as previous_transaction_id
from data
qualify transaction_id != previous_transaction_id
),
floating_mods as (
select
transaction_id,
sum(iff(type = 'mod', amount, 0)) as mod_amount,
sum(amount) - mod_amount as non_mod_amount
from data
group by 1
having non_mod_amount = 0
)
select
gp.transaction_id,
gp.previous_transaction_id
from previous_transaction_id gp
inner join floating_mods fm on gp.transaction_id = fm.transaction_id
order by gp.transaction_id
And here is the output of the query:
TRANSACTION_ID | PREVIOUS_TRANSACTION_ID |
---|---|
2 | 1 |
4 | 3 |
6 | 5 |
7 | 6 |
10 | 9 |
11 | 10 |
12 | 11 |
13 | 7 |
And here is my desired output:
TRANSACTION_ID | PREVIOUS_TRANSACTION_ID |
---|---|
2 | 1 |
4 | 3 |
6 | 5 |
7 | 5 |
10 | 9 |
11 | 9 |
12 | 9 |
13 | 5 |