1

Trying to figure out the best way to approach this. For the sake of simplicity, I'm going to make up some unreasonable situation for an example; I am a merchant with some crap data and I don't know the number of items people bought, so I have to back into it

  • I have three products, and three distinct prices that customers bought at ($250, $350, $500)
  • I have the total amount the customer paid (taxes / refunds etc not relevant for this example)

So if I know a customer paid $600 - it's likely they got one product at $250 and one product at $350. If a customer paid $1,000 then they either bought 2 products for $500, or 4 products for $250 or 3 products at $250*2 and $500 (thank you @dennis).

What I would like is something like:

customer_id total_amount_paid assumed_number_of_products assumed_prices
1241 1000 4 250, 250, 250, 250
1241 1000 2 500, 500
1241 1000 3 250, 250, 500
0912 600 2 250, 350

Any recommendations would help. Really stuck trying to approach this without overthinking it.

S31
  • 904
  • 2
  • 9
  • 21
  • 1
    Interesting challenge. I’ll give this more thought but want to point out that $250 + $350 is not $700. :). Also with $1000, there could be 2*250 + 1*500. With 700, they got 2*350 – Dennis Jul 20 '21 at 04:07
  • Oof my bad, brain is fried. Thanks for catching that - edited! – S31 Jul 20 '21 at 04:12
  • so your table only have two column `customer_id` and `total_amount_paid` ? You want know how many combination and use SQL query to list out all combination with your given `total_amount_paid`, am I right? – GGG Jul 20 '21 at 04:20
  • While this would lead to a lot of permutations and combinations as the number of products will increase, but the laziest / greediest solution would be to assume that the customer will always by the most expensive product, and then sort out the change using the cheaper products? – tsamridh86 Jul 20 '21 at 04:22
  • Approach: Generate all possible comination first per "basket" 3*4*4*4... (first is 3 because a person has to buy at least single product), calculate the total and compare with total_amount_paid. So [[A], [B], [C], [A,A],[A,B],[A,C],[B,B],[B,C],[C,C], [A,A,A], ... each basket has a value. The next step is choosing the "closest" or exact value which should not be an issue. For more sophisticated approach [MATCH_RECOGNIZE](https://docs.snowflake.com/en/sql-reference/constructs/match_recognize.html) and ["bin fitting"](https://stackoverflow.com/a/53994970/5070879) could be used – Lukasz Szozda Jul 20 '21 at 04:22
  • @GGG - exactly! That's the information I have. – S31 Jul 20 '21 at 04:30
  • @SamridhTuladhar - thanks for the recs! I have about 500K customers so was trying to avoid all the combinations, did try something along the lines of the latter statement with a case when statement but kept failing / my approach was likely too simplistic – S31 Jul 20 '21 at 04:40
  • 1
    @LukaszSzozda interesting! just so I'm on the same page, are you recommending to go backwards? i.e calculate all possible values of the combinations of 500 / 250 / 350 and then match that on to the original table with the total amount paid? – S31 Jul 20 '21 at 04:42
  • I know the revenue amounts won't go above > 10K, so maybe imposing that limit can also reduce the number of combos – S31 Jul 20 '21 at 04:43
  • @S31 Yes, that was my idea. Based on product list create first all possible "baskets" and then compare. So it is kind of backtracking :) – Lukasz Szozda Jul 20 '21 at 13:34

1 Answers1

2

Cool questions ...

Nice thing about Snowflake is we can do the brute force approach ... pre-calc all possible combo's then filter the duplicates using qualify and row_number with use of ascii.

To make it more fun I include returns (so qty can go from -10 to 10) and added a fourth product @ $135. I did limit the total distinct products to just 2 ... but this can be pushed to 3 easily.

select 
    cust_total,
    product, 
    second_product,
    quantity,
    second_quantity,
    ascii(product)+ascii(second_product) basket,
    quantity+second_quantity basket_quantity 
from     
    (select
    ap1.product,
    ap2.product second_product,
    ap1.quantity,
    ap2.quantity second_quantity,
    sum(ap1.revenue_combos) + sum(ap2.revenue_combos) cust_total
from
    all_possibles ap1,
    all_possibles ap2
where
    ap1.product<>ap2.product
group by
    1,2,3,4)
group by 
    1,2,3,4,5,6  
qualify
    1=  
    row_number() over 
        (partition by cust_total,basket,basket_quantity 
         order by     product,second_product)

enter image description here

Full copy|paste|run in SQL :

with all_possibles as (
select
    *
from
    (
        SELECT
            seq4() -10 quantity,
            quantity * prod_price_1 A,
            quantity * prod_price_2 B,
            quantity * prod_price_3 C,
            quantity * prod_price_4 D
        FROM
            TABLE (GENERATOR (ROWCOUNT => 21)),
            (
                select
                   /*UNIFORM (1, 500, random())*/250::int prod_price_1,
                   /*UNIFORM (1, 500, random())*/350::int prod_price_2,
                   /*UNIFORM (1, 500, random())*/500::int prod_price_3,
                   /*UNIFORM (1, 500, random())*/135::int prod_price_4
            )
    ) unpivot (
        revenue_combos for product in (
            A,
            B,
            C,
            D
        )
    )
 )

 select 
cust_total,
product, 
second_product,
quantity,
second_quantity,
ascii(product)+ascii(second_product) basket,
quantity+second_quantity basket_quantity 
 from     
 (select
ap1.product,
ap2.product second_product,
ap1.quantity,
ap2.quantity second_quantity,
sum(ap1.revenue_combos) + sum(ap2.revenue_combos) cust_total
 from
all_possibles ap1,
all_possibles ap2
 where
ap1.product<>ap2.product
 group by
1,2,3,4)
 group by 1,2,3,4,5,6  
 qualify
1=  row_number() over (partition by cust_total,basket,basket_quantity order by product,second_product )
Adrian White
  • 1,720
  • 12
  • 14