0

So, I am trying to recreate a model in SQL that was developed in Excel and I am having a tough time because the model is reliant upon the position of certain records.

It is being used to estimate the sales of competitors in terms of units and dollar amount.

The only available data is that of our client so that is what is used for the estimates. In other words, if there is no data, then they use the data from the next closest record with data and fill everything in-between in with the same values.

I am trying to reproduce in SQL because I have developed SSIS package to pump data into SQL Server and then from there I am creating views to automate the reporting and push to Tableau or Power BI. This is a very manual process with lots of room for error.

Short fix, I BELIEVE that a specific pivot will help me recreate the logic of the model. This is what I am trying to pivot:

Parent ASIN Rank Bounds Groups
1 1 1
5 0 1
7 1 2
10 0 2
12 1 3
14 0 3

But, I need a table that looks like this:

Groups Lower_Bound Upper_Bound
1 1 5
2 5 7
3 7 10
4 10 12
5 12 14
6 14 18

I have tried using LAG and LEAD (e.g. select *, LAG([RB Units],1,0) over (order by [Parent ASIN Rank]) as test1, LEAD([RB Units],1,0) over (order by [Parent ASIN Rank]) as test2,), but the issue is that there are large groups of items that do not have data that are clustered together so only the first and last items without data in those groups can be assigned sales (or unit) data from the previous and/or next record.

For example, if row 10 does not have sales data, but rows 11 and 12 do, then LAG and LEAD will work. On the other hand, if rows 18,19,20,21,and 22 do not have data, then only rows 18 and 22 will work with the LAG and LEAD method. I am not sure if I can partition the LAG and LEAD in some way that will make it copy the data from the next closest record all the way through those large groups without data.

Here is an example of the model that I am trying to duplicate:

Estimate Model Screenshot

Previous Units Value Column Formula: =IF(E10="",G9,E10)

Next Units Value Column Formula: =IF(E10="",H11,E10)

Prev. Sales Rank Value Column Formula: =IF(E10="",I9,C10)

Next Sales Rank Value Column Formula: =IF(E10="",J11,C10)

Est. Total Units Column Formula: =IF(E10="",(G10-((G10-H10)/(J10-I10))*(C10-I10)),E10)

The pivot table is ordered by Total which is essentially the sales rank the Parent ASIN Rank is based off the Total, but both are based off an AVERAGE of the "Sales Rank: 30 days avg." which is supplied to us.

This needs to be dynamic because the Ranks (i.e. Total column and Parent ASIN Rank) will change when the data is refreshed. As of right now, this model breaks if there is not a record with data in the first row.

I am doing self-referenced CTEs, but I am starting to question whether this is possible or just extremely difficult to replicate a model with this kind of structure in SQL. It may not be the best way of estimating, but it is what my company has been using (I am relatively new and want to help automate processes for them).

I need to be able to say "This record pulls from X rows above (closest record above that is not null) and Y rows below (closest record below that is not null).

I have tried assigning IDs in many ways using ROW_NUMBER() OVER (<order by X,Y, and Z>).

Here is one of my scripts:

--create view vw_PT_Parent_ASIN_Units
--as
with a1 as (
    select
        a.[Adjusted Parent ASIN],
        a.[Fixed Brand],
        AVG([Sales Rank: 30 days avg#]) as Total
    from
        vw_Keepa_IFCN_Phase1 as a
    where
        [Include ASIN in Analysis?] = 1
    group by
        a.[Adjusted Parent ASIN],a.[Fixed Brand]
),
a2 as
(
    select 
        [Parent ASIN], 
        sum([Ordered Units]) as [Ordered Units],
        sum([Ordered Revenue]) as [Ordered Revenue]
    from 
        vw_RB_Sales
    group by 
        [Parent ASIN]
),
a3 as
(
    select
        a1.*,
        a2.[Parent ASIN] as [RB Parent ASIN],
        [RB Units] = 
            case
                when a2.[Parent ASIN] is null or a2.[Parent ASIN] = '' then null
                else a2.[Ordered Units]
            end,
        [RB Sales] = 
            case
                when a2.[Parent ASIN] is null or a2.[Parent ASIN] = '' then null
                else a2.[Ordered Revenue]
            end
    from
        a1
    left join
        a2
    on
        a1.[Adjusted Parent ASIN] = a2.[Parent ASIN]
),
Est_Model as
(
--I need to create a list of ASINs that are sorted by the "Total" 
    select 
        [Adjusted Parent ASIN], 
        ROW_NUMBER() over (order by Total, [Adjusted Parent ASIN]) as [Parent ASIN Rank]
    from 
        a3
    where 
        Total is not null
),
a4 as
(
    select 
        a3.*,
        b.[Parent ASIN Rank]
    from 
        a3
    left join 
        Est_Model as b
    on 
        a3.[Adjusted Parent ASIN] = b.[Adjusted Parent ASIN]
),
a5 as
(
    select distinct 
        a4a.[Adjusted Parent ASIN],
        a4a.[RB Parent ASIN],
        a4a.[Parent ASIN Rank], 
        Bounds=(ROW_NUMBER() over (order by [Parent ASIN Rank]))%2
    from
        a4 as a4a
    where
        a4a.total is not null
    and 
        a4a.[RB Units] is null
)
, test as (
select *,Groups=row_number() over (partition by Bounds order by [parent asin rank]) from a5 --order by [Parent ASIN Rank]
)
select * from test order by [Parent ASIN Rank]


select distinct t1.Groups, t1.[Parent ASIN Rank] as l, t2.[Parent ASIN Rank] as h
from test as t1
inner join test as t2
on t1.[Adjusted Parent ASIN] = t2.[Adjusted Parent ASIN]
where t1.Bounds=1
and t2.bounds =0


select Groups, Bounds, [1] as Lower_Bound, [0] as Upper_Bound
from test
pivot
    (
    sum(Bounds) for [Parent ASIN Rank]
    IN ([1],[0])
    )
as pvt
order by [Parent ASIN Rank]
--select distinct parent asin rank (low),parent asin rank (high),sales,units,total
--but i need to be able to separate the current parent ASIN ranks that you see in the results right now and I should do this by grabbing even and odd numbers
    
    
    /*
    select 
        *,
        ROW_NUMBER() over (order by [Parent ASIN Rank]),
        --DENSE_RANK() over
        (ROW_NUMBER() over (order by [Parent ASIN Rank]))/2,
        (ROW_NUMBER() over (order by [Parent ASIN Rank]))%2
    from
        a4
    where
        total is not null
    and 
        [RB Units] is null
    order by 
        [Parent ASIN Rank]
        */

--pivot so that the ranges are in separate columns and then use that as CTE and say if rank is in between a range then use this units number
--Low | High | Units | Sales
-- 1  |   5  | 2892  |  90186
-- 5  |   7  | 5076  |  121394


--  a4a.*, Test1= case when a4a.[Parent ASIN Rank] <> 1 then row_number() over (partition by a4a.[rb units] order by a4a.[Parent ASIN Rank]) else 0 end
--  , Test2=case when a4a.[rb units] is null then 1 else 0 end
--  , Test3=case when a4a.[rb units] is null then a4a.[Parent ASIN Rank]+1 else 0 end
--  , Test4=case when a4a.[rb units] is null then a4a.[Parent ASIN Rank]-1 else 0 end


--need to get the range. For example, the blank record falls in-between these two records
--I need to have one column with the low units and one column with the high units, but all in the same row....so there needs to be 2 joins...one to a low table and one to a high table
--This means that for low and high they need to match on the parent ASIN rank and an adjusted parent ASIN rank that matches the ASIN rank of the NULL record
--For example, B01M0ZV2CU has a rank of 5 which means that ranks 4 and 6 need to match that 5


--Need to fix duplicates which probably goes back to Keepa view definitions and joins



--Currently, my method works well for when there is a single row with NULL values, but it does not work for when there are multiple rows with NULL values

Below is another version of that script, but this one uses LAG and LEAD which works for just the one NULL record:

--create view vw_PT_Parent_ASIN_Units
--as
with a1 as (
    select
        a.[Adjusted Parent ASIN],
        a.[Fixed Brand],
        AVG([Sales Rank: 30 days avg#]) as Total
    from
        vw_Keepa_IFCN_Phase1 as a
    where
        [Include ASIN in Analysis?] = 1
    group by
        a.[Adjusted Parent ASIN],a.[Fixed Brand]
),
a2 as
(
    select 
        [Parent ASIN], 
        sum([Ordered Units]) as [Ordered Units],
        sum([Ordered Revenue]) as [Ordered Revenue]
    from 
        vw_RB_Sales
    group by 
        [Parent ASIN]
),
a3 as
(
    select
        a1.*,
        a2.[Parent ASIN] as [RB Parent ASIN],
        [RB Units] = 
            case
                when a2.[Parent ASIN] is null or a2.[Parent ASIN] = '' then null
                else a2.[Ordered Units]
            end,
        [RB Sales] = 
            case
                when a2.[Parent ASIN] is null or a2.[Parent ASIN] = '' then null
                else a2.[Ordered Revenue]
            end
    from
        a1
    left join
        a2
    on
        a1.[Adjusted Parent ASIN] = a2.[Parent ASIN]
),
Est_Model as
(
--I need to create a list of ASINs that are sorted by the "Total" 
    select 
        [Adjusted Parent ASIN], 
        ROW_NUMBER() over (order by Total, [Adjusted Parent ASIN]) as [Parent ASIN Rank]
    from 
        a3
    where 
        Total is not null
),
a4 as
(
    select 
        a3.*,
        b.[Parent ASIN Rank]
    from 
        a3
    left join 
        Est_Model as b
    on 
        a3.[Adjusted Parent ASIN] = b.[Adjusted Parent ASIN]
)
    select 
        *, 
        LAG([RB Units],1,0) over (order by [Parent ASIN Rank]) as test1,
        LEAD([RB Units],1,0) over (order by [Parent ASIN Rank]) as test2,
        --LAG([RB Units],1,0) over (partition by [RB Units] order by [Parent ASIN Rank]) as test11,
        --LEAD([RB Units],1,0) over (partition by [RB Units] order by [Parent ASIN Rank]) as test22,
        test3=case when [rb units] is null then 1 else 0 end
    from
        a4
    where
        total is not null


--  a4a.*, Test1= case when a4a.[Parent ASIN Rank] <> 1 then row_number() over (partition by a4a.[rb units] order by a4a.[Parent ASIN Rank]) else 0 end
--  , Test2=case when a4a.[rb units] is null then 1 else 0 end
--  , Test3=case when a4a.[rb units] is null then a4a.[Parent ASIN Rank]+1 else 0 end
--  , Test4=case when a4a.[rb units] is null then a4a.[Parent ASIN Rank]-1 else 0 end


--need to get the range. For example, the blank record falls in-between these two records
--I need to have one column with the low units and one column with the high units, but all in the same row....so there needs to be 2 joins...one to a low table and one to a high table
--This means that for low and high they need to match on the parent ASIN rank and an adjusted parent ASIN rank that matches the ASIN rank of the NULL record
--For example, B01M0ZV2CU has a rank of 5 which means that ranks 4 and 6 need to match that 5


--Need to fix duplicates which probably goes back to Keepa view definitions and joins



--Currently, my method works well for when there is a single row with NULL values, but it does not work for when there are multiple rows with NULL values

Finally, here is a screenshot of the output from my second script I posted. Hopefully, this will help clarify some things. Second Script Output

Any and all help is accepted! Thank you in advance!

Frosty_Fraz
  • 47
  • 1
  • 11
  • Not sure why you are re-joining if you are using `LEAD?LAG`, but you may want have a look at this https://stackoverflow.com/questions/44893970/last-value-with-ignore-nulls-in-sql-server/51550573 which explains how to get the previous non-null row. Your question as it stands isn't really answerable, as it's rather unclear and rambling as to what exact result you want and how `Parent ASIN Rank` comes imto it,, and the images are not copyable into text. A [mcve] involving `CREATE TABLE` and `INSERT` statements, with sample data and expected results would go a long way – Charlieface Jul 30 '21 at 08:36
  • Hello @Charlieface ! Thank you for your input. I had been looking at this for a long time so I thought I had provided enough information. Regardless, COALESCE turned out to be the answer because it returns the first non-null value in a list. It worked like a charm. So, thank you for providing me that link! Without it, I would have been lost for a while. I had a feeling that SQL had a precompiled function that did something like this. Again, thank you! If you want, you can post COALESCE as the answer or I can provide part of my script that ended up working. Up to you :) – Frosty_Fraz Jul 30 '21 at 15:09

1 Answers1

0

COALESCE turned out to be the answer because it returns the first non-null value in a list

Frosty_Fraz
  • 47
  • 1
  • 11