1

I have a sample table in snowflake database which has details about the product bought, the code of the shop, date on which the product was purchased, the amount of the product and the type of business as shown below: Sample Input Table

From this table, I need to remove the duplicates based on the columns "Shop_Code" and "Purchase_Date". I need to also keep in mind of the following conditions while removing the duplicates. Those are:

  1. If we get two rows while grouping on "Shop_Code" and "Purchase_Date" where the "Business_Type" is both "Large" then any one of the row can be deleted.
  2. If we get two rows while grouping on "Shop_Code" and "Purchase_Date" where the "Business_Type" is both "MidSmall" then any one of the row can be deleted.
  3. If we get one row with the "Business_Type" as "Large" and other one as "MidSmall" while grouping on "Shop_Code" and "Purchase_Date" then in this case the row with the "Business_Type" as "MidSmall" should be deleted retaining the row with "Large" as the business type.

The output I'm expecting is that rows 1, 3, 5 and 8 should be present in the output table. Request your help in terms of Snowflake SQL.

Sweekar
  • 25
  • 3
  • user RANK()/DENSE_RANK() and Derive Column For Deletion From it, On top of this column, apply Row_NUMBER() transformation which would help you in deleting the required rows. – akshindesnowflake Jun 23 '21 at 05:58
  • @akshindesnowflake Can you please elaborate with the table I have given? – Sweekar Jun 23 '21 at 06:50
  • can you provide your SQL Table definition and this sample data script. – akshindesnowflake Jun 23 '21 at 07:02
  • @akshindesnowflake The table definition Product_Bought - VARCHAR Shop_Code - VARCHAR Business_Type - VARCHAR Purchase_Date - DATE Amount - NUMBER and the query I'm trying is `With Dup as ( select "Shop_Code", "Purchase_Date", "Business_Type", row_number() over (partition by "Shop_Code","Purchase_Date" ORDER BY "Shop_Code","Purchase_Date" Desc,"Business_Type" Asc) as rn from purchase_table ) Select * from dup where rn > 1;` but I'm not able to apply the condition mentioned in my description. – Sweekar Jun 23 '21 at 10:36

1 Answers1

0

You can't delete from the table directly with Snowflake. The options you have involve running a select query to identify duplicates and inserting those back into the original table like this:

Setup sample table:

create or replace table sample_table as (
    select
        column1::varchar as product_bought,
        column2::varchar as shop_code,
        column3::varchar as business_type,
        column4::date    as purchase_date,
        column5::number  as amount
    from
    values ('Mobile', 'AVR', 'MidSmall', '2012-09-01', 13000),
           ('Mobile', 'AVR', 'MidSmall', '2012-09-01', 13000),
           ('Tablet', 'SVU', 'Large', '2012-02-23', 16000),
           ('Tablet', 'SVU', 'MidSmall', '2012-02-23', 16000),
           ('Laptop', 'MNR', 'Large', '2015-09-22', 78000),
           ('Laptop', 'MNR', 'Large', '2015-09-22', 78000),
           ('Headset', 'NZL', 'MidSmall', '2018-08-14', 1200),
           ('Headset', 'NZL', 'Large', '2018-08-14', 1200)
);

Create a new table that holds the non-duplicate records:

create or replace table sample_table_non_duplicates as
select
    st.*
from sample_table st
    qualify row_number() over (partition by shop_code, purchase_date order by IFF(upper(business_type) = 'MIDSMALL', 2, 1) ) = 1
;

Swap the original table with the one that contains no duplicates:

alter table sample_table swap with sample_table_non_duplicates;
-- drop table sample_table_non_duplicates --(Drops original table!)

Results:

select * from sample_table;
+--------------+---------+-------------+-------------+------+
|PRODUCT_BOUGHT|SHOP_CODE|BUSINESS_TYPE|PURCHASE_DATE|AMOUNT|
+--------------+---------+-------------+-------------+------+
|Mobile        |AVR      |MidSmall     |2012-09-01   |13000 |
|Headset       |NZL      |Large        |2018-08-14   |1200  |
|Laptop        |MNR      |Large        |2015-09-22   |78000 |
|Tablet        |SVU      |Large        |2012-02-23   |16000 |
+--------------+---------+-------------+-------------+------+

You could run an insert overwrite query on your original table with the query that removes duplicates but that could be a bit risky and you may lose your original data. Here are some other options

Simon D
  • 5,730
  • 2
  • 17
  • 31