0

hi I loaded a csv file into my database table, where I appended the rows from the csv file. But I accidentally appended the rows twice so now the data is showing up as duplicates in my table, which looks like this:

data_run_date fiscal_quarter invoice_id invoice_type
2022-07-01 Q4 12345 Paid
2022-07-01 Q4 12345 Paid
2022-01-01 Q1 12345 Not Paid
2022-07-01 Q4 12678 Paid
2022-01-01 Q1 12678 Not Paid

I need the table to look like this

data_run_date fiscal_quarter invoice_id invoice_type
2022-07-01 Q4 12345 Paid
2022-01-01 Q1 12345 Not Paid
2022-07-01 Q4 12678 Paid
2022-01-01 Q1 12678 Not Paid

the second record of paid for invoice id 12345 was a result of my accidental append to the table twice. I a trying to delete this record from the database table.

Now the simple solution would be to do this:

DELETE FROM db.invoice
WHERE data_run_date = TIMESTAMP '2022-07-01 00:00:00.000'

and then re-append the csv

But I wanted to know if there are other better/easier/more efficient ways to do it. So I tried this:

DELETE FROM db.invoice  
WHERE EXISTS (
SELECT * FROM 
(
SELECT t.*,
       ROW_NUMBER () OVER (PARTITION BY INVOICE_ID
                                ORDER BY DATA_RUN_DATE) RN 
FROM db.invoice t
)
WHERE RN = 2
AND DATA_RUN_DATE = TIMESTAMP '2022-07-01 00:00:00.000'
)

But this deleted all the records from the table (luckily I am working on a temp table so I was able to add all the original data to the temp table from the main table using the INSERT statement. But if someone could help me with a better solution. I would much appreciate it.

thank you

EDIT: Note, I also tried the answer to this question but it only kept the latest record, which is not what I want: Removing duplicate rows from table in Oracle

On_demand
  • 47
  • 6
  • Does this answer your question? [Removing duplicate rows from table in Oracle](https://stackoverflow.com/questions/529098/removing-duplicate-rows-from-table-in-oracle) – astentx Jul 22 '22 at 05:11
  • thanks for the suggestion! I have other columns in my table, which is why I am not trying to use group by. Also, when I try the suggested answer, it only keeps the latest record, which is not what I want – On_demand Jul 22 '22 at 13:28
  • "latest" just means that you have to resolve duplicates and choose one of multiple rows. You may also group by all the columns in the table, it really doesn't matter fo `delete` what are your columns – astentx Jul 22 '22 at 14:27

2 Answers2

0

You can use:

DELETE FROM db.invoice
WHERE ROWID IN (
  SELECT ROWID
  FROM   (
    SELECT ROW_NUMBER() OVER (
             PARTITION BY data_run_date, fiscal_quarter, invoice_id, invoice_type
             ORDER BY NULL
           ) AS rn
     FROM  db.invoice
  )
  WHERE rn > 1
)

Which deletes the duplicate row.

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117
0

Here's one option.

Sample data:

SQL> select * From invoice order by invoice_id, fiscal_quarter desc;

DATA_RUN_D FI INVOICE_ID INVOICE_
---------- -- ---------- --------
2022-07-01 Q4      12345 Paid
2022-07-01 Q4      12345 Paid
2022-01-01 Q1      12345 Not Paid
2022-07-01 Q4      12678 Paid
2022-01-01 Q1      12678 Not Paid

Delete duplicates:

SQL> delete from invoice a
  2  where a.rowid > (select min(b.rowid)
  3                   from invoice b
  4                   where b.invoice_id = a.invoice_id
  5                     and b.fiscal_quarter = a.fiscal_quarter
  6                     and b.invoice_type = a.invoice_type
  7                     and b.data_run_date = a.data_run_date
  8                  );

1 row deleted.

Result:

SQL> select * From invoice order by invoice_id, fiscal_quarter desc;

DATA_RUN_D FI INVOICE_ID INVOICE_
---------- -- ---------- --------
2022-07-01 Q4      12345 Paid
2022-01-01 Q1      12345 Not Paid
2022-07-01 Q4      12678 Paid
2022-01-01 Q1      12678 Not Paid

SQL>
Littlefoot
  • 131,892
  • 15
  • 35
  • 57