hi I loaded a csv file into my database table, where I appended the rows from the csv file. But I accidentally appended the rows twice so now the data is showing up as duplicates in my table, which looks like this:
data_run_date | fiscal_quarter | invoice_id | invoice_type |
---|---|---|---|
2022-07-01 | Q4 | 12345 | Paid |
2022-07-01 | Q4 | 12345 | Paid |
2022-01-01 | Q1 | 12345 | Not Paid |
2022-07-01 | Q4 | 12678 | Paid |
2022-01-01 | Q1 | 12678 | Not Paid |
I need the table to look like this
data_run_date | fiscal_quarter | invoice_id | invoice_type |
---|---|---|---|
2022-07-01 | Q4 | 12345 | Paid |
2022-01-01 | Q1 | 12345 | Not Paid |
2022-07-01 | Q4 | 12678 | Paid |
2022-01-01 | Q1 | 12678 | Not Paid |
the second record of paid for invoice id 12345 was a result of my accidental append to the table twice. I a trying to delete this record from the database table.
Now the simple solution would be to do this:
DELETE FROM db.invoice
WHERE data_run_date = TIMESTAMP '2022-07-01 00:00:00.000'
and then re-append the csv
But I wanted to know if there are other better/easier/more efficient ways to do it. So I tried this:
DELETE FROM db.invoice
WHERE EXISTS (
SELECT * FROM
(
SELECT t.*,
ROW_NUMBER () OVER (PARTITION BY INVOICE_ID
ORDER BY DATA_RUN_DATE) RN
FROM db.invoice t
)
WHERE RN = 2
AND DATA_RUN_DATE = TIMESTAMP '2022-07-01 00:00:00.000'
)
But this deleted all the records from the table (luckily I am working on a temp table so I was able to add all the original data to the temp table from the main table using the INSERT statement. But if someone could help me with a better solution. I would much appreciate it.
thank you
EDIT: Note, I also tried the answer to this question but it only kept the latest record, which is not what I want: Removing duplicate rows from table in Oracle