1

I have a single SQL table that contains multiple entries for each customerID (some customerID's only have one entry which I want to keep). I need to remove all but the most recent entry per customerID, using the invoiceDate field as my marker.

So I need to go from this:

+------------+-------------+-----------+
| customerID | invoiceDate | invoiceID |
+------------+-------------+-----------+
|          1 |  1393995600 |       xx  |
|          1 |  1373688000 |       xx  |
|          1 |  1365220800 |       xx  |
|          2 |  1265220800 |       xx  |
|          2 |  1173688000 |       xx  |
|          3 |  1325330800 |       xx  |
+------------+-------------+-----------+

To this:

+------------+-------------+-----------+
| customerID | invoiceDate | invoiceID |
+------------+-------------+-----------+
|          1 |  1393995600 |       xx  |
|          2 |  1265220800 |       xx  |
|          3 |  1325330800 |       xx  |
+------------+-------------+-----------+

Any guidance would be greatly appreciated!

pyb
  • 4,813
  • 2
  • 27
  • 45
rocky
  • 464
  • 6
  • 22
  • 1
    I suggest you research the SQL keywords MAX and GROUP BY – Tony Sep 01 '15 at 22:50
  • Thanks Tony, I'm familiar with them but I can't seem to come up with a query to accomplish what I need to. I've tried a few nested queries with mixed results. My latest attempt deleted all but 1 row out of about 30,000 rows. – rocky Sep 01 '15 at 22:52
  • It's a flaw of SO that this answer has been poorly answered so often that's it actually hard to separate the wheat from the chaff. That said, there are lots of good answers too. Unfortunately none of them are down there vvv – Strawberry Sep 01 '15 at 23:24

4 Answers4

2
  1. Write a query to select all the rows you want to delete:
SELECT * FROM t
WHERE invoiceDate NOT IN (
    SELECT MAX(invoiceDate)
    -- "FROM t AS t2" isn't supported by MySQL, see http://stackoverflow.com/a/14302701/227576
    FROM (SELECT * FROM t) AS t2
    WHERE t2.customerId = t.customerId
    GROUP BY t2.customerId
)

This may take a long time on a big database.

  1. If you're satisfied, change the query to a DELETE statement:
DELETE FROM t
WHERE invoiceDate NOT IN (
    SELECT MAX(invoiceDate)
    -- "FROM t AS t2" isn't supported by MySQL, see http://stackoverflow.com/a/14302701/227576
    FROM (SELECT * FROM t) AS t2
    WHERE t2.customerId = t.customerId
    GROUP BY t2.customerId
)

See http://sqlfiddle.com/#!9/6e031/1

If you have multiple rows whose date is the most recent for the same customer, you would have to look for duplicates and decide which one you want to keep yourself. For instance, look at customerId 2 on the SQL fiddle link above.

pyb
  • 4,813
  • 2
  • 27
  • 45
  • 1
    Thanks pyb, your answer worked for me with one small fix. I got a 1093 error -- You can't specify target table ... for update in FROM clause. The solution was to replace 'FROM t as t2' in line 5 of your code with 'FROM (SELECT * FROM t) AS t2'. More info here - http://stackoverflow.com/a/14302701/227576 – rocky Sep 02 '15 at 00:59
  • @rocky I didn't know that! Thanks for your comment I'm updating my answer. – pyb Sep 02 '15 at 02:04
1

Try out this one

  with todelete as
(
            select 
            CustomerId, InvoiceId, InvoiceDate, Row_Number() over (partition by CustomerId  order by InvoiceDate desc) as Count
             from DeleteDuplicate
)


delete from todelete
where count > 1
Learner
  • 1,490
  • 2
  • 22
  • 35
0

Let us asume that the table name is transaction_table.

create table test1 AS
select * from (
  select * from transaction_table order by customerID, invoiceDate desc) temp
group by customerID

You will have the output data in test1 table.

seahawk
  • 1,872
  • 12
  • 18
0
delete from ex_4 where
rowid in
(select rowid
from ex_4 a 
where to_date(invoicedate,'DDMMYYYY') = (select max(to_date(invoicedate,'DDMMYYYY')) from ex_4 b where a.customerid != b.customerid))

This is how it will be done in oracle.This query will delete all but most recently added row.Looking at your table structure i am assuming that the invoicedate column is varchar2 type so converting it to date used to_date function here

Akshay Joshi
  • 467
  • 1
  • 9
  • 22