0

I have the following schema:

id | order_ref | description | price

Currently I have the following duplicate issue:

1 | 34567 | This is the description | 19.99
2 | 34567 | This is the description | 13.99

This was due to the data I was importing having the description for each item duplicated. Is there a way I can keep the first row, and then UPDATE the description on subsequent (up to approx 20 rows) to be 'AS ABOVE'?

1 | 34567 | This is the description | 19.99
2 | 34567 | - AS ABOVE - | 13.99

Thanks

-------UPDATED

UPDATE documents_orders_breakdown
SET `desc` = '- AS ABOVE -'
WHERE NOT id IN (SELECT id
             FROM documents_orders_breakdown AS D
             WHERE D.`desc` <> `desc`
             ORDER BY D.id
             LIMIT 1)

But this returns [Err] 1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

--------UPDATED

UPDATE documents_orders_breakdown
SET `desc` = '- AS ABOVE -'
WHERE NOT id IN (SELECT MIN(id)
                 FROM documents_orders_breakdown AS t
                 WHERE t.`desc` = `desc`)

This now returns [Err] 1093 - You can't specify target table 'documents_orders_breakdown' for update in FROM clause

Juan Mellado
  • 14,973
  • 5
  • 47
  • 54
Jeepstone
  • 2,591
  • 5
  • 22
  • 38
  • Thanks for the reply Andriy. The schema is correct and as you have stated, but the issue of multiple rows with identical descriptions comes from the way I have had to import the data. The data was originally in a Filemaker (4!) format. Each 'order_ref' had description | price | qty, but instead of each order_ref having 1 row it was in the format order_ref | description | value 1 | price 1 | value 2 | price 2. In addition, each corresponding description was separated only by a 'random' line break so it was impossible to split out the description against each 'row'. – Jeepstone Jun 28 '11 at 08:01
  • I duped the descriptions so that at least we had historical data still available. Going forward, each description matches it's own value | price as the schema should be. The ordering should work here as they have been ordered correctly before importing so the first description (by id) will be the one to keep. – Jeepstone Jun 28 '11 at 08:06

3 Answers3

1

If this is a one-time thing, performance is not a big issue. You can run an UPDATE on all the records that are not returned by a SELECT with a LIMIT of 1.

UPDATE the_table
SET description = '- AS ABOVE -'
WHERE NOT id IN (SELECT id
                 FROM the_table t
                 WHERE t.description = the_table.description
                 ORDER BY t.id
                 LIMIT 1)

This query assumes you want to keep the description of the record whose id comes first (hence the ORDER BY).


Since you can't use LIMIT in subqueries, you can work around that by using the aggregate function MIN:

UPDATE the_table
SET description = '- AS ABOVE -'
WHERE NOT id IN (SELECT MIN(id)
                 FROM the_table t
                 WHERE t.description = the_table.description)

(Let's hope you can mix MIN and subqueries ;)


Apparently you can't SELECT from the table you're UPDATEing in MySQL. A workaround is to use an implicit temporary table. This is bad for performance, but, again, given this is a one-time thing, that's not a big concern.

UPDATE the_table
SET description = '- AS ABOVE -'
WHERE NOT id IN (SELECT m FROM (SELECT MIN(id) AS m
                 FROM the_table t
                 WHERE t.description = the_table.description) AS temp)
Community
  • 1
  • 1
R. Martinho Fernandes
  • 228,013
  • 71
  • 433
  • 510
  • Martinho, that's exactly what I want to do as the first 'occurrence' is the one I want to keep the description for. Any ideas how I could rewrite the query to get around the LIMIT problem (updated question)? – Jeepstone Jun 27 '11 at 09:20
  • Ouch, now, that's a problem! Not sure, but maybe you can work around it with some clever use of MIN. I'll see what I can do and update my answer. – R. Martinho Fernandes Jun 27 '11 at 09:26
  • I updated my answer (Also, note that I changed `t.description <> description` to `t.description = description`, which was a typo that could be disastrous! Always double check the query before running an UPDATE!) – R. Martinho Fernandes Jun 27 '11 at 09:31
  • @Jeepstone: dammit. New edit. Let's see if it still fights back now :) – R. Martinho Fernandes Jun 27 '11 at 15:02
  • ;o) Just affected the first row and updated all the others to -AS ABOVE- ooops! – Jeepstone Jun 27 '11 at 15:23
  • @Jeepstone: you did make a backup before running code a stranger gave to you, right ;)? I forgot to add the table name in the WHERE, otherwise the condition is always true! – R. Martinho Fernandes Jun 27 '11 at 15:49
  • lol, yes I made a backup. Now I get [Err] 1054 - Unknown column 'documents_orders_breakdown.desc' in 'where clause' – Jeepstone Jun 27 '11 at 16:07
  • Gosh, how I hate MySQL. I tested that in MSSQL (and it works), but I don't know how to refer to the description outside the subquery in MySQL :( – R. Martinho Fernandes Jun 27 '11 at 16:09
  • +1 for suggesting the implicit temporary tables which, by the way, also allow the use of LIMITs in IN subqueries *in certain cases*. – KatDevsGames Jul 22 '13 at 13:43
0

Relational datebases do not have a notion of subsequent. Records in a table are not in any particular order. If you do not specify an order in a SELECT query, you have to assume that the records are retrieved in an order that you do not expect.

Oswald
  • 31,254
  • 3
  • 43
  • 68
0

The comment Oswald made about ordering (or lack thereof) of the rows is very important. You have no garuntee, period, that unsorted rows selected out of this table will be in the order you expect. This means that unless you specify the existing in table order every single time, things could be tagged 'AS ABOVE' even when this does not reflect reality. In addition, none of the provided solutions so far will deal with any out-of-sequence records properly.
Overall, this sounds more like a database design issue (specifically, a normalization problem), than a query issue.
Ideally, the descriptions would be extracted to some master datatable (along with the necessary ids). Then, the choice about the description to use is left to when the 'SELECT' runs. This has the added benefit of making the 'AS ABOVE' safe for changes in ordering.

So, assuming that each instance of the order_ref column should have a different description (barring the 'AS ABOVE' bit), the tables can be refactored as followed:

id | order_ref | price 
======================= 
 1 | 34567     | 19.99  
 2 | 34567     | 13.99  

and

order_ref_fk | description  
==========================================
34567        | "This is the description"

At this point, you join to the description table normally. Displaying a different description is usually a display issue regardless, to be handled by whatever program you have outputting the rows to display (not directly in the database).

If you insist on doing this in-db, you could write the SELECT in this vein:

SELECT Orders.id, Orders.order_ref, Orders.price,
       COALESCE(Dsc.description, 'AS ABOVE')
FROM Orders
LEFT JOIN (Description
           JOIN (SELECT order_ref, MIN(id) AS id
                 FROM Orders
                 GROUP BY order_ref) Ord
             ON Ord.order_ref = Description.order_ref_fk) Dsc
        ON Dsc.order_ref_fk = Orders.order_ref
           AND Dsc.id = Orders.id
ORDER BY Orders.order_ref, Orders.id
Clockwork-Muse
  • 12,806
  • 6
  • 31
  • 45