-1

We had a bug in our code that cached the wrong values to the last_order_id (expected_previous_order) column. The query I wrote properly finds the correct last order id but is too slow for our data set.

I want to select this data into another table but I cannot because the query will take too long

I have setup a simple example here with a smaller data set. Original table has about 170k rows.

SQL Fiddle of my Example

In the example:

  • original_artwork_id is how these rows are grouped.
  • order_id is the current rows order id
  • actual_previous_order is the corrected last order id
  • expected_previous_order is the currently stored last order id. This is the wrong value as it does not actually reference the last order id

Showing that the acutal_previous_order should correspond with the previous order_id

EXPLAIN Results

enter image description here

EDIT

Every time a reorder is placed a new entry is placed into the order_artwork table with a reference to the original_artwork_id and last_order_id.

The reference in the current data set to the last_order_id is wrong.

I need to update all records to properly indicate the last order id.

I am doing this by trying to find each artwork and joining it with the previous entry of the same original_artwork_id. Then I can pull the order_id from the last entry to update the current entries last_order_id

Join the current row with the previous row created before the current row with the same original_artwork_id or the current row original_artwork_id = the previous rows id

styks
  • 3,193
  • 1
  • 23
  • 36
  • I faild to see the expected result and what is the question. – Juan Carlos Oropeza Nov 29 '17 at 14:49
  • Performance questions should include `EXPLAIN ANALYZE` and some information about table size, index, current time performance, desire time, etc. `Slow` is a relative term and we need a real value to compare. [**MySQL**](http://dba.stackexchange.com/questions/15371/how-do-i-get-the-execution-plan-for-a-view) [**How do I obtain a Query Execution Plan?**](http://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-plan) – Juan Carlos Oropeza Nov 29 '17 at 14:50
  • I dont understand you say query works ok, but then you say expected_previous_order has the wrong value. If we dont know the expected output or the logic how can we optimize the query. – Juan Carlos Oropeza Nov 29 '17 at 14:54
  • Thank you for your comments. actual_previous_order contains the corrected value. The query works properly but will never complete with a larger data set. I am trying to figure out a way to optimize the subquery. expected_previous_order is only there to illustrate that the cached value is wrong on the table. I think the fiddle is pretty clear? – styks Nov 29 '17 at 15:01
  • Maybe is clear for you. I don't know the logic and you don't take time to explain it. So do you expect we do reverse engineering to understand what you need? – Juan Carlos Oropeza Nov 29 '17 at 15:03
  • I have updated the question with an intended goal as well as an explain statement from the original table. – styks Nov 29 '17 at 15:04
  • `I want to select this data into another table` is too broad. Explain the logic you want your query do. – Juan Carlos Oropeza Nov 29 '17 at 15:07
  • I have added a more detailed explanation. Thank you. – styks Nov 29 '17 at 15:19
  • So for the first row in your sql fiddle `{62555 | 67230 | 67012 }` is 67012 because is the closest on time to 67230 ? – Juan Carlos Oropeza Nov 29 '17 at 15:25
  • Very close. So the second row id `167935` is closest in time to the first row `169039`. It isn't the closest order but the closest artwork with the same `original_artwork_id`. Then I reference that closest artwork's order id to get the `actual_previous_order` – styks Nov 29 '17 at 15:45

2 Answers2

1

Not sure if this will be faster than your current query. But anyway.

SQL DEMO

First you need add a new field

`pos` int DEFAULT 0,

And update your base case so can do the JOIN.

update `order_artwork` o
SET `original_artwork_id` = `id`
WHERE  `original_artwork_id` IS NULL;

You could use COALESCE(original_artwork_id, id) but cant use index on that case.

Then assign a row_number to each order based in original_artwork_id and date

update `order_artwork` o
left join (
  SELECT o.id,
         @rn := if(@order_id = `original_artwork_id`,
                   @rn + 1,
                   if(@order_id := `original_artwork_id`, 1, 1)
                   ) as rn
  FROM `order_artwork` o
  CROSS JOIN (SELECT @id := 0, @order_id := 0, @rn := 0) as var
  ORDER BY `original_artwork_id`,
           `created`
) b on
    o.id = b.id
set
    o.pos = b.rn;

Finally update the last order.

UPDATE `order_artwork` o
JOIN (
      SELECT o1.original_artwork_id,
             o2.order_id,
             o1.order_id as last_order_id
      FROM `order_artwork` o1
      LEFT JOIN `order_artwork` o2
        ON o1.pos = o2.pos - 1
       AND o1.original_artwork_id  = o2.`original_artwork_id`
      WHERE o2.pos IS NOT NULL 
      ) as b
 ON o.original_artwork_id = b.original_artwork_id
AND o.order_id = b.order_id
SET o.last_order_id = b.last_order_id;
Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118
  • Thank you for the very detailed response. This ended up working but I found that the created field was wrong for a lot of the data. So I created an alternate query that just finds the last order id that had the same original_artwork_id. Thanks again. – styks Nov 30 '17 at 14:48
  • Glad to know. So how much fast was? I wasn't sure if will improve the performance. – Juan Carlos Oropeza Nov 30 '17 at 14:49
  • The performance on the update took around 10 minutes to complete. Everything else was under a minute. – styks Nov 30 '17 at 14:53
0

I found that the created time column was not reliable. So I decided to just find the last highest order id with the same original_artwork_id.

Create a table that has the corrected values

CREATE TABLE order_artwork_two AS 
   select 
      d1.id, 
      d1.order_id, 
      max(d2.order_id) last_order_id, 
      d1.original_artwork_id 
   from order_artwork d1
   left join order_artwork d2 
       ON d1.original_artwork_id = d2.original_artwork_id 
           and d1.order_id > d2.order_id 
   group by d1.original_artwork_id, d1.order_id;

Add an index to the new table. Otherwise the update would be way too slow

alter table order_artwork_two add primary KEY(id);

Update our original table.

update order_artwork d1 
    left join order_artwork_two d2 on d2.id = d1.id
    set d1.last_order_id = d2.last_order_id;
styks
  • 3,193
  • 1
  • 23
  • 36