0

I want to update the following rows after the sum of the previous rows reach a defined threshold. I'm using MySQL, and trying to think of a way to solve this using SQL only.

Here's an example. Having the threshold 100. Iterating through the rows, when the sum of the previous rows amount >= 100, set the following rows to checked.

Before the operation:

| id | amount | checked |
| 1  | 50     | false   |
| 2  | 50     | false   |
| 3  | 20     | false   |
| 4  | 30     | false   |

After the operation:

| id | amount | checked |
| 1  | 50     | false   |
| 2  | 50     | false   | <- threshold reached (50 + 50 >= 100)
| 3  | 20     | true*   |
| 4  | 30     | true*   |

Is it possible to do it with just a SQL query? Do I need a stored procedure? How could I implement it using either solution?

Paulo Faria
  • 101
  • 2
  • 7

2 Answers2

1

You can do this by calculating the cumulative amount and using update, and join:

update table t join
       (select t.*, (select sum(amount) from table t2 where t2.id <= t.id) as cum
        from table t
       ) tcum
       on tcum.id = t.id and tcum.cum >= 100
    set checked = true;

EDIT:

For faster performance, you can use variables. The following should be a correct way to do this:

update table t join
       (select t.*, (@cum := @cum + amount) as cum
        from table t cross join
             (select @cum := 0) vars
        order by t.id
       ) tcum
       on tcum.id = t.id and tcum.cum >= 100
    set checked = true;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1
    this is using a correlated subquery which is usually frowned on... it can make the process take hours depending on the size of the database... since its a one time thing its ok but I usually try to steer away from them. – John Ruddell Jun 12 '14 at 15:58
  • Yeah! It did work on the test table, but on the real table it's taking forever. – Paulo Faria Jun 12 '14 at 16:20
  • 1
    @JohnRuddell . . . I once felt that way about correlated subqueries. Now I better understand where they are useful and occasionally where they are needed. – Gordon Linoff Jun 12 '14 at 16:42
  • Yea I do use them sometimes... but I only use them when I know approx how much data I have. they are very useful in certain situations.. and when updating a table its a `one time thing` so its not like its the end of the world.. but I figured it would be good to point it out since the OP probably wouldn't have known that. TBH I look at your answers all the time because they are really concise and useful :) I may or may not be your SO stalker :P – John Ruddell Jun 12 '14 at 16:47
  • @GordonLinoff I haven't really understood the benefit of a cross join as opposed to another join. would you be able to explain that? – John Ruddell Jun 12 '14 at 16:50
  • @JohnRuddell - well there's no predicate for the (cross) join, is there? The optional 'cross' keyword is just for clarity. – Strawberry Jun 12 '14 at 16:53
  • @Strawberry Well if I understand correctly the cross join is not the same as a regular join- http://stackoverflow.com/questions/17759687/cross-join-vs-inner-join-in-sql-server-2008 -------------- what I was asking was whats the difference between doing a cross join or regular join when creating a user-defined variable. was that what you were asking? – John Ruddell Jun 12 '14 at 17:43
  • Its still taking too long :(. The table is reaaallly big. Maybe a stored procedure would be faster? – Paulo Faria Jun 13 '14 at 14:04
  • @PauloFaria . . . How big is really big? You are doing a cumulative sum on a column with no partitions on the sum. That would't usually occur on a *big* table (one with millions of rows). – Gordon Linoff Jun 13 '14 at 16:20
0

Something like this? haven't tested so it may need a tweak or two but that should do what you want.

UPDATE table t, 
    (SELECT
      @a := @a + amount AS cumulative_sum
    FROM table
    JOIN (SELECT @a := 0) as whatever 
) temp
  SET t.checked = true WHERE temp.cumulative_sum >= 100 ;
John Ruddell
  • 25,283
  • 6
  • 57
  • 86