3

I have a SQL table, one row is the revenue in the specific day, and I want to add a new column in the table, the value is the incremental (could be positive or negative) revenue between a specific day and the previous day, and wondering how to implement by SQL?

Here is an example,

original table,

...
Day1 100
Day2 200
Day3 150
...

new table (add incremental column at the end, and for first column, could assign zero),

Day1 100 0
Day2 200 100
Day3 150 -50

I am using MySQL/MySQL Workbench.

thanks in advance, Lin

Strawberry
  • 33,750
  • 13
  • 40
  • 57
Lin Ma
  • 9,739
  • 32
  • 105
  • 175
  • You might want to simulate a `LEAD` or `LAG` function: http://stackoverflow.com/questions/29556063/mysql-lag-lead-function – A Hocevar Sep 02 '15 at 07:06
  • 1
    Do your table have a primary key or unique key column. – Praveen Sep 02 '15 at 07:10
  • @Praveen, yes, have an auto-incremental integer transaction ID column, to make it simple, I ignore it in my question. Any advice is appreciated. :) – Lin Ma Sep 02 '15 at 07:11

3 Answers3

4
SELECT a.day, a.revenue , a.revenue-COALESCE(b.revenue,0) as previous_day_rev 
FROM DailyRevenue a 
LEFT JOIN DailyRevenue b on a.day=b.day-1

the query assume that each day has one record in the table. If there could be more than 1 row for each day you need to create a view that sums up all days grouping by day.

Nir-Z
  • 819
  • 1
  • 13
  • 31
  • thanks a lot, in your sample, table a and table b are two tables? Shall we use one table name as I only have one table? :) – Lin Ma Sep 02 '15 at 07:22
  • 2
    no, it is the same table, a and b are only two different aliases to allow a self-join – A Hocevar Sep 02 '15 at 07:25
  • 1
    Table a and Table b are the same table. The table joins itself :). a and b are just alias to the same table. – Nir-Z Sep 02 '15 at 07:26
  • @A Hocevar, I am lost and if in my case, my table name is DailyRevenue, how to write a and b here? – Lin Ma Sep 02 '15 at 07:27
  • @Nir-Z, thanks. in my case, my table name is DailyRevenue, how to write a and b here in your sample? – Lin Ma Sep 02 '15 at 07:28
  • 1
    Replace 'table' with DailyRevenue (your table name) – Nir-Z Sep 02 '15 at 07:28
  • @Nir-Z, smart and smart! :) – Lin Ma Sep 02 '15 at 08:38
  • 1
    As an aside, this approach assumes day is numeric, and sequential with no gaps. It will also fail to print either the first or last day (not sure which and I have a headache so not the best time to be trying to look at edge cases!) – Jon Marnock Sep 03 '15 at 04:41
  • 1
    If the day is datetime, just use dateadd function (assuming that the time of all days in the same). About the edge case there is only one: the last row which wont have a match in table b. In this case left join will return null in b.revenue so use COALESCE(b.revenue,0). – Nir-Z Sep 03 '15 at 06:06
  • @Nir-Z, is your current code dependent on "day is numeric, and sequential with no gaps"? I do not see such dependencies, but if there does, please feel free to correct me. :) – Lin Ma Sep 04 '15 at 21:15
  • 1
    The query assumes that day is numeric for demonstration purposes. since no data about the day field type is provided. If the field is datetime you could use dateadd function instead of day-1. One other thing is that the query calculates the difference between two sequential days. So if you have for example day1, day3... so in day3 has null so you should decide whether to a.revenue- COALESCE(b.revenue,0) which will return the day3-0, or a.revenue-COLAECE(b.revenue,a.revenue) which will return 0. But if you want to get the difference between day3 and day1 this case is not considered. – Nir-Z Sep 05 '15 at 07:41
  • @Nir-Z, very thoughtful, if I want to calculate the differences between day3 and day1, I think your code does not work? How to implement to calculate the differences between two most nearby days (not necessarily to be consecutive two days)? – Lin Ma Sep 07 '15 at 00:18
  • 1
    If the id of rows day1 and day3 is consecutive you can use id-1 instead of day-1 in the query. A better and safe solution is to use a stored procedure that loops over all the rows ordered by date ascending and calculating the difference between the rows in every iteration. – Nir-Z Sep 07 '15 at 05:15
3

If you're okay with re-ordering the columns slightly, something like this is pretty simple to understand:

SET @prev := 0;
SELECT day, revenue - @prev AS diff, @prev := revenue AS revenue
FROM revenue ORDER BY day ASC;

The trick is that we calculate the difference to the previous first, then set the previous to the current and display it as the current in one step.

Note, this depends on the order being correct since the calculations are done during the returning of the rows, so you need to make sure you have an ORDER BY clause that returns the days in the correct order.

Jon Marnock
  • 3,155
  • 1
  • 20
  • 15
  • smart answer. Wondering is it a must we need to use := for assignment, other than just using = itself? – Lin Ma Sep 02 '15 at 08:42
  • 1
    = in MySQL does a comparison, so we need := to do assignment (otherwise `@prev = revenue` would return either a 0 or a 1, depending on if it did or didn't match). – Jon Marnock Sep 02 '15 at 08:46
  • 1
    I should probably also add a short warning about setting and reading variables in the same statement as per http://dev.mysql.com/doc/refman/5.5/en/user-variables.html - technically the order is undefined. If you really want to do this safely, the best way is in your code that reads the returned query results. – Jon Marnock Sep 03 '15 at 04:47
  • could you provide a bit more information about what do you mean "reads the returned query results"? Thanks. – Lin Ma Sep 03 '15 at 06:21
  • 1
    Sorry, I just meant that if you're looking for a much more stable approach to doing this, you should just have your query return only the day and the amount, and then in your PHP or C# or whatever programming language you're using, have it calculate the difference to the previous day's amount as it iterates over the resultset it gets returned from the mysql libraries. – Jon Marnock Sep 03 '15 at 08:02
  • for your advice of not using assignment of a SQL variable in the same statement, I also read code from Praveen, I think his code has potential issues using assignment in the same statement? How do you think? Thanks. – Lin Ma Sep 04 '15 at 21:23
  • Hi Jon, for your advice of not using assignment in the same statement, you are also using assignment as "@prev := revenue" in the same SQL select statement, will there be any issues? – Lin Ma Sep 04 '15 at 21:29
  • 1
    Yes to both :) I guess what I'm saying is, the above has worked for at least a decade, and I see no reason why it would suddenly fail to work, but you just have to be careful to understand exactly how and why the above works, and what the limitations are on how you construct the query. For example, adding having statements that refer to either of the two columns with the variables in them may break the query. – Jon Marnock Sep 07 '15 at 00:02
  • if using "@prev := revenue" in the same select query statement is not correct (potential of assignment in the same statement), then what is the right way to write code of the same logics? Thanks. – Lin Ma Sep 07 '15 at 00:16
  • 1
    The best place to do this is in your application (not the SQL). If you must do it in the SQL for some reason, the only safe way is to do a well defined self join. This relies on proper ordering of the join keys so there's always a "prev" match, and you'll need to think about what you do for the first day (where there is no "prev" day in the join condition) - probably a left join with conditional logic in the difference column to handle the null. This of course makes the requirement for no-gaps on the day sequences even more important. – Jon Marnock Sep 07 '15 at 01:29
  • 1
    Final thoughts: self joins can be slow, be careful with your indexes if they aren't primary keys etc, and if I were in a situation where I couldn't implement the above in code for some reason, I'd almost certainly be in a "hacky" situation (where I may not even be able to guarantee sequential days or nice join indexes), so I'd probably go for the above because it's short and doesn't require anything else, and it's own "hackyness" is probably outweighed by the "hackyness" of whatever I was trying to do anyway :) Hope that helps! – Jon Marnock Sep 07 '15 at 01:31
  • thanks for all the advice, and will take them into account. :) – Lin Ma Sep 07 '15 at 02:30
  • 1
    No worries! I actually just had a situation where I ended up using the above type query, so by pure chance I ended up helping myself :) Always a pleasure. – Jon Marnock Sep 07 '15 at 05:06
  • 1
    what is your favorite method now, expert? :) – Lin Ma Sep 08 '15 at 00:31
  • you are a charming expert. :) – Lin Ma Sep 08 '15 at 07:56
1

Try;

select 
    t.date_col, t.val_col,
    case when t1.val_col is null then 0
    else t.val_col - t1.val_col end diff
from (
    select t.* , @r := @r + 1 lev
    from tbl t,
    (select @r := 0) r
    order by t.date_col
) t
left join (
    select t.* , @r1 := @r1 + 1 lev
    from tbl t,
    (select @r1 := 1) r
    order by t.date_col 
) t1
on t.lev = t1.lev

This will calculate value diff even if there is a missing date

Praveen
  • 8,945
  • 4
  • 31
  • 49
  • where do you assign @r to be the value of column revenue? Thanks. – Lin Ma Sep 02 '15 at 08:06
  • 1
    Its just act as `row_number`.. for fist sub query starts with 1 and for second start with 2; joining them like `@r = @r2` will help `t.val_col - t1.val_col` to evaluate, where` t1.val_col` point to next col from`t.val_col` – Praveen Sep 02 '15 at 08:39
  • Hi Praveen, dumb question, why you write assignment statement ((select @r1 := 1) r) inside select-from-order by query itself? – Lin Ma Sep 02 '15 at 08:41
  • 1
    That is where `@r1` is initialized – Praveen Sep 02 '15 at 09:03
  • 1
    http://stackoverflow.com/questions/16715504/mysql-define-a-variable-within-select-and-use-it-within-the-same-select – Praveen Sep 02 '15 at 09:05
  • thanks for the reference and advice. I am not sure your implementation of writing (select @r := 0) r in the middle has the same effect of declare @r at the beginning as Jon's implementation? Thanks. – Lin Ma Sep 03 '15 at 03:34
  • I read comments from Jon Kloske, about not using assignment of a SQL variable in the same statement, I think your code has potential issues using assignment in the same statement? How do you think? Thanks. – Lin Ma Sep 04 '15 at 21:23