0

I have the following query that is using one table to find the revenue column for a list of account numbers closest to either end of a date range, i.e. the row closest to the start of a date range, and the one closest to the end.

I've came up with the following but it only seems to work for the first account number and not any others after that.

SELECT a.acct_no,
b.revenue AS first_value, b.created_at AS first_date,
c.revenue AS second_value, c.created_at AS second_date

FROM `revenue_figures` a

LEFT JOIN (
    SELECT `acct_no`, `revenue`, `created_at`
    FROM  `revenue_figures`
    WHERE `created_at` BETWEEN '2017-02-28 00:00:00' AND '2017-03-31 23:59:59'
    AND `acct_no` IN ('A123', 'A124')
    GROUP BY `acct_no`, `revenue`, `created_at`
    ORDER BY `created_at` ASC
    LIMIT 1
) b ON a.acct_no = b.acct_no

LEFT JOIN (
    SELECT `acct_no`, `revenue`, `created_at`
    FROM  `revenue_figures`
    WHERE `created_at` BETWEEN '2017-02-28 00:00:00' AND '2017-03-31 23:59:59'
    AND `acct_no` IN ('A123', 'A124')
    GROUP BY `acct_no`, `revenue`, `created_at`
    ORDER BY `created_at` DESC
    LIMIT 1
) c ON a.acct_no = c.acct_no

WHERE a.`created_at` BETWEEN '2017-02-28 00:00:00' AND '2017-03-31 
23:59:59'
AND a.`acct_no` IN ('A123', 'A124')

GROUP BY a.acct_no, first_value, first_date, second_value, second_date

This produces the following, but there are figures in the table for the A124 account number:

acct_no first_value first_date  second_value    second_date
A123    8990.00     2017-02-28  7364.80         2017-03-31 01:00:00
A124    NULL        NULL        NULL            NULL

I think there is a fundamental flaw with the query as I noticed that if I run it a few times, I can generate different results, such as:

acct_no first_value first_date  second_value    second_date
A123    8990.00     2017-02-28  NULL            NULL
A124    NULL        NULL        28361.76        2017-03-31 01:00:00

Table Structure

id, acct_no, revenue, created_at

Example Data

1, A123, 8990.00, 2017-02-28 01:00:00
2, A123, 7364.80, 2017-03-31 01:00:00
3, A124, 17324.12, 2017-02-28 01:00:00
4, A124, 28361.76, 2017-03-31 01:00:00
martincarlin87
  • 10,848
  • 24
  • 98
  • 145
  • You should share your table structure(s) and example data... we can't answer your question with only a query. – Raymond Nijland Sep 20 '17 at 14:58
  • Hi Raymond, added some example data, not actual `CREATE` or `INSERT` statements though. – martincarlin87 Sep 20 '17 at 15:04
  • `LIMIT 1` means you only get one row in each subquery. If the row is for `A123`, you won't get any data for `A124`. It's not one row per account. – Barmar Sep 20 '17 at 15:05
  • If you want to get the oldest and newest rows for each account, see https://stackoverflow.com/questions/7745609/sql-select-only-rows-with-max-value-on-a-column?rq=1 – Barmar Sep 20 '17 at 15:08
  • thanks, will have a look. If I take out the `LIMIT 1` from the subqueries, I then end up with a result for every day per account number within the date range which is the problem. – martincarlin87 Sep 20 '17 at 15:09
  • Get rid of the _inner_ `GROUP BYs` -- it is doing nothing useful, and it is slowing the query down. – Rick James Sep 20 '17 at 18:25
  • Don't call then "second_value" and "second_date"; call them "last_value" and "last_date". – Rick James Sep 20 '17 at 22:51

1 Answers1

1

(Reopened because the OP's code handles what the "dup" question discussed. The real question is about another issue.)

The problem is that you ask for the latest of two acct_nos, but then LIMIT 1, which throws out one of them. Let's start by focusing on one subquery.

Am I correct in saying that you want 2 rows, one per acct_no? To do so, you need to look for 1 row for each acct_no. And probably the only way to do that is with a UNION ALL of two SELECT ... ORDER BY ... LIMIT 1. No GROUP BY.

So, now you have

LEFT JOIN
(
    ( SELECT ... acct_no = 'A123' ... LIMIT 1 )
    UNION ALL
    ( SELECT ... acct_no = 'A124' ... LIMIT 1 )
)

First debut the UNION to see if it gets the desire pair of rows.

Then put it into each LEFT JOIN (though I don't know if you need LEFT) to see if the big query comes out OK.

Some other tips...

INDEX(acct_no, created_at)

Use this pattern:

    created_at >= '2017-03-01'
AND created_at  < '2017-03-01' + INTERVAL 1 MONTH

What you have includes the last day of Feb, plus all of Mar; was that deliberate? And, is created_at a DATETIME? Or some other datatype?

The outer GROUP BY should be just

GROUP BY acct_no
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Hi Rick, thanks for re-opening and for the answer. `created_at` is a `TIMESTAMP` column. The dates were intentional as it's a date range entered from the front-end, so the plan was to find the rows closest to either end of the date range. Will give this a try in work tomorrow, managed to get it working by querying for the exact date but would rather have it work as originally intended. – martincarlin87 Sep 20 '17 at 19:37