2

I am running PostgreSQL 9.3 and have a table that looks something like this:

     entry_date      | account_id | balance
---------------------+------------+---------
 2016-02-01 00:00:00 |        123 |     100
 2016-02-01 06:00:00 |        123 |     200
 2016-02-01 12:00:00 |        123 |     300
 2016-02-01 18:00:00 |        123 |     250
 2016-02-01 00:00:00 |        456 |     400
 2016-02-01 06:00:00 |        456 |     300
 2016-02-01 12:00:00 |        456 |     200
 2016-02-01 18:00:00 |        456 |     299
 2016-02-02 00:00:00 |        123 |     250
 2016-02-02 06:00:00 |        123 |     300
 2016-02-02 12:00:00 |        123 |     400
 2016-02-02 18:00:00 |        123 |     450
 2016-02-02 00:00:00 |        456 |     299
 2016-02-02 06:00:00 |        456 |     200
 2016-02-02 12:00:00 |        456 |     100
 2016-02-02 18:00:00 |        456 |       0
(16 rows)

My goal is to retrieve the final balance for each account, each day in a given date range. So my desired result is:

     entry_date      | account_id | balance
---------------------+------------+---------
 2016-02-01 18:00:00 |        123 |     250
 2016-02-01 18:00:00 |        456 |     299
 2016-02-02 18:00:00 |        123 |     450
 2016-02-02 18:00:00 |        456 |       0
(4 rows)

Note that the timestamps in my example are much neater than in reality...I can't always rely on 18:00 as the last time of each day.

How would I write this SQL query?

I tried variations of this:

SELECT max(entry_date), account_id, max(balance)
FROM ledger
WHERE entry_date BETWEEN '2016-02-01'::timestamp AND '2016-02-02'::timestamp
GROUP BY account_id, entry_date;

Here is the schema:

CREATE TABLE ledger (
  entry_date    timestamp(3),
  account_id    int,
  balance       int
);

INSERT INTO ledger VALUES ('2016-02-01T00:00:00.000Z', 123, 100);
INSERT INTO ledger VALUES ('2016-02-01T06:00:00.000Z', 123, 200);
INSERT INTO ledger VALUES ('2016-02-01T12:00:00.000Z', 123, 300);
INSERT INTO ledger VALUES ('2016-02-01T18:00:00.000Z', 123, 250);

INSERT INTO ledger VALUES ('2016-02-01T00:00:00.000Z', 456, 400);
INSERT INTO ledger VALUES ('2016-02-01T06:00:00.000Z', 456, 300);
INSERT INTO ledger VALUES ('2016-02-01T12:00:00.000Z', 456, 200);
INSERT INTO ledger VALUES ('2016-02-01T18:00:00.000Z', 456, 299);

INSERT INTO ledger VALUES ('2016-02-02T00:00:00.000Z', 123, 250);
INSERT INTO ledger VALUES ('2016-02-02T06:00:00.000Z', 123, 300);
INSERT INTO ledger VALUES ('2016-02-02T12:00:00.000Z', 123, 400);
INSERT INTO ledger VALUES ('2016-02-02T18:00:00.000Z', 123, 450);

INSERT INTO ledger VALUES ('2016-02-02T00:00:00.000Z', 456, 299);
INSERT INTO ledger VALUES ('2016-02-02T06:00:00.000Z', 456, 200);
INSERT INTO ledger VALUES ('2016-02-02T12:00:00.000Z', 456, 100);
INSERT INTO ledger VALUES ('2016-02-02T18:00:00.000Z', 456, 0);

Here is a SQL Fiddle: http://sqlfiddle.com/#!15/56886

Thanks in advance!

Vao Tsun
  • 47,234
  • 13
  • 100
  • 132
Shaun Scovil
  • 3,905
  • 5
  • 39
  • 58

2 Answers2

2

You can use ROW_NUMBER with PARTITION BY:

SELECT entry_date, account_id, balance
FROM (
  SELECT entry_date, account_id, balance, 
         ROW_NUMBER() OVER (PARTITION BY account_id, entry_date::date 
                            ORDER BY entry_date DESC) AS rn
  FROM ledger
  WHERE entry_date BETWEEN '2016-02-01'::timestamp AND '2016-02-02'::timestamp) AS t
WHERE t.rn = 1

PARTITION BY creates slices of account_id values per day since entry_date is also used in the same clause after being cast to a date value. Each slice is ordered by entry_date in descending order, hence ROW_NUMBER = 1 corresponds to the last record of the day.

Demo here

Giorgos Betsos
  • 71,379
  • 9
  • 63
  • 98
1

In Postgres, I think the easiest method is distinct on:

SELECT DISTINCT ON (account_id) l.*
FROM ledger l
WHERE entry_date BETWEEN '2016-02-01'::timestamp AND '2016-02-02'::timestamp
ORDER BY account_id, entry_date DESC;

DISTINCT ON sorts the data based on the keys in the ORDER BY. It then chooses unique values of the keys in the ON list, choosing the first value encountered.

EDIT:

Exactly the same idea applies for one record for day -- I simply misread the original requirement:

SELECT DISTINCT ON (account_id, date_trunc('day', entry_date)) l.*
FROM ledger l
WHERE entry_date BETWEEN '2016-02-01'::timestamp AND '2016-02-02'::timestamp
ORDER BY account_id, date_trunc('day', entry_date), entry_date DESC;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I dig how concise this is, but unfortunately it only returns rows for the last day in the date range. I need a row for each account on each day in the date range (expecting four rows, got two). – Shaun Scovil Feb 04 '16 at 18:08