Select the oldest record of a certain group until it changes pattern, in SQL

Question

I am trying to get the oldest record for every status update/change in the following table.

Table (status_updates) :

id	entity_id	status	date
7	2	Approved	2022-02-10
6	2	Approved	2022-02-05
5	2	Approved	2022-02-04
4	2	OnHold	2022-02-04
3	2	OnHold	2022-02-03
2	2	Approved	2022-02-02
1	2	Approved	2022-02-01

Result Needed :

id	entity_id	status	date
5	2	Approved	2022-02-04
3	2	OnHold	2022-02-03
1	2	Approved	2022-02-01

Tried :

select
    `status`,
    `created_at`
from
    `status_updates`
left join
    (select
        `id`,
        row_number() over (partition by status_updates.entity_id, status_updates.status order by status_updates.created_at asc) as sequence
    from
        `status_updates`)
    as `oldest_history`
    on
        `oldest_history`.`id` = `shipper_credit_histories`.`id`
where `sequence` = 1

Result Achived :

id	entity_id	status	date
3	2	OnHold	2022-02-03
1	2	Approved	2022-02-01

You can refer to this post and try rewriting your query. https://stackoverflow.com/questions/11127461/select-rows-where-column-value-has-changed — Nitin88, Feb 13 '23 at 22:26
the problem occurs that rows are by definition unsorted, and you need to give it an order, which you don't have, for an algorithm all approved belong together as they have no column that tell them which belong together — nbk, Feb 14 '23 at 00:06
Got your point. I am constantly getting stuck at the same point. There should be a way to group the rows just until they change the pattern, in my case the status when the data is ordered by date. — Tanmay, Feb 14 '23 at 00:13
I am open to change the schema of the table. Any Suggestions @nnichols — Tanmay, Feb 14 '23 at 00:25
Added 'id' as a primary key to the table. Please check now @nnichols — Tanmay, Feb 14 '23 at 00:43

score 1 · Accepted Answer · answered Feb 14 '23 at 09:50

1

Just using lag:

select s.*
from (
    select id, status<>coalesce(lag(status) over (partition by entity_id order by id),'') status_change
    from status_updates
) ids
join status_updates s using (id)
where status_change

answered Feb 14 '23 at 09:50

ysth

96,171
6
121
214

If the runs are short and the table narrow, you may benefit from ditching the join and returning all required columns in the inner select. Worth checking with your full dataset. – user1191247 Feb 14 '23 at 10:24

Alex G · Answer 2 · 2023-02-13T22:33:39.833

here are the queries:

create table status_updates
(entity_id integer,
status varchar(32),
date date
);

insert into status_updates values (2, 'Approved', '2022-02-05');
insert into status_updates values (2, 'Approved', '2022-02-04');
insert into status_updates values (2, 'On Hold', '2022-02-04');
insert into status_updates values (2, 'On Hold', '2022-02-03');
insert into status_updates values (2, 'Approved', '2022-02-02');
insert into status_updates values (2, 'Approved', '2022-02-01');

select b.*
from status_updates a
right join status_updates b 
on a.status=b.status and a.date=(b.date - interval 1 day) 
where a.entity_id is null;

or this query(if you prefer left join)

select a.*
from status_updates a
left join status_updates b
on a.status=b.status and a.date=(b.date + interval 1 day)
where b.entity_id is null;

in both you will see the expected result

Nice suggestion @"Alex G". The interval between the dates is not fixed in my case, it is indeterministic. Updated the Sample table above to show that. Please check. — Tanmay, Feb 14 '23 at 00:02

score 0 · Answer 3 · answered Feb 14 '23 at 06:53

0

the second solution is almost the same, but join by id instead of date

create table status_updates
(id integer,
entity_id integer,
 status varchar(32),
 date date
);

insert into status_updates values (7, 2, 'Approved', '2022-02-10');
insert into status_updates values (6, 2, 'Approved', '2022-02-05');
insert into status_updates values (5, 2, 'Approved', '2022-02-04');
insert into status_updates values (4, 2, 'On Hold', '2022-02-04');
insert into status_updates values (3, 2, 'On Hold', '2022-02-03');
insert into status_updates values (2, 2, 'Approved', '2022-02-02');
insert into status_updates values (1, 2, 'Approved', '2022-02-01');

select a.*
from status_updates a
         left join status_updates b
                   on a.status=b.status and a.id=b.id + 1
where b.entity_id is null;

result is the same what you expected

answered Feb 14 '23 at 06:53

Alex G

41
3

Unfortunately, you cannot rely on an auto-incrementing PK being contiguous as there are a number of situations that can lead to gaps. You can use a variation on this with another left join to check values between `a` and `b`. – user1191247 Feb 14 '23 at 08:35
sql returns exactly what is expected – Alex G Feb 15 '23 at 12:22
Only until you introduce another entity_id or have a hole in your PK sequence - [db<>fiddle](https://dbfiddle.uk/y3EHeSkR) – user1191247 Feb 15 '23 at 12:30

Select the oldest record of a certain group until it changes pattern, in SQL

Table (status_updates) :

Result Needed :

Tried :

Result Achived :

3 Answers3