-1

I am trying to get the oldest record for every status update/change in the following table.

Table (status_updates) :

id entity_id status date
7 2 Approved 2022-02-10
6 2 Approved 2022-02-05
5 2 Approved 2022-02-04
4 2 OnHold 2022-02-04
3 2 OnHold 2022-02-03
2 2 Approved 2022-02-02
1 2 Approved 2022-02-01

Result Needed :

id entity_id status date
5 2 Approved 2022-02-04
3 2 OnHold 2022-02-03
1 2 Approved 2022-02-01

Tried :

select
    `status`,
    `created_at`
from
    `status_updates`
left join
    (select
        `id`,
        row_number() over (partition by status_updates.entity_id, status_updates.status order by status_updates.created_at asc) as sequence
    from
        `status_updates`)
    as `oldest_history`
    on
        `oldest_history`.`id` = `shipper_credit_histories`.`id`
where `sequence` = 1

Result Achived :

id entity_id status date
3 2 OnHold 2022-02-03
1 2 Approved 2022-02-01
Tanmay
  • 13
  • 2
  • 2
    and what is your quetion? – nbk Feb 13 '23 at 22:21
  • You can refer to this post and try rewriting your query. https://stackoverflow.com/questions/11127461/select-rows-where-column-value-has-changed – Nitin88 Feb 13 '23 at 22:26
  • Sure, let me try to elaborate the question. – Tanmay Feb 13 '23 at 23:55
  • Updated the problem statement, please check now @nbk – Tanmay Feb 13 '23 at 23:58
  • the problem occurs that rows are by definition unsorted, and you need to give it an order, which you don't have, for an algorithm all approved belong together as they have no column that tell them which belong together – nbk Feb 14 '23 at 00:06
  • Got your point. I am constantly getting stuck at the same point. There should be a way to group the rows just until they change the pattern, in my case the status when the data is ordered by date. – Tanmay Feb 14 '23 at 00:13
  • I am open to change the schema of the table. Any Suggestions @nnichols – Tanmay Feb 14 '23 at 00:25
  • Added 'id' as a primary key to the table. Please check now @nnichols – Tanmay Feb 14 '23 at 00:43

3 Answers3

1

Just using lag:

select s.*
from (
    select id, status<>coalesce(lag(status) over (partition by entity_id order by id),'') status_change
    from status_updates
) ids
join status_updates s using (id)
where status_change
ysth
  • 96,171
  • 6
  • 121
  • 214
  • If the runs are short and the table narrow, you may benefit from ditching the join and returning all required columns in the inner select. Worth checking with your full dataset. – user1191247 Feb 14 '23 at 10:24
0

here are the queries:

create table status_updates
(entity_id integer,
status varchar(32),
date date
);

insert into status_updates values (2, 'Approved', '2022-02-05');
insert into status_updates values (2, 'Approved', '2022-02-04');
insert into status_updates values (2, 'On Hold', '2022-02-04');
insert into status_updates values (2, 'On Hold', '2022-02-03');
insert into status_updates values (2, 'Approved', '2022-02-02');
insert into status_updates values (2, 'Approved', '2022-02-01');

select b.*
from status_updates a
right join status_updates b 
on a.status=b.status and a.date=(b.date - interval 1 day) 
where a.entity_id is null;

or this query(if you prefer left join)

select a.*
from status_updates a
left join status_updates b
on a.status=b.status and a.date=(b.date + interval 1 day)
where b.entity_id is null;

in both you will see the expected result

Alex G
  • 41
  • 3
  • Nice suggestion @"Alex G". The interval between the dates is not fixed in my case, it is indeterministic. Updated the Sample table above to show that. Please check. – Tanmay Feb 14 '23 at 00:02
0

the second solution is almost the same, but join by id instead of date

create table status_updates
(id integer,
entity_id integer,
 status varchar(32),
 date date
);

insert into status_updates values (7, 2, 'Approved', '2022-02-10');
insert into status_updates values (6, 2, 'Approved', '2022-02-05');
insert into status_updates values (5, 2, 'Approved', '2022-02-04');
insert into status_updates values (4, 2, 'On Hold', '2022-02-04');
insert into status_updates values (3, 2, 'On Hold', '2022-02-03');
insert into status_updates values (2, 2, 'Approved', '2022-02-02');
insert into status_updates values (1, 2, 'Approved', '2022-02-01');

select a.*
from status_updates a
         left join status_updates b
                   on a.status=b.status and a.id=b.id + 1
where b.entity_id is null;

result is the same what you expected

Alex G
  • 41
  • 3
  • Unfortunately, you cannot rely on an auto-incrementing PK being contiguous as there are a number of situations that can lead to gaps. You can use a variation on this with another left join to check values between `a` and `b`. – user1191247 Feb 14 '23 at 08:35
  • sql returns exactly what is expected – Alex G Feb 15 '23 at 12:22
  • Only until you introduce another entity_id or have a hole in your PK sequence - [db<>fiddle](https://dbfiddle.uk/y3EHeSkR) – user1191247 Feb 15 '23 at 12:30