0

I have a MySQL DB which is currently appending new data from CSV files. Some of the rows are duplicates apart from a few columns (i.e last_status).

I would like to filter out the whole DB by only capturing the latest record based on MAX(last_status).

Assuming I have 4 rows with similar ID:

ID, last_status,....(50 other columns)
100, 08/08/2020,...
100, 09/08/2020,...
200, 09/08/2020,...
200, 11/08/2020,...

I expect the result to be like this for all records:

ID, last_status,...(50 other columns)
100, 09/08/2020,...
200, 11/08/2020,...

I have tried this:

    select *
from total_report s1
inner join
(
  select MAX(last_status) as last_status, ID
  from htotal_report
  group by ID
) s2
  on s1.ID = s2.ID
  and s1.last_status = s2.last_status

The reason I need this is to visualize the data in Tableau and now it is counting the duplicated ID rows separately. I would like to create a View based on the question that could be used in Tableau to solve this.

Edit: ID is a unique ID, not an auto-incremented ID, nor a PK or FK.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828

1 Answers1

0

If I understand correctly, will not just the inner part give what you need?

 select MAX(last_status) as last_status, ID
  from htotal_report
  group by ID
Aris
  • 4,643
  • 1
  • 41
  • 38