I have a table which records the chlorine samples in distribution for a water company at different sites. To start I have the following sub-query:
SELECT CONCAT(MIN(in_dist.chlorine_mgl), ' - ', MAX(in_dist.chlorine_mgl))
FROM gis_user.BACTISURVEY AS in_dist
INNER JOIN (SELECT MAX(last_edited_date ) AS max_date, collection_location, sample_type, date_sample_collected
FROM gis_user.BACTISURVEY
GROUP BY date_sample_collected, collection_location, sample_type ) AS tbl ON tbl.collection_location = in_dist.collection_location
AND tbl.date_sample_collected = in_dist.date_sample_collected
WHERE samples.district_distribution = in_dist.district_distribution
This subquery has a few conditions:
- So for these sites in my application you have the option to edit the entry, but the database keeps an archive of these edits in the same table, distinguished by 'edit date', so naturally I need to take the latest edit based on datetime (I'm expecting a lot of edits because of the nature of the program)
- I want to take the minimum chlorine sample value and the maximum chlorine value for a group (This group being district_distribution AND ALSO date_sample_collected)
So basically I want the following:
DISTRICT CHLORINE DATE COLLECTED
district1 1.2 - 2.6 March 30th 2020
district1 1.4 - 3.2 April 1st 2020
district2 1.6 - 2.4 March 30th 2020
But I'm getting something like the following:
DISTRICT CHLORINE DATE COLLECTED
district1 1.2 - 3.2 March 30th 2020
district1 1.2 - 3.2 April 1st 2020
district2 1.6 - 2.4 March 30th 2020
So what that shows me is that it's taking all the chlorine samples regardless of date within the district_distribution group and taking the minimum and maximum value of those. I want it to differentiate between the collection date as well as the district.
EDIT --
For the below answer, would a case like this also be covered by your query?:
-- Unique Entry 1 (district 1, Type A, AUS, 30th Mar)
('district 1', 'type A', 'AUS', '2020-02-30', '2020-04-01', 1.2) --Another row expected min
,('district 1', 'type A', 'AUS', '2020-02-30', '2020-04-01', 2.8) --Another row expected max
,('district 1', 'type A', 'AUS', '2020-03-30', '2020-04-03', 1.1)
,('district 1', 'type A', 'AUS', '2020-03-30', '2020-04-03', 5.2)
,('district 1', 'type A', 'AUS', '2020-03-30', '2020-04-05', 1.21) --Expected min
,('district 1', 'type A', 'AUS', '2020-03-30', '2020-04-05', 1.44) --Expected max
EDIT2: So I used your partition and rank suggestion and that seems to be the trick! I thank you! Marked Answer below.