I am trying to create a version history for a software installation based on diagnostic data received on certain dates. The data is in a PostgreSQL database:
SELECT version();
version
-------------------------------------------------------------------------------------------------------
PostgreSQL 10.14 on x86_64-pc-linux-gnu, compiled by x86_64-unknown-linux-gnu-gcc (GCC) 4.9.4, 64-bit
The table's schema is like this:
CREATE TABLE cluster_info (
cluster_id uuid,
date timestamp,
version text,
PRIMARY KEY (cluster_id, date)
);
The relevant data looks like this:
select cluster_id, version, date
from cluster_info
where cluster_id = 'e2865aec-0ce1-11ec-afda-0242c0a8a003'
order by date;
cluster_id | date | version
--------------------------------------+---------------------+--------------
e2865aec-0ce1-11ec-afda-0242c0a8a003 | 2019-03-15 10:30:47 | 6.0.5
e2865aec-0ce1-11ec-afda-0242c0a8a003 | 2019-05-03 20:32:33 | 6.0.5
e2865aec-0ce1-11ec-afda-0242c0a8a003 | 2019-05-08 14:57:05 | 6.0.7
e2865aec-0ce1-11ec-afda-0242c0a8a003 | 2019-05-20 16:59:45 | 6.0.7
e2865aec-0ce1-11ec-afda-0242c0a8a003 | 2019-05-21 00:21:43 | 6.0.5, 6.0.7
e2865aec-0ce1-11ec-afda-0242c0a8a003 | 2019-05-21 18:45:45 | 6.0.5, 6.0.7
e2865aec-0ce1-11ec-afda-0242c0a8a003 | 2019-05-22 20:05:10 | 6.0.5, 6.0.6
e2865aec-0ce1-11ec-afda-0242c0a8a003 | 2019-05-23 11:54:39 | 6.0.5, 6.0.6
e2865aec-0ce1-11ec-afda-0242c0a8a003 | 2019-05-24 15:01:09 | 6.0.7
e2865aec-0ce1-11ec-afda-0242c0a8a003 | 2019-05-24 19:21:14 | 6.0.7
e2865aec-0ce1-11ec-afda-0242c0a8a003 | 2019-05-28 20:06:29 | 6.0.6
e2865aec-0ce1-11ec-afda-0242c0a8a003 | 2019-07-09 05:20:32 | 6.0.6
e2865aec-0ce1-11ec-afda-0242c0a8a003 | 2019-07-11 12:05:03 | 6.0.8
e2865aec-0ce1-11ec-afda-0242c0a8a003 | 2019-07-17 17:46:10 | 6.0.8
e2865aec-0ce1-11ec-afda-0242c0a8a003 | 2019-07-24 14:44:55 | 6.0.6
e2865aec-0ce1-11ec-afda-0242c0a8a003 | 2019-07-26 14:54:33 | 6.0.6
My first instinct would be to use min
and max
with group by
, but it's possible that a cluster could be downgraded to a previous version after being upgraded. In that case I would want to show separate timespans for each period when the cluster was on a given version, and group by
wouldn't accomplish this.
I attempted to use min
and max
window functions partitioned by version, which did not work as I expected either:
select distinct * from (select
version,
min(date) over (partition by version),
max(date) over (partition by version)
from cluster_info
where cluster_id = 'e2865aec-0ce1-11ec-afda-0242c0a8a003'
order by date) x;
version | min | max
--------------+---------------------+---------------------
6.0.5 | 2019-03-15 10:30:47 | 2019-05-03 20:32:33
6.0.5, 6.0.6 | 2019-05-22 20:05:10 | 2019-05-23 11:54:39
6.0.5, 6.0.7 | 2019-05-21 00:21:43 | 2019-05-21 18:45:45
6.0.6 | 2019-05-28 20:06:29 | 2019-07-26 14:54:33
6.0.7 | 2019-05-08 14:57:05 | 2019-05-24 19:21:14
6.0.8 | 2019-07-11 12:05:03 | 2019-07-17 17:46:10
What is the correct way to do this?
Edit: updated to include version and schema, and use an example dataset that exhibits the downgrade problem and shows that my initial solution was incorrect.