I have an awkward situation where I have a table storing data about changes to a value over time.
I have a column groupId
which groups together changes to a given value. I have value
, which stores what the value changed to, and I have date
which stores the date the change occurred.
e.g. if value a
comes into being on 20000101
and changes to b
on 20010101
, we might have something like:
+---------+-------+----------+
| groupId | value | date |
+---------+-------+----------+
| 42 | a | 20000101 |
| 42 | b | 20010101 |
+---------+-------+----------+
Now to make things fun, we can have records which don't represent a material change in value e.g.
+---------+-------+----------+
| groupId | value | date |
+---------+-------+----------+
| 43 | a | 20000101 |
| 43 | b | 20010101 |
| 43 | b | 20020101 |
+---------+-------+----------+
and for additional fun, we can have a value which changes to something else, then changes back to what it was before, like:
+---------+-------+----------+
| groupId | value | date |
+---------+-------+----------+
| 44 | a | 20000101 |
| 44 | b | 20010101 |
| 44 | a | 20020101 |
+---------+-------+----------+
Combining these together, we can have a group that looks something like this:
+---------+-------+----------+
| groupId | value | date |
+---------+-------+----------+
| 45 | a | 20000101 |
| 45 | a | 20010101 |
| 45 | b | 20020101 |
| 45 | b | 20030101 |
| 45 | a | 20040101 |
| 45 | a | 20050101 |
| 45 | b | 20060101 |
| 45 | b | 20070101 |
+---------+-------+----------+
What I need to do is to write a query which will return rows for each group, but throwing away any of those non-material changes. For group 45 above, this would mean returning:
+---------+-------+----------+
| groupId | value | date |
+---------+-------+----------+
| 45 | a | 20000101 |
| 45 | b | 20020101 |
| 45 | a | 20040101 |
| 45 | b | 20060101 |
+---------+-------+----------+
i.e. we keep only the earliest date for each 'contiguous' group of (groupId,, value).
Is there any reasonable way to achieve this?
I'm doing this in MySQL although a solution which doesn't depend on that would be ideal.