I have a table which is structured like the following:
ID Day Value1 Value2
10 20200601 ABC 100
10 20200602 ABC 100
10 20200603 CDE 200
10 20200604 CDE 100
20 20200601 ABC 50
20 20206002 ABC 100
20 20200603 ABC 100
20 20200604 ABC 100
Is there a way to build a SQL query which – per each ID
– looks for the Day
in which Value1
OR Value2
has changed?
The result I would like to achieve would be this:
ID Day Value1 Value2
10 20200603 ABC, CDE 100, 200
10 20200604 CDE 200, 100
20 20200602 ABC 50, 100
In which I can keep track of those changes per ID per Day.
Edit: I'm accessing this data on a Hadoop cluster via PySpark-SQL