I have datastream containing ID, type, and value: For a group of users with given ID I receive measurements (values) from different sensors (type). Example of incoming data:
ID type value
1 A 70
2 B 16
1 A 71
2 A 72
I need to create Spark Structured Streaming app that will perform custom clustering of the obtained data. However, I am stuck at the begining> I don't know how to create a set of data that will contain the last measurements for each user for each type. I need to have this set for every user that has ever appeared in the system.
So, basically, for a data stream described above, I need a Structured Streaming app that will give me a set of last measurements for every user for every type>
ID type value
1 A 71
2 B 16
2 A 72
Users may be inactive for some time, I still need to keep their record. It would be useful if the output is a dataframe.
Any ideas for how to do this will be very welcome.
PS I am fairly new to Spark Structured Streaming, sorry if this is a trivial question.