Suppose you have a DataFrame containing observations of variables. Each observation is stored as a triple (Variable, Timestamp, Value). I'll call this layout an "observation DataFrame":
#Variable Time Value
#852-YF-007 2016-05-10 23:00:00 4
#852-YF-007 2016-05-11 04:00:00 4
#...
#852-YF-008 2016-05-10 23:00:00 5
#852-YF-008 2016-05-11 04:00:00 3
#...
#852-YF-009 2016-05-10 23:00:00 2
#852-YF-009 2016-05-11 04:00:00 9
#...
The data is loaded into a Spark DataFrame, and the timestamps are sampled so that there is exactly one value per variable for each timestamp.
Question: How can I efficiently convert/transpose this into an "instants DataFrame" like the following?
#Time 852-YF-007 852-YF-008 852-YF-009
#2016-05-10 23:00:00 4 5 2
#2016-05-11 04:00:00 4 3 9
#...
The number of columns depends on the number of variables. Each column is the time series of one variable (all sampled values for that variable), and each row is one timestamp. Note: the number of timestamps will be much larger than the number of variables.
Update: This is related to pivot tables, but I do not have a fixed number of columns; the number of columns varies with the number of variables.
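To make the intended result concrete, here is a minimal plain-Python sketch of the reshape I am after. It is not Spark code and is not meant to scale; it only pins down the semantics. The function name `to_instants` is made up for illustration, the sample rows are copied from the example above, and filling a missing value with `None` is my assumption (the sampled data should not have gaps).

```python
from collections import defaultdict

# Sample rows in the "observation" layout: (Variable, Time, Value).
observations = [
    ("852-YF-007", "2016-05-10 23:00:00", 4),
    ("852-YF-007", "2016-05-11 04:00:00", 4),
    ("852-YF-008", "2016-05-10 23:00:00", 5),
    ("852-YF-008", "2016-05-11 04:00:00", 3),
    ("852-YF-009", "2016-05-10 23:00:00", 2),
    ("852-YF-009", "2016-05-11 04:00:00", 9),
]


def to_instants(rows):
    """Reshape (variable, time, value) triples into one row per timestamp."""
    # The column set is not fixed up front: it is derived from the data.
    variables = sorted({variable for variable, _, _ in rows})

    # Collect the values of every variable under their shared timestamp.
    by_time = defaultdict(dict)
    for variable, time, value in rows:
        by_time[time][variable] = value

    header = ["Time"] + variables
    # Assumption: a variable with no value at a timestamp becomes None.
    table = [
        [time] + [by_time[time].get(variable) for variable in variables]
        for time in sorted(by_time)
    ]
    return header, table


header, table = to_instants(observations)
print(header)
for row in table:
    print(row)
```

The point of the sketch is that the header is computed from the distinct variables found in the data, so nothing about the column count is hard-coded.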