I have a problem that involves converting time series from one representation to another. Each item in the time series has attributes "time", "id", and "value" (think of it as a measurement at "time" for sensor "id"). I'm storing all the items in a Pandas dataframe with columns named by the attributes.
The set of "time"s is a small set of integers (say, 32), but some of the "id"s are missing "time"s/"value"s. What I want to construct is an output dataframe with the form:
id time0 time1 ... timeN
val0 val1 ... valN
where the missing "value"s are represented by NaNs.
For example, suppose the input looks like the following:
time id value
0 0 13
2 0 15
3 0 20
2 1 10
3 1 12
Then, assuming the set of possible times is 0, 2, and 3, the desired output is:
id time0 time1 time2 time3
0 13 NaN 15 20
1 NaN NaN 10 12
I'm looking for a Pythonic way to do this since there are several million rows in the input and around 1/4 million groups.