I'm trying to kind of pivot a pandas dataframe, but with a "twist" I guess? So heres the input table (tsv) that gets loaded into pandas.
timestamp sensor type unit value
1607724078 sensor_a string gps coords1
1607724078 sensor_b int bar 1
1607724079 sensor_a string gps coords5
1607724079 sensor_b int bar 4
1607724080 sensor_a string gps coords9
1607724080 sensor_b int bar 7
The output should look like this
timestamp sensor_a sensor_a_type sensor_a_unit sensor_b sensor_b_type sensor_b_unit
1607724078 coords1 string gps 1 int bar
1607724079 coords5 string gps 4 int bar
1607724080 coords9 string gps 7 int bar
So as you see, it should be grouped by timestamp and the value of the sensor column has to be defined as a separate column with the value from the original value column. Furthermore the original type and unit columns have to be "name-merged" with the new sensor_x column and keep their values.
The goal is to save it as parquet. Important: there can be more than two sensors.
For now I accomplished to create a MultiIndexed dataframe that includes all the information I need but I can't write it to parquet because I guess the MultiIndex has to be flattend somehow, so that the sensor unit and type have their own separate column.
import pandas as pd
if __name__ == '__main__':
df = pd.read_csv("data/test.tsv", delimiter='\t', index_col='timestamp')
df2 = df.pivot(columns=['sensor', 'unit', 'type'], values='value')
print(df2)
Output
sensor sensor_a sensor_b sensor_c
unit gps bar percent
type string int double
timestamp
1607724078 coords1 1 11.11
1607724079 coords5 4 44.44
1607724080 coords9 7 77.77
Thanks in advance!