I have a dataset available at https://gitlab.com/creativitylabb/dataset-test/-/raw/main/final_pagination.csv. After processing, the data looks like this:
TimeStamp Source Sensor Value LocationLat LocationLong Measurement
TimeStamp
01.02.2021 07:00:00 01.02.2021 07:00:00 Waqi pm10 16.0 45.716700 25.633300 µg/m3
01.02.2021 07:00:00 01.02.2021 07:00:00 Waqi no2 4.0 45.716700 25.633300 µg/m3
01.02.2021 07:00:00 01.02.2021 07:00:00 Waqi no2 2.3 45.716700 25.633300 µg/m3
01.02.2021 07:00:00 01.02.2021 07:00:00 Waqi o3 19.8 45.716700 25.633300 µg/m3
01.02.2021 08:00:00 01.02.2021 08:00:00 Waqi no2 28.5 45.659833 25.614488 µg/m3
The processing I used:
from datetime import datetime
import pandas as pd
df = pd.read_csv('https://gitlab.com/creativitylabb/dataset-test/-/raw/main/final_pagination.csv')
df = df.drop(columns=['id', 'index', 'type', 'score', 'Unnamed: 0'])  # drop columns not needed for the analysis
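# Sort on the raw epoch-millisecond values first: the formatted 'dd.mm.YYYY' strings would sort lexicographically, not chronologically
df = df.sort_values(by='TimeStamp').reset_index(drop=True)
df['TimeStamp'] = df['TimeStamp'].apply(lambda x: datetime.utcfromtimestamp(x / 1000).strftime('%d.%m.%Y %H:%M:%S'))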
df.index = df['TimeStamp']  # use the formatted timestamp as the index, as in the output shown above
print(df.head().to_string())
The Sensor column contains sensor names such as pm10, pm2.5, co2 and so on. The Value column contains the value measured by that sensor. How can I split the data so that each sensor gets its own column, i.e. one column with the pm10 values, another with the pm2.5 values and so on (preferably without filling all the other columns with NaN)?
Example output:
TimeStamp Source pm10 pm25 LocationLat LocationLong Measurement
TimeStamp
01.02.2021 07:00:00 01.02.2021 07:00:00 Waqi 16.0 20 45.716700 25.633300 µg/m3
01.02.2021 07:00:00 01.02.2021 07:00:00 Waqi 4.0 21 45.716700 25.633300 µg/m3
01.02.2021 07:00:00 01.02.2021 07:00:00 Waqi 2.3 20 45.716700 25.633300 µg/m3
01.02.2021 07:00:00 01.02.2021 07:00:00 Waqi 19.8 25 45.716700 25.633300 µg/m3
01.02.2021 08:00:00 01.02.2021 08:00:00 Waqi 28.5 24 45.659833 25.614488 µg/m3
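One direction that might work is a long-to-wide reshape with pandas' pivot_table. The following is only a minimal sketch, assuming every column except Sensor and Value identifies a reading, and assuming duplicate readings for the same sensor at the same time and place can be averaged. The `sample` frame, the `keys` list and the `wide` name are made up for illustration; `sample` holds toy rows copied from the output above rather than the real CSV.

```python
import pandas as pd

# Toy rows mirroring the processed layout shown above (not loaded from the real CSV)
sample = pd.DataFrame({
    'TimeStamp': ['01.02.2021 07:00:00'] * 4 + ['01.02.2021 08:00:00'],
    'Source': ['Waqi'] * 5,
    'Sensor': ['pm10', 'no2', 'no2', 'o3', 'no2'],
    'Value': [16.0, 4.0, 2.3, 19.8, 28.5],
    'LocationLat': [45.716700] * 4 + [45.659833],
    'LocationLong': [25.633300] * 4 + [25.614488],
    'Measurement': ['µg/m3'] * 5,
})

# Assumption: these columns together identify one row of the wide table
keys = ['TimeStamp', 'Source', 'LocationLat', 'LocationLong', 'Measurement']

# One column per sensor; duplicate (keys, Sensor) readings are averaged (an assumed choice)
wide = (
    sample.pivot_table(index=keys, columns='Sensor', values='Value', aggfunc='mean')
          .reset_index()
)
wide.columns.name = None  # drop the leftover 'Sensor' label on the column axis

print(wide.to_string())
```

With this layout, a sensor column is NaN only in rows where that sensor has no reading for the given timestamp and location, rather than in every row that belongs to a different sensor.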