I have a pandas DataFrame with columns of sensor measurements, where each row contains the measurements of one sensor node. The order of the rows from the sensor nodes looks like this:
{0: 'sensornode0009', 1: 'sensornode0015', 2: 'sensornode0011', 3: 'sensornode0012', 4: 'sensornode0016', 5: 'sensornode0014', 6: 'sensornode0013', 7: 'sensornode0008', 8: 'sensornode0010', 9: 'sensornode0009', 10: 'sensornode0015', 11: 'sensornode0011', 12: 'sensornode0012', 13: 'sensornode0016', 14: 'sensornode0014', 15: 'sensornode0013', 16: 'sensornode0008', 17: 'sensornode0010', 18: 'sensornode0009', 19: 'sensornode0015', 20: 'sensornode0011', 21: 'sensornode0012', 22: 'sensornode0016', 23: 'sensornode0014', 24: 'sensornode0013', 25: 'sensornode0008', 26: 'sensornode0010', 27: 'sensornode0009', 28: 'sensornode0015', 29: 'sensornode0011'}
So there are 9 unique sensor nodes, each sending the same measurements but from a different location. As can be seen in the data, sensornode0009 sends its values first, followed by the rest of the sensor nodes. After all unique sensor nodes have occurred, sensornode0009 occurs again. I call this a "chunk": a 10-second interval within which every sensor node sends its data once. Due to small technical issues, some of the sensor nodes didn't send their data within a 10 s chunk.
I want to identify the chunks where one or more sensor nodes are missing and, for each missing node, add a copy of the latest row in which it sent its data correctly. As a result, I want to have a row of measurements from every unique sensor node within every 10 s chunk.
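To make the problem concrete, here is a minimal sketch on toy data. The three-node layout and the `value` column are assumptions for illustration only (the real data has nine nodes and more measurement columns). It labels each chunk by counting occurrences of sensornode0009 and lists which nodes are missing per chunk:

```python
import pandas as pd

# Hypothetical toy data: 3 nodes instead of 9, one measurement column "value".
# In the second chunk, sensornode0011 did not send its data.
df = pd.DataFrame({
    "Sensor_ID": ["sensornode0009", "sensornode0015", "sensornode0011",
                  "sensornode0009", "sensornode0015",
                  "sensornode0009", "sensornode0015", "sensornode0011"],
    "value": [1.0, 2.0, 3.0, 1.1, 2.1, 1.2, 2.2, 3.2],
})

# A new chunk starts at every occurrence of sensornode0009,
# so a running count of those rows serves as a chunk label.
df["chunk"] = (df["Sensor_ID"] == "sensornode0009").cumsum()

# Every node seen anywhere in the data is expected in every chunk.
expected = set(df["Sensor_ID"].unique())

# For each chunk, collect the nodes that did not report.
missing_per_chunk = {
    int(chunk): sorted(expected - set(group["Sensor_ID"]))
    for chunk, group in df.groupby("chunk")
}
print(missing_per_chunk)
```

Note that this labeling relies on sensornode0009 itself never being missing; if it can drop out, a chunk boundary would be lost.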
I've tried the following code to achieve this:
# Find the indices where the series starts (when 'sensornode0009' occurs)
start_indices = df[df['Sensor_ID'] == 'sensornode0009'].index.tolist()
# Iterate through the series
for i in range(len(start_indices) - 1):
    start_idx = start_indices[i]  # Start index of the series
    end_idx = start_indices[i + 1]  # End index of the series
    # Get the unique names within the actual series
    names = df.loc[start_idx:end_idx, 'Sensor_ID'].unique()
    # Generate a list of expected names
    expected_names = df['Sensor_ID'].unique()
    # Check for missing names within the series
    missing_names = set(expected_names) - set(names)
    if missing_names:
        for missing_name in missing_names:
            # Find the latest row before the missing sensor node occurred
            last_row_idx = df[df['Sensor_ID'] == missing_name].index.max()
            last_row = df.loc[last_row_idx]
            # Copy the last row to the series where the sensor node is missing
            df.loc[end_idx, :] = last_row.values
This code finds the indices of rows where sensor nodes are missing, but filling in the missing values doesn't work as expected.
For now I've used the information that a chunk always starts with "sensornode0009". Is there a simpler way to achieve the desired output?
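One direction that might be simpler, sketched here under assumptions rather than verified on the full data: label the chunks, reindex onto the full chunk × sensor grid, and forward-fill per sensor. The helper name `fill_missing_nodes`, the `value` column, and the toy data are hypothetical; the sketch also assumes each node appears at most once per chunk and that sensornode0009 is never missing.

```python
import pandas as pd

def fill_missing_nodes(df, id_col="Sensor_ID", start_id="sensornode0009"):
    """Return a copy in which every chunk has one row per sensor node.

    A missing row is filled with that node's most recent earlier row.
    """
    out = df.copy()
    # A new chunk starts at every occurrence of the start node.
    out["chunk"] = (out[id_col] == start_id).cumsum()
    # Keep the nodes in their order of first appearance.
    order = list(pd.unique(out[id_col]))
    # Build the complete grid: every chunk combined with every node.
    full_index = pd.MultiIndex.from_product(
        [sorted(out["chunk"].unique()), order], names=["chunk", id_col]
    )
    # Reindexing creates all-NaN rows where a node did not report.
    out = out.set_index(["chunk", id_col]).reindex(full_index)
    # Forward-fill within each node, i.e. from earlier chunks to later ones.
    out = out.groupby(level=id_col).ffill()
    return out.reset_index()

# Hypothetical toy data: sensornode0011 is missing in the second chunk.
df = pd.DataFrame({
    "Sensor_ID": ["sensornode0009", "sensornode0015", "sensornode0011",
                  "sensornode0009", "sensornode0015",
                  "sensornode0009", "sensornode0015", "sensornode0011"],
    "value": [1.0, 2.0, 3.0, 1.1, 2.1, 1.2, 2.2, 3.2],
})
filled = fill_missing_nodes(df)
print(filled)
```

Two caveats with this sketch: a node missing from the very first chunk stays NaN because there is no earlier row to copy, and integer columns become float once the reindex introduces NaN.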