I'm trying to aggregate consecutive rows of a pandas DataFrame when they hold the same values. Here is a sample of the DataFrame:
import pandas as pd

df = pd.DataFrame({
'latitude': [0.0, 0.0, 1.0, 1.0, 1.0],
'longitude': [0.0, 1.0, 1.0, 2.0, 2.0],
'hour': [0, 1, 1, 2, 3]
})
First, I want to merge consecutive rows whose 'latitude' and 'longitude' values are both the same, using max as the aggregation function for the 'hour' column. With the sample DataFrame, the expected output would be:
df = pd.DataFrame({
'latitude': [0.0, 0.0, 1.0, 1.0],
'longitude': [0.0, 1.0, 1.0, 2.0],
'hour': [0, 1, 1, 3]
})
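For reference, here is a minimal sketch of one non-iterative way this first step might be done. It is not a confirmed solution. It assumes the common shift/compare/cumsum idiom: a row starts a new run whenever either coordinate differs from the previous row, and the cumulative sum of those flags serves as a group key. The names `new_run`, `run_id` and `step1` are only illustrative, and rows with NaN coordinates would never be merged, since NaN does not compare equal to itself.

```python
import pandas as pd

df = pd.DataFrame({
    'latitude': [0.0, 0.0, 1.0, 1.0, 1.0],
    'longitude': [0.0, 1.0, 1.0, 2.0, 2.0],
    'hour': [0, 1, 1, 2, 3],
})

coords = df[['latitude', 'longitude']]

# True wherever a row differs from the previous row in at least one coordinate
# (the first row is always flagged, because shift() puts NaN before it).
new_run = coords.ne(coords.shift()).any(axis=1)

# Running count of run starts: consecutive identical rows share one id.
run_id = new_run.cumsum().rename('run_id')

step1 = (
    df.groupby(run_id)
      .agg({'latitude': 'first', 'longitude': 'first', 'hour': 'max'})
      .reset_index(drop=True)
)

print(step1)
# step1['hour'].tolist() -> [0, 1, 1, 3]
```

Grouping on `run_id` rather than on the coordinate columns themselves matters here: a plain `groupby(['latitude', 'longitude'])` would also merge equal rows that are not adjacent.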
After this aggregation, I would like to aggregate on the 'hour' column, i.e., merge two consecutive rows if their 'hour' value is the same (using 'first' as the aggregation function for latitude and longitude).
In that case, the final expected output would be:
df = pd.DataFrame({
'latitude': [0.0, 0.0, 1.0],
'longitude': [0.0, 1.0, 2.0],
'hour': [0, 1, 3]
})
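The second step could be sketched the same way, again only as an assumption about what a vectorized version might look like. The input below is the intermediate result from the first step, written out by hand so the snippet runs on its own; `hour_run` and `final` are illustrative names.

```python
import pandas as pd

# Intermediate result after merging rows with identical coordinates.
step1 = pd.DataFrame({
    'latitude': [0.0, 0.0, 1.0, 1.0],
    'longitude': [0.0, 1.0, 1.0, 2.0],
    'hour': [0, 1, 1, 3],
})

# A new run starts wherever 'hour' differs from the previous row.
# The key is renamed so it cannot be confused with the 'hour' column.
hour_run = step1['hour'].ne(step1['hour'].shift()).cumsum().rename('run_id')

final = (
    step1.groupby(hour_run)
         .agg({'latitude': 'first', 'longitude': 'first', 'hour': 'first'})
         .reset_index(drop=True)
)

print(final)
# final['hour'].tolist() -> [0, 1, 3]
```

Within a run every 'hour' value is the same by construction, so 'first' for that column is just a convenient way to carry it through.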
I know how to do this by iterating over the rows, but I don't think that is best practice, since it takes a lot of time. So I'm looking for a different approach.
If anyone could help, I would be grateful. Thanks in advance!