For every full hour, I am trying to get the closest previous data point in a pandas data frame. For example:

        time  value
0   14:59:58     15
1   15:00:10     20
2   15:57:42     14
3   16:00:30      9

would return

        time  value
0   15:00:00     15
1   16:00:00     14

i.e. rows 0 and 2 of the original data frame. How would I go about doing so? Thanks!

KidSudi

1 Answer

With the following toy dataframe:

import pandas as pd

df = pd.DataFrame(
    {"time": ["14:59:58", "15:00:10", "15:57:42", "16:00:30"], "value": [15, 20, 14, 9]}
)

Here is one way to do it:

# Setup: parse the times, then build one placeholder row (value = NA) per rounded hour
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")
temp_df = pd.DataFrame(df["time"].dt.round("H").drop_duplicates()).assign(value=pd.NA)

# Add the round hours to df, forward-fill each one from the closest previous
# data point, then drop the original (non-round) rows
new_df = (
    pd.concat([df, temp_df])
    .sort_values(by="time")
    .ffill()  # fillna(method="ffill") is deprecated in recent pandas
    .pipe(lambda df_: df_[~df_["time"].isin(df["time"])])
    .reset_index(drop=True)
)

# Cleanup
new_df["time"] = new_df["time"].dt.time
print(new_df)
# Output
       time  value
0  15:00:00     15
1  16:00:00     14
Laurent
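
For comparison, here is a minimal sketch of another way to get the same result with pd.merge_asof, which does a "closest previous row" lookup directly. It is not part of the answer above. The names marks, hours and hourly are made up for illustration, and the sketch assumes you want one row for every full hour between the first and last data point. A data point that falls exactly on the hour counts as "previous", since that is the default behaviour of merge_asof.

```python
import pandas as pd

df = pd.DataFrame(
    {"time": ["14:59:58", "15:00:10", "15:57:42", "16:00:30"], "value": [15, 20, 14, 9]}
)
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")
df = df.sort_values("time")  # merge_asof needs both frames sorted by the key

# Full-hour marks between the first and last data point (illustrative names)
first, last = df["time"].iloc[0], df["time"].iloc[-1]
marks = pd.date_range(
    first.replace(minute=0, second=0, microsecond=0), last, freq=pd.Timedelta(hours=1)
)
marks = marks[marks >= first]  # skip hour marks that have no earlier data point

# Match the key dtype of df so that merge_asof accepts the two frames
hours = pd.DataFrame({"time": marks}).astype({"time": df["time"].dtype})

# For each hour mark, take the last row of df whose time is <= that mark
hourly = pd.merge_asof(hours, df, on="time", direction="backward")
hourly["time"] = hourly["time"].dt.time
print(hourly)
```

On the toy data this gives 15:00:00 with value 15 and 16:00:00 with value 14, i.e. rows 0 and 2 of the original data frame. Because the hour marks are generated independently of the data, an hour with no data points of its own still gets a row, filled from the last earlier point.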