I would like to efficiently merge two data frames into one, but one data frame has "more data" than the other. Example:
df_A = pd.DataFrame({"Time": [pd.to_datetime("09:11:37.600"),
pd.to_datetime("09:11:37.700"),
pd.to_datetime("09:11:37.800")],
"A": [0.1, 0.7, -1.1]})
df_B = pd.DataFrame({"Time": [pd.to_datetime("09:11:37.610"),
pd.to_datetime("09:11:37.640"),
pd.to_datetime("09:11:37.670"),
pd.to_datetime("09:11:37.700"),
pd.to_datetime("09:11:37.730"),
pd.to_datetime("09:11:37.760"),
pd.to_datetime("09:11:37.790"),
pd.to_datetime("09:11:37.820")],
"B": [0.3, -1.5, -0.5, 0.2, 1.2, -0.9, 0.1, -0.2]})
I would like to create a third data frame, df_C
, that is an augmented copy of the shortest previous data frame. The augmentation is given by the "smoothed" columns from the longest data frame. The smoothing can be performed by an averaging operation, or another one. In this example, I would like to do the following:
- For the row corresponding to
pd.to_datetime("09:11:37.600")
, average out the values0.3, -1.5, -0.5
because they correspond to the times betweenpd.to_datetime("09:11:37.600")
and the time in the next row,pd.to_datetime("09:11:37.700")
; - For the corresponding to
pd.to_datetime("09:11:37.700")
, average out the values0.2, 1.2, -0.9, 0.1
because they correspond to the times betweenpd.to_datetime("09:11:37.700")
and the time in the next row,pd.to_datetime("09:11:37.800")
; - And so on.
The data frame df_C
would then have three columns: Time, A, and B, where Time and A come from df_A
, and B is the "smoothed" column from df_B
following the procedure above.
Is there a way to do this without explicitly writing a for
loop, which can be expensive for very long data frames?
I tried the following, but it's copying the same value to all dates (i.e., it's not correct).
df_C = df_A.copy()
df_C.loc[:, "B"] = df_B.loc[(df_B["Time"] >= df_A.shift(1)["Time"].values[1]) & (df_B["Time"] < df_A.shift(-1)["Time"].values[0]), "B"].mean()