3

I have two pandas dataframes, both index with datetime entries. The df1 has non-unique time indices, whereas df2 has unique ones. I would like to add a column df2.a to df1 in the following way: for every row in df1 with timestamp ts, df1.a should contain the most recent value of df2.a whose timestamp is less then ts.

For example, let's say that df2 is sampled every minute, and there are rows with timestamps 08:00:15, 08:00:47, 08:02:35 in df1. In this case I would like the value from df2.a[08:00:00] to be used for the first two rows, and df2.a[08:02:00] for the third. How can I do this?

SBF
  • 345
  • 1
  • 2
  • 14

2 Answers2

2

You are describing an asof-join, which was just released in pandas 0.19.

pd.merge(df1, df2, left_on='ts', right_on='a')
Community
  • 1
  • 1
chrisaycock
  • 36,470
  • 14
  • 88
  • 125
1

apply to rows of df1, reindex on df2 with ffill.

df1['df2.a'] = df1.apply(lambda x: pd.Series(df2.a.reindex([x.name]).ffill().values), axis=1)
piRSquared
  • 285,575
  • 57
  • 475
  • 624