Pandas: add column with the most recent values

Question

I have two pandas dataframes, both index with datetime entries. The df1 has non-unique time indices, whereas df2 has unique ones. I would like to add a column df2.a to df1 in the following way: for every row in df1 with timestamp ts, df1.a should contain the most recent value of df2.a whose timestamp is less then ts.

For example, let's say that df2 is sampled every minute, and there are rows with timestamps 08:00:15, 08:00:47, 08:02:35 in df1. In this case I would like the value from df2.a[08:00:00] to be used for the first two rows, and df2.a[08:02:00] for the third. How can I do this?

score 2 · Answer 1 · edited May 23 '17 at 11:53

2

You are describing an asof-join, which was just released in pandas 0.19.

pd.merge(df1, df2, left_on='ts', right_on='a')

edited May 23 '17 at 11:53

Community

1
1

answered Jun 30 '16 at 19:25

chrisaycock

36,470
14
88
125

score 1 · Accepted Answer · answered Jun 30 '16 at 19:28

1

apply to rows of df1, reindex on df2 with ffill.

df1['df2.a'] = df1.apply(lambda x: pd.Series(df2.a.reindex([x.name]).ffill().values), axis=1)

answered Jun 30 '16 at 19:28

piRSquared

285,575
57
475
624

Pandas: add column with the most recent values

2 Answers2

Linked