2

I'm working in Python. I have two dataframes df1 and df2:

d1 = {'timestamp1': [88148  , 5617900, 5622548, 5645748, 6603950, 6666502], 'col01': [1, 2, 3, 4, 5, 6]}
df1 = pd.DataFrame(d1)

d2 = {'timestamp2': [5629500, 5643050, 6578800, 6583150, 6611350], 'col02': [7, 8, 9, 10, 11], 'col03': [0, 1, 0, 0, 1]}
df2 = pd.DataFrame(d2)

I want to create a new column in df1 with the value of the minimum timestamp of df2 greater than the current df1 timestamp, where df2['col03'] is zero. This is the way I did it:

df1['colnew'] = np.nan
TSs = df1['timestamp1']
for TS in TSs:
values = df2['timestamp2'][(df2['timestamp2'] > TS) & (df2['col03']==0)]
    if not values.empty:
        df1.loc[df1['timestamp1'] == TS, 'colnew'] = values.iloc[0]

It works, but I'd prefer not to use a for loop. Is there a better way to do this?

GiuVi
  • 23
  • 4
  • Have you tried searchsorted (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.searchsorted.html)? – David L Apr 04 '18 at 07:38

2 Answers2

0

Give a try to the apply method.

def func(x):
    values = df2['timestamp2'][(df2['timestamp2'] > x) & (df2['col03']==0)]
    if not values.empty:
        return values.iloc[0]
    else:
        np.NAN

df1["timestamp1"].apply(func)

You can create a separate function to do what has to be done. The output is your new column

0    5629500.0
1    5629500.0
2    5629500.0
3    6578800.0
4          NaN
5          NaN
Name: timestamp1, dtype: float64

It is not an one-line solution, but it helps keeping things organised.

Wilson Freitas
  • 531
  • 5
  • 10
  • have a read of https://stackoverflow.com/questions/34962104/pandas-how-can-i-use-the-apply-function-for-a-single-column – Joe Iddon Apr 04 '18 at 07:59
0

Use pandas.merge_asof with a forward direction

pd.merge_asof(
    df1, df2.loc[df2.col03 == 0, ['timestamp2']],
    left_on='timestamp1', right_on='timestamp2', direction='forward'
).rename(columns=dict(timestamp2='colnew'))

   col01  timestamp1     colnew
0      1       88148  5629500.0
1      2     5617900  5629500.0
2      3     5622548  5629500.0
3      4     5645748  6578800.0
4      5     6603950        NaN
5      6     6666502        NaN
piRSquared
  • 285,575
  • 57
  • 475
  • 624