1

If I have a dataframe with a value for a date "interval" and then another dataframe of consecutive dates, how can I set a value in the second dataframe given the date interval in the first dataframe.

# first dataframe (the "lookup", if you will)
df1 = pd.DataFrame(np.random.random((10, 1)))
df1['date'] = pd.date_range('2017-1-1', periods=10, freq='10D')

# second dataframe
df2 = pd.DataFrame(np.arange(0,100))
df2['date'] = pd.date_range('2016-12-29', periods=100, freq='D')

So if df2 date is greater than or equal to a df1 date and less than a contiguous date in df1 we would say something like: df2['multiplier'] = df1[0], for the proper element that fits within the dates.

Also not sure how the upper boundary would be handled, i.e. if df2 date is greater than the greatest date in df1, it would get the last value in df1.

jeff_new
  • 389
  • 2
  • 4
  • 12

1 Answers1

0

This feels dirty, so with apologies to the arts of element-wise operations, here's my go at it.

# create an "end date" second column by shifting the date
df1['end_date'] = df1['date'].shift(-1) + pd.DateOffset(-1)

# create a simple list by nested iteration
multiplier = []
for elem, row in df2.iterrows():
    if row['date'] < min(df1['date']):
        # kinda don't care about this instance
        multiplier.append(0)
    elif row['date'] < max(df1['date']):
        tmp_mult = df1[(df1['date'] <= row['date']) & (row['date'] <= df1['end_date'])][0].values[0]
        multiplier.append(tmp_mult)
        # for l_elem, l_row in df1.iterrows():
            # if l_row.date <= row['date'] <= l_row.end_date:
                # multiplier.append(l_row[0])
    else:
        multiplier.append(df1.loc[df1.index.max(), 0])

# set the list as a new column in the dataframe
df2['multiplier'] = multiplier
jeff_new
  • 389
  • 2
  • 4
  • 12
  • Looks like this has been solved at the following post with nice use of np.piecewise: http://stackoverflow.com/questions/31328014/merging-dataframes-based-on-date-range However, how would we add the case that any df2 date greater than the df1 max date, gets the df1 max date's ID? – jeff_new Feb 21 '17 at 16:27