1

I have the following df,

inv_date        inv_id
2017-10-01      100117
2018-04-02      040218
2018-05-06      060518

where inv_date is of datetime dtype, and inv_id is str; I want to convert inv_id into datetime, based on the following formats,

formats = {'%m%d%y': 6, '%d%m%y': 6}
L = [pd.to_datetime(s.str[:v], format=k, errors='coerce') for k, v in formats.items()]
df1 = pd.concat(L, axis=1, keys=[s.name + '_' + str(i) for i, s in zip(count(), L)])
df1 = df.apply(lambda x: x.where(x.between('2000-01-01', datetime.now())))

I want to create a boolean column dummy_inv_id, which is set to True if any of non-NaT converted datetime is within +/- 180 days of inv_date,

df1 = df1.assign(inv_date=df['inv_date'])
df1['inv_id_1'].between(df1['inv_date'] - Timedelta(180, unit='d'), df1['inv_date'] + Timedelta(180, unit='d'))
df1['inv_id_2'].between(df1['inv_date'] - Timedelta(180, unit='d'), df1['inv_date'] + Timedelta(180, unit='d'))

I am wondering how to consider all datetime columns (inv_id_1 and inv_id_2) in df1 collectively so if anyone is between inv_date +/- 180 days, then assign true to df for corresponding datetime;

so the results df look like,

inv_date        inv_id    dummy_inv_id
2017-10-01      100117    true
2018-04-02      040218    true
2018-05-06      060518    true
daiyue
  • 7,196
  • 25
  • 82
  • 149
  • I get `a = df1['inv_id_1'].between(df1['inv_date'] - pd.Timedelta(180, unit='d'), df1['inv_date'] + pd.Timedelta(180, unit='d')) b = df1['inv_id_2'].between(df1['inv_date'] - pd.Timedelta(180, unit='d'), df1['inv_date'] + pd.Timedelta(180, unit='d'))` – jezrael Sep 05 '18 at 10:58
  • output is `print (a) 0 True 1 True 2 True dtype: bool print (b) 0 False 1 True 2 True dtype: bool` – jezrael Sep 05 '18 at 10:58
  • So need `c = a | b` ? – jezrael Sep 05 '18 at 10:59
  • @jezrael is it possible to put it all in one line considering how many formats defined, since the number of formats determines the number of converted date columns in `df1` – daiyue Sep 05 '18 at 11:04

1 Answers1

1

You can use np.logical_or.reduce:

a = df1['inv_id_1'].between(df1['inv_date'] - pd.Timedelta(180, unit='d'), df1['inv_date'] + pd.Timedelta(180, unit='d'))
b = df1['inv_id_2'].between(df1['inv_date'] - pd.Timedelta(180, unit='d'), df1['inv_date'] + pd.Timedelta(180, unit='d'))

c = [a,b]
df['dummy_inv_id'] = np.logical_or.reduce(c)
print (df)
    inv_date  inv_id  dummy_inv_id
0 2017-10-01  100117          True
1 2018-04-02   40218          True
2 2018-05-06   60518          True
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252