1

Seems like there are many DF questions with conditions from another DF but I can't find any that do what I need. Both dataframes are small samples. They are each thousands of columns wide. I have a DataFrame (df1) that looks like this

             IBM    BA      CAT     IBM EARN    BA EARN   CAT EARN
Date
1/22/2018   163.13  65.94   76.50     NaN        NaN       NaN
1/23/2018   163.17  65.94   76.51     NaN        NaN       NaN
1/24/2018   167.26  67.43   79.23     NaN        NaN       NaN
1/25/2018   166.28  67.77   80.57     NaN        NaN       NaN
1/26/2018   166.58  68.37   80.87     NaN        NaN       NaN
1/27/2018   166.77  68.87   81.07     NaN        NaN       NaN
1/28/2018   167.98  68.57   81.07     NaN        NaN       NaN
2/1/2018    167.98  68.77   81.59     NaN        NaN       NaN
2/2/2018    167.98  69.07   81.87     NaN        NaN       NaN

I have another dataframe (df2) with identical columns as the last three in df1 but with specific dates

    IBM EARN    BA EARN     CAT EARN
0   1/22/2018   2/1/2018    1/26/2018
1   10/19/2017  10/26/2017  10/25/2017
2   7/20/2017   7/27/2017   7/26/2017
3   4/20/2017   4/27/2017   4/26/2017
4   1/23/2017   1/26/2017   1/27/2017
5   10/19/2016  10/27/2016  10/26/2016
6   7/20/2016   7/28/2016   7/27/2016

I want to place a 1 in df1 where there is a corresponding date in df2. So the (partial) result would look like this but would continue for all dates list in df2.

             IBM     BA     CAT     IBM EARN    BA EARN   CAT EARN
Date
1/22/2018   163.13  65.94   76.50    **1**       NaN       NaN
1/23/2018   163.17  65.94   76.51     NaN        NaN       NaN
1/24/2018   167.26  67.43   79.23     NaN        NaN       NaN
1/25/2018   166.28  67.77   80.57     NaN        NaN       NaN
1/26/2018   166.58  68.37   80.87     NaN        NaN        **1**
1/27/2018   166.77  68.87   81.07     NaN        NaN       NaN
1/28/2018   167.98  68.57   81.07     NaN        NaN       NaN
2/1/2018    167.98  68.77   81.59     NaN        **1**     NaN
2/2/2018    167.98  69.07   81.87     NaN        NaN       NaN

Please let me know if you can help with a solution

J Westwood
  • 421
  • 1
  • 9
  • 22
  • please post desired output not partial output – Pyd Apr 12 '18 at 05:10
  • I dont have the entire output unfortunately because it is more than 1000 lines but df1 has daily dates (as the index) and stock prices and I want to put a 1 in each instance from df2 where there is a date. So for IBM it would be 1/22/2018 (as shown) and then 10/19/17 then 7/20/17,etc. Same for the other 2 columns with the dates – J Westwood Apr 12 '18 at 05:15
  • @JWestwood - I add also non loop solution, if you are interested. – jezrael Apr 12 '18 at 05:51

2 Answers2

2

For each column of second DaatFrame check membership by Index.isin and with numpy.where replace values:

for col in df2.columns:
    df1[col] = np.where(df1.index.isin(df2[col]),1,np.nan)
print (df1)
              IBM     BA    CAT  IBM EARN  BA EARN  CAT EARN
Date                                                        
1/22/2018  163.13  65.94  76.50       1.0      NaN       NaN
1/23/2018  163.17  65.94  76.51       NaN      NaN       NaN
1/24/2018  167.26  67.43  79.23       NaN      NaN       NaN
1/25/2018  166.28  67.77  80.57       NaN      NaN       NaN
1/26/2018  166.58  68.37  80.87       NaN      NaN       1.0
1/27/2018  166.77  68.87  81.07       NaN      NaN       NaN
1/28/2018  167.98  68.57  81.07       NaN      NaN       NaN
2/1/2018   167.98  68.77  81.59       NaN      1.0       NaN
2/2/2018   167.98  69.07  81.87       NaN      NaN       NaN

EDIT:

Non loop solution with DataFrame.isin by dictionary of list created by df2 with cast boolean mask to integers:

#first create DataFrame by repeat index of df1
#https://stackoverflow.com/a/45118399
arr = np.broadcast_to(df1.index[:, None], (len(df1), len(df2.columns)))
df3 = pd.DataFrame(arr, columns=df2.columns, index=df1.index)

df3 = df3.isin(df2.to_dict('l')).astype(int)
print (df3)
           IBM EARN  BA EARN  CAT EARN
Date                                  
1/22/2018         1        0         0
1/23/2018         0        0         0
1/24/2018         0        0         0
1/25/2018         0        0         0
1/26/2018         0        0         1
1/27/2018         0        0         0
1/28/2018         0        0         0
2/1/2018          0        1         0
2/2/2018          0        0         0

df1 = df1.drop(df2.columns, 1).join(df3)
print (df1)
              IBM     BA    CAT  IBM EARN  BA EARN  CAT EARN
Date                                                        
1/22/2018  163.13  65.94  76.50         1        0         0
1/23/2018  163.17  65.94  76.51         0        0         0
1/24/2018  167.26  67.43  79.23         0        0         0
1/25/2018  166.28  67.77  80.57         0        0         0
1/26/2018  166.58  68.37  80.87         0        0         1
1/27/2018  166.77  68.87  81.07         0        0         0
1/28/2018  167.98  68.57  81.07         0        0         0
2/1/2018   167.98  68.77  81.59         0        1         0
2/2/2018   167.98  69.07  81.87         0        0         0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

you can try this as date is your index:

In [18]: df1['IBMEARN'] = np.where(df1.index.isin(df2.IBMEARN),1,0)

In [19]: df1['BAEARN'] = np.where(df1.index.isin(df2.BAEARN),1,0)

In [21]: df1['CATEARN'] = np.where(df1.index.isin(df2.CATEARN),1,0)
In [22]: df1
Out[22]: 
              IBM     BA    CAT  IBMEARN  BAEARN  CATEARN
DATE                                                     
1/22/2018  163.13  65.94  76.50        1       0        0
1/23/2018  163.17  65.94  76.51        0       0        0
1/24/2018  167.26  67.43  79.23        0       0        0
1/25/2018  166.28  67.77  80.57        0       0        0
1/26/2018  166.58  68.37  80.87        0       0        1
1/27/2018  166.77  68.87  81.07        0       0        0
1/28/2018  167.98  68.57  81.07        0       0        0
2/1/2018   167.98  68.77  81.59        0       1        0
2/2/2018   167.98  69.07  81.87        0       0        0
shivsn
  • 7,680
  • 1
  • 26
  • 33
  • Yes that does work so it is correct but I have more than 3000 columns (stocks) so I guess I could loop through them and do it but was wondering if there was a more pythonic way to do all those columns rather than a loop. let me know if you can think of anything. ty – J Westwood Apr 12 '18 at 05:24
  • I see loop posted from @jezreal. That seems simple enough. I think both answer are correct. I will mark as such if I can credit 2 answers – J Westwood Apr 12 '18 at 05:28
  • @JWestwood I think jezreal answer is better you should accept that. – shivsn Apr 12 '18 at 05:55