
I have two DataFrames in Python that I created from Excel spreadsheets. The nflx_analysts_signals DataFrame contains a set of particular dates, and I want to get only those rows of nflx_returns whose dates also appear in nflx_analysts_signals. I'm new to pandas, and although this seems like an easy problem, I can't get it done.

These are the approaches I have tried:

1. Reset the index (`reset_index()`) in both DataFrames and compared the dates.
2. Changed the date columns to `'datetime64[ns]'` format and compared the dates.
3. Made a list of the dates in the nflx_analysts_signals DataFrame and applied `np.isin` to the nflx_returns DataFrame.

1.

```python
nflx_returns.reset_index()
nflx_analysts_signals.reset_index()
nflx_returns[nflx_returns['date'] == nflx_analysts_signals['dte']]
```
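One detail worth knowing about attempt 1: `reset_index()` returns a new DataFrame and leaves the original untouched unless the result is assigned back, so the first two lines above have no lasting effect. A minimal sketch with a hypothetical toy frame (the name `df` is illustrative; `drop=True` discards the old index instead of turning it into a column):

```python
import pandas as pd

# Hypothetical toy frame standing in for nflx_returns, with its original index
df = pd.DataFrame({"date": ["2013-06-26", "2013-06-27"]}, index=[2792, 2793])

df.reset_index()                  # returns a new DataFrame; df itself is unchanged
unchanged_index = df.index.tolist()

df = df.reset_index(drop=True)    # assign the result back to keep the change
new_index = df.index.tolist()

print(unchanged_index)  # [2792, 2793]
print(new_index)        # [0, 1]
```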

2.

```python
nflx_returns['date'] = nflx_returns['date'].astype('datetime64')
nflx_analysts_signals['dte'] = nflx_analysts_signals['dte'].astype('datetime64')
nflx_returns['CONDN_CHECK'] = nflx_returns[np.where(nflx_returns['date'].equals[nflx_analysts_signals('dte')])]
```
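For attempt 2, a minimal sketch of the conversion step, assuming the date columns arrive from Excel as text (the toy frames `returns` and `signals` are hypothetical stand-ins). `pd.to_datetime` is the usual way to convert a column, and `Series.isin` then builds a row-by-row membership mask:

```python
import pandas as pd

# Hypothetical frames whose dates are still text, as they may come out of Excel
returns = pd.DataFrame({"date": ["2013-06-26", "2013-06-27", "2013-06-28"]})
signals = pd.DataFrame({"dte": ["2013-06-28", "2013-07-31"]})

# Convert both columns to a datetime64 dtype
returns["date"] = pd.to_datetime(returns["date"])
signals["dte"] = pd.to_datetime(signals["dte"])
print(returns["date"].dtype)

# One boolean per row of `returns`: does this date appear anywhere in signals["dte"]?
mask = returns["date"].isin(signals["dte"])
matched = returns[mask]
print(matched)
```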

3.

```python
dates = []
for date in nflx_analysts_signals['dte']:
    dates.append(date)
nflx_returns['CONDN'] = np.isin(list(nflx_returns['date']), dates)
```
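The loop in attempt 3 only copies the column into a list, which `Series.tolist()` does in one call, and `Series.isin` accepts that list directly without going through NumPy. A minimal sketch with hypothetical toy frames that already hold datetimes:

```python
import pandas as pd

# Hypothetical toy frames; both date columns are already datetime64
signals = pd.DataFrame({"dte": pd.to_datetime(["2013-06-28", "2013-07-31"])})
returns = pd.DataFrame({"date": pd.to_datetime(["2013-06-27", "2013-06-28"])})

# Equivalent to the for-loop that builds `dates`
dates = signals["dte"].tolist()

# pandas-native membership test, one boolean per row of `returns`
returns["CONDN"] = returns["date"].isin(dates)
print(returns["CONDN"].tolist())  # [False, True]

# Keep only the rows where the condition holds
filtered = returns[returns["CONDN"]]
```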

These are the errors I got when I tried each approach separately in IPython:

1.

```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-819bcaeb6291> in <module>
----> 1 nflx_returns[nflx_returns['date'] == nflx_analysts_signals['dte']]

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\ops.py in wrapper(self, other, axis)
   1674 
   1675         elif isinstance(other, ABCSeries) and not self._indexed_same(other):
-> 1676             raise ValueError("Can only compare identically-labeled "
   1677                              "Series objects")
   1678 

ValueError: Can only compare identically-labeled Series objects
```
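The ValueError comes from how `==` works between two Series: pandas first lines them up by index label and refuses when the labels differ (in the posted data, one index starts at 2792 and the other at 8). Even with matching labels, `==` compares row 0 with row 0, row 1 with row 1, and so on, which is not the same as asking whether a date appears anywhere in the other column. A small sketch with hypothetical Series:

```python
import pandas as pd

a = pd.Series(["2013-06-26", "2013-06-28"], index=[0, 1])
b = pd.Series(["2013-06-28", "2013-06-26"], index=[0, 1])
c = pd.Series(["2013-06-28"], index=[8])  # different labels, like the original indexes

# Identical labels: == runs, but only compares values at the same position
pairwise = (a == b).tolist()        # [False, False]

# Membership test: is each value of `a` present anywhere in `b`?
membership = a.isin(b).tolist()     # [True, True]

# Non-identical labels: == refuses to align them and raises the ValueError
message = None
try:
    a == c
except ValueError as err:
    message = str(err)
print(pairwise, membership, message)
```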

2.

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-c94a05234af0> in <module>
----> 1 nflx_returns['CONDN_CHECK'] = nflx_returns[np.where(nflx_returns['date'].equals[nflx_analysts_signals('dte')])]

TypeError: 'DataFrame' object is not callable
```
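The TypeError is raised by `nflx_analysts_signals('dte')`: parentheses try to call the DataFrame, while square brackets select a column. Fixing that alone would still fail, because `.equals[...]` subscripts a method instead of calling it, and `Series.equals` compares two whole objects and returns a single bool rather than one value per row. A minimal sketch with hypothetical toy data:

```python
import pandas as pd

signals = pd.DataFrame({"dte": pd.to_datetime(["2013-06-28", "2013-07-31"])})

# Parentheses try to *call* the DataFrame, which is what raises the TypeError;
# square brackets are how a column is selected
call_error = None
try:
    signals("dte")
except TypeError as err:
    call_error = str(err)
column = signals["dte"]

# Series.equals compares two whole objects and returns one bool,
# so it cannot act as a per-row filter; Series.isin returns one bool per row
returns_dates = pd.Series(pd.to_datetime(["2013-06-27", "2013-06-28"]))
whole = returns_dates.equals(column)           # False: a single value
per_row = returns_dates.isin(column).tolist()  # [False, True]
```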

3. The code ran without errors, but every row of the 'CONDN' column was False, so filtering nflx_returns on it gave an empty DataFrame.
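One possible cause of an all-False result (an assumption, since the actual dtypes are not shown) is that the two columns do not hold exactly the same values, for example when one set of dates carries a time of day from the spreadsheet and the other does not. Checking `nflx_returns['date'].dtype` and a few raw values from both columns usually reveals this. The sketch below uses hypothetical data to show the effect and one fix, `Series.dt.normalize()`, which sets the time part to midnight:

```python
import pandas as pd

# Hypothetical data: the signal dates carry a time of day, the return dates do not
returns = pd.DataFrame({"date": pd.to_datetime(["2013-06-27", "2013-06-28"])})
signals = pd.DataFrame({"dte": pd.to_datetime(["2013-06-28 16:00", "2013-07-31 16:00"])})

# Exact timestamp comparison: 2013-06-28 00:00 is not 2013-06-28 16:00
before = returns["date"].isin(signals["dte"]).tolist()

# Dropping the time-of-day part on both sides makes the dates comparable
after = returns["date"].dt.normalize().isin(signals["dte"].dt.normalize()).tolist()

print(before)  # [False, False]
print(after)   # [False, True]
```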

EDIT: Here are some rows from both DataFrames:

1. nflx_returns:

```
        date    VOLUME  DAILY_TOTAL_GROSS_DIVIDEND  DAILY_GROSS_DIVIDEND    LOG_DAILY_RETURN    CUM_TOT_RETURN_5D   CUM_TOT_RETURN_20D  CUM_TOT_RETURN_60D  CUM_TOT_RETURN_120D AVG_VOLUME_5D   AVG_VOLUME_20D  AVG_VOLUME_60D  AVG_VOLUME_120D
2792    2013-06-26  18839520    -0.3758 -0.003758   -0.003765   -0.086996   -0.015046   0.200408    1.195885    20438721.8  21219322.6  2.852531e+07    3.323521e+07
2793    2013-06-27  18332594    1.3531  0.013531    0.013440    -0.038253   -0.034537   0.266465    1.239741    19879419.0  20930923.3  2.757415e+07    3.323990e+07
2794    2013-06-28  19581436    -1.8049 -0.018049   -0.018214   -0.026787   -0.067006   0.266362    1.127918    19116070.4  20638025.8  2.732609e+07    3.302334e+07
2795    2013-07-01  24518284    6.2485  0.062485    0.060611    0.040259    0.010406    0.362078    1.308446    20438378.8  20800170.3  2.719278e+07    3.302161e+07
2796    2013-07-02  17730678    -1.2574 -0.012574   -0.012654   0.040205    -0.017089   0.358149    1.309040    19800502.4  20535986.1  2.695898e+07    3.300082e+07
```

2. nflx_analysts_signals:

```
    ticker  dte
8   NFLX US 2013-06-28
20  NFLX US 2013-07-31
33  NFLX US 2013-08-30
271 NFLX US 2015-03-31
287 NFLX US 2015-04-30
```
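With the rows posted above, a minimal reproducible sketch might look like this. It keeps only a few columns of nflx_returns for brevity and assumes both date columns were read as text, so it converts them first; if they are already datetime64, that step is harmless. Two common options are shown: a boolean mask from `Series.isin`, and an inner `DataFrame.merge` on the date columns.

```python
import pandas as pd

# A few of the rows posted above (only some columns, for brevity)
nflx_returns = pd.DataFrame(
    {
        "date": ["2013-06-26", "2013-06-27", "2013-06-28", "2013-07-01", "2013-07-02"],
        "VOLUME": [18839520, 18332594, 19581436, 24518284, 17730678],
        "LOG_DAILY_RETURN": [-0.003765, 0.013440, -0.018214, 0.060611, -0.012654],
    },
    index=[2792, 2793, 2794, 2795, 2796],
)
nflx_analysts_signals = pd.DataFrame(
    {
        "ticker": ["NFLX US"] * 5,
        "dte": ["2013-06-28", "2013-07-31", "2013-08-30", "2015-03-31", "2015-04-30"],
    },
    index=[8, 20, 33, 271, 287],
)

# Make sure both columns hold real datetimes, not text
nflx_returns["date"] = pd.to_datetime(nflx_returns["date"])
nflx_analysts_signals["dte"] = pd.to_datetime(nflx_analysts_signals["dte"])

# Option 1: boolean mask from Series.isin; keeps nflx_returns' own index
filtered = nflx_returns[nflx_returns["date"].isin(nflx_analysts_signals["dte"])]
print(filtered)

# Option 2: inner merge on the date columns; also brings in the signal columns
merged = nflx_returns.merge(
    nflx_analysts_signals, left_on="date", right_on="dte", how="inner"
)
print(merged[["date", "ticker", "VOLUME"]])
```

The mask keeps nflx_returns' original index (2794 here) and never duplicates rows; the merge resets the index and would repeat a returns row if the same date appeared more than once in the signals.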
  • @cs95 I had tried the approach given in the question you mentioned, and pandas returned False for every row. Basically, it did not detect the dates that were the same – Prateek Singh Maini Jun 14 '19 at 16:24
  • Can you please provide 5-10 rows of the data where I can reproduce this problem? Please [edit] your question with text we can copy-paste into our terminal. Hint: run `dfx.head()` to get the rows out. – cs95 Jun 14 '19 at 16:25
  • Edited! There is one date common to both DataFrames, and that row should be retained in nflx_returns! @cs95 – Prateek Singh Maini Jun 14 '19 at 17:02
