0

I am working with a data frame which has a date column and I have to find the last working date of every month, the code I used works but it doesn't make sense how it worked

The data frame 'apple' had 6 columns initially but I am mainly concerned with the 'Date' column which has dates ranging from the year 2014-1980 of every month Sample Data:

    Date    Open    High    Low Close   Volume  Adj Close
0   2014-07-08  96.27   96.80   93.92   95.35   65130000    95.35
1   2014-07-07  94.14   95.99   94.10   95.97   56305400    95.97
2   2014-07-03  93.67   94.10   93.20   94.03   22891800    94.03
3   2014-07-02  93.87   94.06   93.09   93.48   28420900    93.48
4   2014-07-01  93.52   94.07   93.13   93.52   38170200    93.52
from pandas.tseries.offsets import MonthEnd
apple['Last_Day']=pd.to_datetime(apple['Date'],format="%Y-%m")+MonthEnd(0)
banana=apple.loc[-apple.Last_Day.duplicated()]

I expected the newly created 'Last_Day' column will have the last day of every month which I got but surprisingly the 'Date' column got the last working day of every month which I didn't understand cause I did not initialize anything to 'Date' so how did all the values in 'Date' get replaced by last working day, Output:

    Date        Open    High    Low     Close   Volume    Adj Close  Last_Day
0   2014-07-08  96.27   96.80   93.92   95.35   65130000    95.35   2014-07-31
5   2014-06-30  92.10   93.73   92.09   92.93   49482300    92.93   2014-06-30
26  2014-05-30  637.98  644.17  628.90  633.00  141005200   90.43   2014-05-31
47  2014-04-30  592.64  599.43  589.80  590.09  114160200   83.83   2014-04-30
68  2014-03-31  539.23  540.81  535.93  536.74  42167300    76.25   2014-03-31
89  2014-02-28  529.08  532.75  522.12  526.24  92992200    74.76   2014-02-28
108 2014-01-31  495.18  501.53  493.55  500.60  116199300   70.69   2014-01-31
  • Can you add some sample data with expected output, [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) ? – jezrael Jun 26 '19 at 06:22
  • @jezrael added. – Prejith Alex Jun 26 '19 at 06:27
  • Line 0 was not replaces – Shlomi Bazel Jun 26 '19 at 06:33
  • Have you been using groupby? Can you provide full code? – Arkady. A Jun 26 '19 at 06:34
  • @Arkady.A I did not use groupby. Initially the Date column was 'object' type, I changed that to datetime, otherwise nothing else done on original data. – Prejith Alex Jun 26 '19 at 06:38
  • @PrejithAlex Add more code and information for more clarity – Rahul charan Jun 26 '19 at 06:46
  • @ShlomiBazel any idea why is it happening? – Prejith Alex Jun 26 '19 at 06:47
  • @Rahulcharan this is the whole code , ```apple``` dataframe was given and in that the only thing I did this before this was to convert ```Date``` from object type to datetime type. I cant understand why is the ```Date``` column getting replaced – Prejith Alex Jun 26 '19 at 06:49
  • @PrejithAlex I noticed some functions that are used return data which you must initialize in a new variable, and some other functions are making work on the current dataframe you provide. Try initializing a new variable `temp = df['Date']` and then perform the changes on it `pd.to_datetime(temp`. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html – Shlomi Bazel Jun 26 '19 at 06:53

1 Answers1

1

No, my doubt is why is the Date column getting replaced by last working date, I do want the last working day but I did not understand how was the Date column replaced by last working day

There is no replacement, but last value of Date per month and year depends of data in Date column after removed duplicates.

So here is last value same like Last_Day, with exception July 2014 - there is last day of month 2014-07-08.

For better understanding changed data and sorted - then get first value per month or last vlue oper month:

print (apple)
         Date   Open   High    Low  Close    Volume  Adj Close
0  2014-07-08  96.27  96.80  93.92  95.35  65130000      95.35
1  2014-06-07  94.14  95.99  94.10  95.97  56305400      95.97
2  2014-06-03  93.67  94.10  93.20  94.03  22891800      94.03
3  2014-05-31  93.87  94.06  93.09  93.48  28420900      93.48
4  2014-07-31  93.52  94.07  93.13  93.52  38170200      93.52

from pandas.tseries.offsets import MonthEnd

apple['Date']=pd.to_datetime(apple['Date'])
apple = apple.sort_values('Date')
print (apple)
        Date   Open   High    Low  Close    Volume  Adj Close
3 2014-05-31  93.87  94.06  93.09  93.48  28420900      93.48
2 2014-06-03  93.67  94.10  93.20  94.03  22891800      94.03
1 2014-06-07  94.14  95.99  94.10  95.97  56305400      95.97
0 2014-07-08  96.27  96.80  93.92  95.35  65130000      95.35
4 2014-07-31  93.52  94.07  93.13  93.52  38170200      93.52

apple['Last_Day']=apple['Date']+MonthEnd(0)
banana=apple.loc[-apple.Last_Day.duplicated()]
print (banana)
        Date   Open   High    Low  Close    Volume  Adj Close   Last_Day
3 2014-05-31  93.87  94.06  93.09  93.48  28420900      93.48 2014-05-31
2 2014-06-03  93.67  94.10  93.20  94.03  22891800      94.03 2014-06-30
0 2014-07-08  96.27  96.80  93.92  95.35  65130000      95.35 2014-07-31

from pandas.tseries.offsets import MonthEnd


apple['Date']=pd.to_datetime(apple['Date'])
apple1 = apple.sort_values('Date', ascending=False)
print (apple1)
        Date   Open   High    Low  Close    Volume  Adj Close
4 2014-07-31  93.52  94.07  93.13  93.52  38170200      93.52
0 2014-07-08  96.27  96.80  93.92  95.35  65130000      95.35
1 2014-06-07  94.14  95.99  94.10  95.97  56305400      95.97
2 2014-06-03  93.67  94.10  93.20  94.03  22891800      94.03
3 2014-05-31  93.87  94.06  93.09  93.48  28420900      93.48

apple1['Last_Day']=apple1['Date']+MonthEnd(0)
banana1=apple1.loc[-apple1.Last_Day.duplicated()]
print (banana1)
        Date   Open   High    Low  Close    Volume  Adj Close   Last_Day
4 2014-07-31  93.52  94.07  93.13  93.52  38170200      93.52 2014-07-31
1 2014-06-07  94.14  95.99  94.10  95.97  56305400      95.97 2014-06-30
3 2014-05-31  93.87  94.06  93.09  93.48  28420900      93.48 2014-05-31
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • I want the last working day which i was going to do in the subsequent step but my doubt is how did I get the Last working Day in the Date column when I didn't initialise Date, how did the original value in 'Date' get replaced by Last working day – Prejith Alex Jun 26 '19 at 06:36
  • @PrejithAlex - not understand, do you need replace `Date` column by last day of month? – jezrael Jun 26 '19 at 06:37
  • No, my doubt is why is the ```Date``` column getting replaced by last working date, I do want the last working day but I did not understand how was the Date column replaced by last working day – Prejith Alex Jun 26 '19 at 06:40
  • @jezrael why are you giving `0 ` inside `Monthend`, could not find docstring/documentation – Pyd Jun 26 '19 at 06:54
  • 1
    @pyd - replaced is `Last_Day` - it is last column, not `Date` - first column – jezrael Jun 26 '19 at 06:55
  • 1
    @pyd check the accepted and the last answer [here](https://stackoverflow.com/questions/37354105/find-the-end-of-the-month-pandas-dataframe-series) :) – anky Jun 26 '19 at 06:55
  • ok @jezrael i want to know how the monthend works for `n=0` and `n=1`, pls send documentation link – Pyd Jun 26 '19 at 06:57
  • 1
    @pyd - hmmm, I have to test it, here seems same (but maybe there are some differences, not sure) – jezrael Jun 26 '19 at 06:58
  • yes, same here, n=0 and n=1 will give same result for some dates, I understood n=0 will give corresponding month's last working day, not sure about n=1 :( – Pyd Jun 26 '19 at 07:00
  • @PrejithAlex - I try edit answer with sample data for better explanation, please check it. – jezrael Jun 26 '19 at 07:33