0

I have a large DF which is structured like this. It has multiple stocks in level 0 and Date is level 1. Starts monthly data at 12/31/2004 and continues to 12/31/2017 (not shown).

      Date      DAILY_RETURN
A   12/31/2004      NaN
    1/31/2005      -8.26
    2/28/2005       8.55
    3/31/2005      -7.5
    4/29/2005      -6.53
    5/31/2005       15.71
    6/30/2005      -4.12
    7/29/2005       13.99
    8/31/2005       22.56
    9/30/2005       1.83
    10/31/2005     -2.26
    11/30/2005      11.4
    12/30/2005     -6.65
    1/31/2006       1.86
    2/28/2006       6.16
    3/31/2006       4.31

What I want to do is groupby the month and then count the number of POSITIVE returns in the daily_returns by month (ie 01, then 02, 03, etc from the Date part of the index). This code will give me the count but only by index level=0.
df3.groupby(level=0)['DAILY_RETURN'].agg(['count'])

There are other question out there, this one being the closest but I can not get the code to work. Can someone help out. Ultimately what I want to do is groupby stock and then month and FILTER all stocks that have at least 70% positive returns by month. I cant seem to figure out how to get the positive return from the dataframe either

How to group pandas DataFrame entries by date in a non-unique column

Community
  • 1
  • 1
J Westwood
  • 421
  • 1
  • 9
  • 22
  • please avoid posting images of data, copy paste the data directly in the question and format as code – Vivek Kalyanarangan Dec 28 '18 at 23:09
  • I click code and get 'enter code here' but when I try and paste in text from excel it always gives a picture. It only says save or cancel. Is there a way to enter the table in the code icon without having to type it all in – J Westwood Dec 28 '18 at 23:26
  • maybe try pasting into notepad first, then copy from notepad – CandleWax Dec 29 '18 at 03:30

1 Answers1

0

Here it is for a smaller data, using datetime

import pandas as pd
from datetime import datetime

df = pd.DataFrame()
df['Date'] = ['12/31/2004', '1/31/2005', '12/31/2005', '2/28/2006', '2/28/2007']
df['DAILY_RETURN'] = [-8, 9, 5, 10, 14]
df = df[df.DAILY_RETURN > 0]
df['Date_obj'] = df['Date'].apply(lambda x: datetime.strptime(x, '%m/%d/%Y').month)
df.groupby('Date_obj').count()[['DAILY_RETURN']]

Counts

  • @path.s this works but I can not get it to work with multi-index. The issue I am having is I cant figure out how to month from level=1 of index. So I tried this df3['Date_obj'] = df3.groupby(level=1).apply(lambda x: datetime.strptime(x, '%m/%d/%Y').month) and it gave the follow error. TypeError: strptime() argument 1 must be str, not DataFrame – J Westwood Dec 30 '18 at 15:48
  • When you do apply() on a groupby() object, your argument to apply is a slice of the dataframe (effectively another dataframe) and not a value from a cell (a string in this example). I would suggest doing `df.reset_index(level='Date', inplace=True); df['Date_obj'] = df['Date'].apply(lambda x: datetime.strptime(x, '%m/%d/%Y').month); df.groupby('Date_obj').count()[['DAILY_RETURN']]` – pathankhan.salman Dec 30 '18 at 21:15