2

I have data which I sorted by their days in excel, what I want to do now is get the sum of the daily returns for each day. The problem here is that I have multiple entries for the days. So I might only have one Daily Return entry for 2018-12-05 but 5 entries for 2018-12-06. I would like that I only get one entry for 2018-12-06 with both the cumulative daily return (so all cumulative returns added together) and the average daily return (so the cumulative return divided by the amount of entries for the day. For 2018-12-06 this would be divided by 5).

So the data I have right now looks like this:

            Dates  Last.Price  Daily.Return
19788  2018-11-23       75.18     -0.001199
19789  2018-11-23      129.04     -0.026490
19790  2018-11-26       77.84     -0.035382
19791  2018-11-26      127.98      0.008215
19792  2018-11-27       79.50     -0.021326
19793  2018-11-27      122.68      0.041413
19794  2018-11-28       80.27     -0.009686
19795  2018-11-29       80.00      0.003364

The final Data Frame should look like this

              Dates  Last.Price  Cum.Return   Average.Return
19788  2018-11-23       75.18     -0.027689    -0.0138445
19790  2018-11-26       77.84     -0.027167    -0.0135835
19792  2018-11-27       79.50      0.020087     0.0100435
19794  2018-11-28       80.27     -0.009686    -0.009686
19795  2018-11-29       80.00      0.003364     0.003364

I have the following code so far to sum the daily returns. However it doesn't sum correctly. And I don't know how to implement the average daily return.

df = pd.read_csv('/Python Test/SP500Acquirer.csv')

def sum_from_days_prior(row, df):
    '''returns sum of values in row month, 
    from all dates in df prior to row date'''

    day = pd.to_datetime(row).day

    all_dates_prior = df[df.index <= row]
    same_day = all_dates_prior[all_dates_prior.index.day == day]

    return same_day["Daily.Return"].sum()


df.set_index('Dates', inplace = True)
df.index = pd.to_datetime(df.index)
df["Dates"] = df.index
df.sort_index(inplace = True)

df["Day"] = df["Dates"].apply(lambda row: sum_from_days_prior (row, df))
df.drop("Dates", axis = 1, inplace = True)

print(df.tail(20))

As said before this code does not sum the daily returns correctly. And I do not know how to get the average returns for the days.

Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
Elias K.
  • 513
  • 1
  • 4
  • 12
  • Have you tried to used pandas `groupby` method @EliasK.? You can even add your function to it, and sum the values on grouping by date: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html – Luan Naufal Dec 08 '18 at 12:38
  • 1
    Possible duplicate of [Pandas group-by and sum](https://stackoverflow.com/questions/39922986/pandas-group-by-and-sum) – Luan Naufal Dec 08 '18 at 12:39

1 Answers1

1

I think you need aggregate by agg with functions first, sum and mean:

Because column Daily.Return is aggregates by multiple functions defined in list, get MultiIndex in output. So is necessary flatten it - simpliest is use map with join.

df = df.groupby('Dates').agg({'Last.Price':'first', 'Daily.Return':['mean','sum']})

print (df)
           Last.Price Daily.Return          
                first         mean       sum
Dates                                       
2018-11-23      75.18    -0.013844 -0.027689
2018-11-26      77.84    -0.013583 -0.027167
2018-11-27      79.50     0.010044  0.020087
2018-11-28      80.27    -0.009686 -0.009686
2018-11-29      80.00     0.003364  0.003364

print (df.columns)
MultiIndex(levels=[['Last.Price', 'Daily.Return'], ['first', 'mean', 'sum']],
           labels=[[0, 1, 1], [0, 1, 2]])

df.columns = df.columns.map('_'.join)
print (df)
           Last.Price_first  Daily.Return_mean  Daily.Return_sum
Dates                                                            
2018-11-23             75.18          -0.013844         -0.027689
2018-11-26             77.84          -0.013583         -0.027167
2018-11-27             79.50           0.010044          0.020087
2018-11-28             80.27          -0.009686         -0.009686
2018-11-29             80.00           0.003364          0.003364

Last renamecolumns:

d = {'Last.Price_first':'Last.Price',
     'Daily.Return_sum': 'Cum.Return',
     'Daily.Return_mean': 'Average.Return'}

df = df.rename(columns=d)
print (df)
            Last.Price  Average.Return  Cum.Return
Dates                                             
2018-11-23       75.18       -0.013844   -0.027689
2018-11-26       77.84       -0.013583   -0.027167
2018-11-27       79.50        0.010044    0.020087
2018-11-28       80.27       -0.009686   -0.009686
2018-11-29       80.00        0.003364    0.003364
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252