0

I have a dataset that looks like:

print(portfolio_all[1])
            Date       Open       High  ...      Close  Adj Close    Volume
0     2010-01-04   4.840000   4.940000  ...   4.770000   4.513494   9837300
1     2010-01-05   4.790000   5.370000  ...   5.310000   5.024457  25212000
2     2010-01-06   5.190000   5.380000  ...   5.090000   4.816288  16597900
3     2010-01-07   5.060000   5.430000  ...   5.240000   4.958220  14033400
4     2010-01-08   5.270000   5.430000  ...   5.140000   4.863598  12760000
5     2010-01-11   5.130000   5.230000  ...   5.040000   4.768975  10952900
6     2010-01-12   5.060000   5.150000  ...   5.080000   4.806825   7870300
7     2010-01-13   5.120000   5.500000  ...   5.480000   5.185314  16400500
8     2010-01-14   5.460000   5.710000  ...   5.590000   5.289400  12767100
9     2010-01-15   5.640000   5.840000  ...   5.500000   5.204239  10985300
10    2010-01-19   5.500000   5.730000  ...   5.640000   5.336711   7807700
11    2010-01-20   5.650000   5.890000  ...   5.740000   5.431333  13289100

I want to calculate how many days had positive return (i.e. Close_day_t > Close_day_t-1)

I tried the following function:

def positive_return_days(portfolio):
    positive_returns = pd.DataFrame(
    columns=['ticker', 'name', 'total positive', 'total days'])
    for asset in portfolio:
        for index, row in asset.iterrows():
            try:
                this_day_close = asset.iloc[[index]]['Close']
                previous_day_close = asset.iloc[[index-1]]['Close']
                asset.loc[index, 'positive_days'] = np.where((this_day_close > previous_day_close))
            except IndexError:
             print("I get out of bounds")
    total_positive_days = asset['positive_days'].sum()
    new_row = {'ticker':asset.name, 'name':asset.name, 'total positive':total_positive_days, 'total days':len(asset.index)}
    positive_returns = positive_returns.append(new_row, ignore_index=True)
    print("Asset: ", "total positive days: ", total_positive_days, "total days:",len(asset.index))
    return positive_returns

but I am getting an error:

ValueError: Can only compare identically-labeled Series objects

How can I fix it?

mcsoini
  • 6,280
  • 2
  • 15
  • 38
adrCoder
  • 3,145
  • 4
  • 31
  • 56
  • Can you post expected output? – Poojan Nov 20 '19 at 16:11
  • expected output would be the total number of positive return days (and the percentage of positive return days) per stock... – adrCoder Nov 20 '19 at 16:12
  • Using `iterrows` is really slow. IIUC, you're looking for the following `print(((df["Close"] - df["Close"].shift(1))>0).sum())` – pault Nov 20 '19 at 16:13
  • Possible duplicate of [Computing diffs within groups of a dataframe](https://stackoverflow.com/questions/20648346/computing-diffs-within-groups-of-a-dataframe) – pault Nov 20 '19 at 16:15

2 Answers2

2
  • you can just use .shift functio to shift column by one value.
import pandas as pd

df = pd.DataFrame({'Close':[1,2,3,2,1,3]})

print(df)
print("count",(df.Close - df.Close.shift(1) > 0).sum())

*output:

   Close
0   1
1   2
2   3
3   2
4   1
5   3
count:3
Poojan
  • 3,366
  • 2
  • 17
  • 33
1

You can use pd.Series.diff to calculate the difference and then count the ones that are positive:

(df['Close'].diff() > 0).sum()
mcsoini
  • 6,280
  • 2
  • 15
  • 38