3

I have a pandas DataFrame like this; it shows a history of stock investing. In the Profit column, 1 means profitable and 0 means loss-making.

Stock  Year   Profit  Count
 AAPL  2012    0       23
 AAPL  2012    1       19
 AAPL  2013    0       20
 AAPL  2013    1       10
GOOG   2012    0       26
GOOG   2012    1       20
GOOG   2013    0       23
GOOG   2013    1       11

I have to find out the percentage of profitable investments:

Stock  Year   Profit  CountPercent
 AAPL  2012    1       38.77
 AAPL  2013    1       33.33
GOOG   2012    1       43.47
GOOG   2013    1       32.35

I tried using the method in this post but it shows 'TypeError: Join on level between two MultiIndex objects is ambiguous'.

Alex Riley
  • 169,130
  • 45
  • 262
  • 238
ProgR
  • 155
  • 1
  • 11

3 Answers3

2

I have loaded your data into a dataframe called "stocks".

# Get the count of profitable trades, indexed by stock+year:
count_profitable = stocks[ stocks['Profit']==1 ].set_index(['Stock','Year']).Count
# Get the count of all trades, indexed by stock + year:
count_all        = stocks.groupby(['Stock','Year']).Count.sum()
# Render nice percentages
pandas.options.display.float_format = '{:.2f}%'.format 
(count_profitable/count_all) * 100

This will produce:

Stock  Year
AAPL   2012   45.24%
       2013   33.33%
GOOG   2012   43.48%
       2013   32.35%
Name: Count, dtype: float64
Tom Rees
  • 681
  • 4
  • 17
2

You could use pivot_table:

In [38]: result = df.pivot_table(index=['Stock', 'Year'], columns='Profit', values='Count', aggfunc='sum')

In [39]: result['CountPercent'] = result[1]/(result[0]+result[1])

In [41]: result['CountPercent']
Out[41]: 
Stock  Year
AAPL   2012    0.452381
       2013    0.333333
GOOG   2012    0.434783
       2013    0.323529
Name: CountPercent, dtype: float64
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
1

Assuming your DataFrame is in a consistent format (namely that 0 precedes 1 in the 'Profit' column), you could do the following groupby operations:

>>> grouped = df.groupby(['Stock', 'Year'])
>>> perc = grouped['Count'].last() / grouped['Count'].sum()
>>> perc.reset_index()
  Stock  Year     Count
0  AAPL  2012  0.452381
1  AAPL  2013  0.333333
2  GOOG  2012  0.434783
3  GOOG  2013  0.323529

This is just a normal DataFrame, so it should be straightforward to rename the 'Count' column, round it to two decimal places and add the 'Profit' column back in.

Alex Riley
  • 169,130
  • 45
  • 262
  • 238