-1

After calculating the means for all the records per event type, I now must calculate the number of times an athlete has scored above the mean. I have found a way to do this 1 at a time but I would like to know if there is a more elegant way of doing it in pandas. Sorry for the images but it probably the best way I can show you guys what I have done and want to do

EDIT: My apologies, I am new to the interface. I will try to be as clear as possible with code.(Anyone know of a better way to display a dataframe thats is not an image for stackoverflow purposes?

d = {'Event':['Mens 100m','Womens 100m','Mens 800m', 'Mens 800m'],'Record':[10.06,10.6,50.4,60.5]}
df = pd.DataFrame(data = d)

1.) I needed to find the means the records set for all the different events:

for i in range(len(Events)):
     x = df[df['Event'] == Events[i]]['Record'].mean()
     print(Events[i], ":", "{0:.5f}".format(x))

#The line below gives me a list of all the means per different type of event using a list comp.

Means =  [df[df['Event']== Events[i]]['Record'].mean() for i in range(len(Events))]

2.) Using these means I must find the number of athletes for each event that have a record above the mean that was calculated in the line above.

# i = 0 where Events[0] is 'Womens 100m'
i = 0; df[df['Event'] == Events[i]][['Record']] > Means[i]

Output:2

I need to count the number of True values in the above for all events. Any nice way to do this except for assigning it as a series and then counting True? Which would look like this:

d = {'Athletes over Mean for Each Event':[4,6,10,2,5,6]}
df = pd.DataFrame(data = d)

Thank you again in advance, hope I made it clearer this time.

Artem
  • 13
  • 3
  • wouldnt df_grouped = df.groupby('Event')['Record'].mean() help you with defining the means? btw, not exactly clear on what you need in the end. a sample of your data would be helpful – Zanshin Nov 17 '18 at 11:43
  • 1
    You obviously use a notebook, so instead of put posting pictures you might copy your code and output into your question as text and format it as code with '{}'. – Zanshin Nov 17 '18 at 14:23
  • 1
    In addition, please provide a table with the expected output – Zanshin Nov 17 '18 at 14:23
  • Please see [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – jpp Nov 18 '18 at 00:17

2 Answers2

0

You can filter your dataframe (I am assuming you have one) and get the length or shape something like this:

  Number_above_mean = len(df[(df[player]==nameyouwant)&(df[score]>df[score].mean())])

or

Number_above_mean = df[(df[player]==nameyouwant)&(df[score]>df[score].mean())].shape

PS. Please add your code and sample data

Jorge
  • 2,181
  • 1
  • 19
  • 30
0

IIUC, you want the count of occurrences in the different categories, where it is over the mean of that category. Code below can in no doubt be optimized, but it might be helpful for you now.

import pandas as pd
import numpy as np

df = pd.DataFrame( {
   'A': ['d','d','d','f','f','f','g','g','g','h','h','h'],
   'B': [9,10,11,7,9,10,11,8,12,7,8,12],
    } );

df_grouped = df.groupby('A')['B'].mean()
df_grouped.to_frame()
df_grouped.reset_index()

df1 = df.merge(df_grouped.to_frame(), left_on='A', right_index=True)

df1['num'] = np.where(df1['B_x'] > df1['B_y'], 1, 0)

df2 = df1.groupby('A')['num'].apply(lambda x: (x==1).sum()).reset_index(name='count')

df2


A   count
0   d   1
1   f   2
2   g   2
3   h   1

I've made my own sample data. You can exchange 'A' with 'Event' and 'B' with 'Record'

Zanshin
  • 1,262
  • 1
  • 14
  • 30