2

I'm currently working on a mock analysis of a mock MMORPG's microtransaction data. This is an example of a few lines of the CSV file:

PID Username    Age Gender ItemID   Item Name   Price

0   Jack78      20  Male    108    Spikelord    3.53
1   Aisovyak    40  Male    143  Blood Scimitar 1.56
2   Glue42      24  Male    92   Final Critic   4.88

Here's where things get dicey- I successfully use the groupby function to get a result where purchases are grouped by the gender of their buyers.

test = purchase_data.groupby(['Gender', "Username"])["Price"].mean().reset_index()

gets me the result (truncated for readability)

                    Gender        Username  Price
0                   Female     Adastirin33  $4.48
1                   Female   Aerithllora36  $4.32
2                   Female      Aethedru70  $3.54
...
29                  Female        Heudai45  $3.47
..                     ...             ...    ...
546                   Male        Yadanu52  $2.38
547                   Male      Yadaphos40  $2.68
548                   Male         Yalae81  $3.34

What I'm aiming for currently is to find the average amount of money spent by each gender as a whole. How I imagine this would be done is by creating a method that checks for the male/female/other tag in front of a username, and then adds the average spent by that person to a running total which I can then manipulate later. Unfortunately, I'm very new to Python- I have no clue where to even begin, or if I'm even on the right track.

Addendum: jezrael misunderstood the intent of this question. While he provided me with a method to clean up my output series, he did not provide me a method or even a hint towards my main goal, which is to group together the money spent by gender (Females are shown in all but my first snippet, but there are males further down the csv file and I don't want to clog the page with too much pasta) and put them towards a single variable.

Addendum2: Another solution suggested by jezrael,

purchase_data.groupby(['Gender'])["Price"].sum().reset_index()

creates

                  Gender     Price
0                 Female   $361.94
1                   Male $1,967.64
2  Other / Non-Disclosed    $50.19

Sadly, using figures from this new series (which would yield the average price per purchase recorded in this csv) isn't quite what I'm looking for, due to the fact that certain users have purchased multiple items in the file. I'm hunting for a solution that lets me pull from my test frame the average amount of money spent per user, separated and grouped by gender.

Edward
  • 4,443
  • 16
  • 46
  • 81
Minqiu Yu
  • 21
  • 2
  • Use `test = purchase_data.groupby(['Gender', "SN"])["Price"].mean().reset_index()` – jezrael Sep 14 '18 at 11:49
  • This changes nothing. All that's produced is a more clean version of the test series. I still have no idea how to group together the gender results- I'm looking for a method to somehow group the genders together and group their amounts spent. – Minqiu Yu Sep 14 '18 at 12:31
  • 1
    Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Sep 14 '18 at 12:47
  • This question is less about problems currently present in my code and more of an appeal for advice on how to proceed. I'll bold out more areas in my question to try to make it more obvious to you where I need assistance. – Minqiu Yu Sep 14 '18 at 12:59
  • Not understand, sorry. Maybe need `purchase_data.groupby(['Gender'])["Price"].sum().reset_index()` – jezrael Sep 14 '18 at 13:05
  • This returns the **total amount of money spent** by each gender as a group. In this scenario, some people have spent money multiple times which is why I'm trying to figure out **how much money that a player has spent per person.** I'll edit my question to further clarify. – Minqiu Yu Sep 14 '18 at 13:12

3 Answers3

1

It sounds to me like you think in terms of database tables. The groupby() does not return one by default -- which the group label(s) are not presented as a column but as row indices. But you can make it do in that way instead: (note the as_index argument to groupby())

mean = purchase_data.groupby(['Gender', "SN"], as_index=False).mean()
gender = mean.groupby(['Gender'], as_index=False).mean()

Then what you want is probably gender[['Gender','Price']]

adrtam
  • 6,991
  • 2
  • 12
  • 27
  • This solution is functionally similar to jezrael's most recent reply, but it immediately jumps to providing the average amount of money spent per purchase per gender. However,** I'm hunting for the average amount of money spent per person per gender** due to the fact that some users in this situation have bought multiple items. Good try though. – Minqiu Yu Sep 14 '18 at 13:26
  • You're right, the second step should be the average (mean) instead of summing again. – 576i Sep 16 '18 at 07:50
1

Basically, sum up per user, then average (mean) up per gender.

In one line

print(df.groupby(['Gender','Username']).sum()['Price'].reset_index()[['Gender','Price']].groupby('Gender').mean())

Or in some lines

df1 = df.groupby(['Gender','Username']).sum()['Price'].reset_index()
df2 = df1[['Gender','Price']].groupby('Gender').mean()
print(df2)

Some notes, I read your example from the clipboard

import pandas as pd
df = pd.read_clipboard()

which required a separator or the item names to be without spaces. I put an extra space into space lord for the test. Normally, you should provide an example file good enough to do the test, so you'd need one with at least one female in.

576i
  • 7,579
  • 12
  • 55
  • 92
  • This produces an identical result to jezrael's possible solution. I'm looking for a method or set of methods to combine all purchases of a gender while still accounting for certain members of the players listed having purchased multiple items. – Minqiu Yu Sep 15 '18 at 05:38
1

To get the average spent by per person, first need to find the mean of the usernames.

Then to get the average amount of average spent per user per gender, do groupby again:

df1 = df.groupby(by=['Gender', 'Username']).mean().groupby(by='Gender').mean()

df1['Gender'] = df1.index
df1.reset_index(drop=True, inplace=True)
df1[['Gender', 'Price']]
thelogicalkoan
  • 620
  • 1
  • 5
  • 13