
I am trying to get the mean of a column of percentages in data loaded from an Excel file. My current code is as follows:

import pandas as pd
data = pd.DataFrame({'Percentages': [.20, .10, .05], 'Nationality': ['American', 'Mexican', 'Russian'],
                     'Gender': ['Male', 'Female'], 'Question': ['They have good looks']})

pref = data[data.Nationality == 'American']
prefPref = pref.pivot_table(data.Percentage.mean(), index=['Question'], column='Gender')

The error comes from the line where I try to get the .mean() of my ['Percentage'] list. How can I get the mean of the list of percentages? Do I need to create a variable for the mean value, and if so, how do I use it in the code?
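For reference, the mean of a DataFrame column is a method call on that column, and the result can be stored in a variable like any other value. A minimal sketch, assuming the column is named `Percentages`; the rows below are illustrative and stand in for the real Excel data:

```python
import pandas as pd

# Illustrative data (assumed); the real data would come from the Excel file.
data = pd.DataFrame({'Percentages': [0.20, 0.10, 0.05],
                     'Nationality': ['American', 'American', 'Russian'],
                     'Gender': ['Male', 'Female', 'Male'],
                     'Question': ['Q1', 'Q2', 'Q3']})

# Select the column by its exact name and call .mean() on the resulting Series.
overall_mean = data['Percentages'].mean()

# The same works on a filtered subset, e.g. only the American rows.
american_mean = data.loc[data['Nationality'] == 'American', 'Percentages'].mean()

print(round(overall_mean, 4))   # 0.1167
print(round(american_mean, 4))  # 0.15
```

A scalar like this answers "what is the mean?", but it is not what `pivot_table` expects: its `values` parameter takes the name of the column to aggregate, and `aggfunc='mean'` does the averaging per cell.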

jpp
Jim C
  • Welcome to SO. Please have a look at [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). This will give us a better chance of understanding your data structure and provide a solution. – jpp Feb 21 '18 at 01:04
  • Hi jpp, thanks for the response! I will take a look at the thread and try to reconstruct the question to make it easier for people to replicate. – Jim C Feb 21 '18 at 01:10
  • jpp, I attempted to make the question easier to replicate. Would you mind taking a look and returning any other suggestions? If it makes more sense, I replied to an Answer I received with another issue I have encountered. – Jim C Feb 21 '18 at 01:48
  • Not sure about your other question, but I've got your pivot table working if that helps. – jpp Feb 21 '18 at 01:50
  • After I used "from pandas import *" it worked like a charm, I have checked your answer as the solution thanks for all the help. – Jim C Feb 21 '18 at 01:58

2 Answers

["Percentage"] is a list containing the single string "Percentage". A mean cannot be calculated from a list of text.

In addition, Python's built-in lists have no .mean() method. Have a look at NumPy for calculating means and other mathematical operations.

For example:

import numpy
numpy.array([4, 2, 6, 5]).mean()  # returns 4.25
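To make the distinction concrete: a plain Python list has no `.mean()` method, but the standard-library `statistics.mean` function, NumPy, and a pandas Series all compute the same average. A small sketch using the same numbers:

```python
import statistics

import numpy as np
import pandas as pd

values = [4, 2, 6, 5]

# A built-in list has no .mean() method, so values.mean() would raise AttributeError.
print(hasattr(values, 'mean'))  # False

# Three equivalent ways to get the arithmetic mean of the same numbers.
print(statistics.mean(values))   # 4.25 (standard library)
print(np.mean(values))           # 4.25 (NumPy accepts plain lists)
print(pd.Series(values).mean())  # 4.25 (pandas Series method)
```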
mustachioed
Jon
  • Jon, thanks for taking a look at my question. I just edited it to make it easier to replicate. The basics of your response make a lot of sense and help, but now I am getting another error: "KeyError: 0.16664583333332". I apologize if this is brutally easy and I am completely missing the point, but I am only an intermediate-level Comp-Sci student. – Jim C Feb 21 '18 at 01:46
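A likely explanation for that KeyError, offered as a sketch: the first positional argument of `DataFrame.pivot_table` is `values`, which pandas treats as a column label. Passing an already-computed mean there makes pandas look for a column whose label is that number, and the missing label is reported as the key. The data below is illustrative (assumed), not the asker's Excel file:

```python
import pandas as pd

# Illustrative data (assumed).
data = pd.DataFrame({'Percentages': [0.20, 0.10, 0.05],
                     'Nationality': ['American', 'American', 'Russian'],
                     'Gender': ['Male', 'Female', 'Male'],
                     'Question': ['Q1', 'Q2', 'Q3']})
pref = data[data['Nationality'] == 'American']

avg = data['Percentages'].mean()  # a float, not a column name

try:
    # The float is passed positionally as `values`; pandas checks it against the column labels.
    pref.pivot_table(avg, index='Question', columns='Gender')
except KeyError as err:
    # The "missing column" reported here is the mean itself.
    print('KeyError:', err)
```

Passing the column name instead, as in `values='Percentages'` with `aggfunc='mean'`, lets `pivot_table` do the averaging for each cell.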

Here is a reworked version of your pivot_table call. See also How to pivot a dataframe.

import pandas as pd

data = pd.DataFrame({'Percentages': [0.20, 0.10, 0.05],
                     'Nationality': ['American', 'American', 'Russian'],
                     'Gender': ['Male', 'Female', 'Male'],
                     'Question': ['Q1', 'Q2', 'Q3']})

# keep only the American rows
pref = data[data['Nationality'] == 'American']

# values names the column to average; aggfunc='mean' averages it within each cell
prefPref = pref.pivot_table(values='Percentages', index='Question',
                            columns='Gender', aggfunc='mean')

# Gender    Female  Male
# Question              
# Q1           NaN   0.2
# Q2           0.1   NaN
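The same table can also be built with `groupby` followed by `unstack`, an alternative to `pivot_table` that some find easier to reason about: group by both the row and column keys, average, then move `Gender` into the columns. A sketch on the same illustrative data:

```python
import pandas as pd

data = pd.DataFrame({'Percentages': [0.20, 0.10, 0.05],
                     'Nationality': ['American', 'American', 'Russian'],
                     'Gender': ['Male', 'Female', 'Male'],
                     'Question': ['Q1', 'Q2', 'Q3']})
pref = data[data['Nationality'] == 'American']

# Mean of Percentages for every (Question, Gender) pair present in the data...
means = pref.groupby(['Question', 'Gender'])['Percentages'].mean()

# ...then pivot the Gender level into columns; pairs with no rows become NaN.
table = means.unstack('Gender')
print(table)
```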
jpp