
I'm doing a Python basics course and trying to complete the challenge we've been set. I need to determine the average sale price for each product category after reading a sales data spreadsheet. I can read the spreadsheet, and I know how to separate out the category column using .groupby(). But I can't get it to work out the average sale price, because it tries to do the calculation on the category column, which is a string. I've put the code below. Any help would be appreciated. Thanks

import pandas as pd

def read_data():
    df = pd.read_csv('sales_dataset.csv')
    print(df)
    return df

read_data()


def average_price():
    df = read_data()
    average = df.groupby(["Sale Price"]).mean()
    print(average)
    return average

average_price()

I thought I was following code that would calculate the average sale price for each product category listed in the spreadsheet. Instead, it tried to do the calculation on the category column, which is a string, rather than on the sale price column.

Karyn
  • Welcome to Stack Overflow! Please take the [tour]. SO is a Q&A site, but this is not a question. You might want to ask e.g. "Why is .groupby().mean() trying to take the mean of the grouping column?" although, where you've fundamentally misunderstood how to use groupby, I'm not sure if there's a sensible question in here. Sorry if that sounds harsh, but SO is not meant to teach the basics, which can be better handled by resources like [1/2] – wjandrea Apr 11 '23 at 14:59
  • [2/2] ... tutorials or the user guide: [Grouping - 10 minutes to pandas](//pandas.pydata.org/docs/user_guide/10min.html#grouping). Check out [How to ask and answer homework questions](//meta.stackoverflow.com/q/334822) and [ask] in general. And for future reference, [How to make good reproducible pandas examples](/q/20109391/4518341). – wjandrea Apr 11 '23 at 14:59



Welcome to Stack Overflow! When you ask questions (especially about data analysis), answerers can reach an appropriate solution more easily if you share some of the data, such as a few lines of the CSV file or a few rows from your df.
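One easy way to share a few rows is to paste them as CSV text. Below is a minimal sketch. It assumes your DataFrame is called `df`, and the column names and values are hypothetical stand-ins for whatever your spreadsheet actually contains:

```python
import pandas as pd

# Hypothetical sample data standing in for the real sales spreadsheet
df = pd.DataFrame({
    "product category": ["Books", "Toys", "Books"],
    "Sale Price": [12.0, 20.0, 8.0],
})

# to_csv() with no path returns the CSV as a string that can be pasted into a question
sample = df.head(3).to_csv(index=False)
print(sample)
```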

After reading your question, I can clarify a few things. DataFrame.groupby(...) specifies the grouping column: the column whose values split the rows into groups, so that a calculation is performed separately for each group. In your case you don't want to group on your prices. You most likely want to group on your 'product category' column (or whatever it is named in your DataFrame).
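To see what "grouping" means in practice, the sketch below builds a small made-up DataFrame and looks at which rows end up in each group. The column names 'product category' and 'Sale Price' are assumptions, so use the names from your own file. Grouping on the category gives one group per category. Grouping on the price gives one group per distinct price, which is why averaging "by price" doesn't answer the question:

```python
import pandas as pd

# Hypothetical sales data; the real column names may differ
df = pd.DataFrame({
    "product category": ["Books", "Toys", "Books", "Toys", "Games"],
    "Sale Price": [12.0, 20.0, 8.0, 30.0, 15.0],
})

# Each group is keyed by one value of the grouping column and holds the matching rows
groups = {
    category: rows["Sale Price"].tolist()
    for category, rows in df.groupby("product category")
}
print(groups)

# Grouping on the price column instead: every distinct price forms its own group
by_price = df.groupby("Sale Price").size()
print(by_price)
```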

Once you call DataFrame.groupby(['product category']), you can use square brackets to select the subset of columns you would like to operate on. So the code could look like this:

df.groupby(['product category'])['Sale Price'].mean()

Which translates to:

  • group the rows of my dataframe based on the 'product category' column
  • for each of those groupings, calculate the average 'Sale Price'
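Here is one possible sketch that puts this back into the structure of the original code. To make it run on its own, it uses io.StringIO with made-up rows in place of sales_dataset.csv, and it assumes the category column is literally named 'product category'. With the real file you would call read_data('sales_dataset.csv') instead. Passing the DataFrame into average_price, rather than calling read_data() again inside it, also avoids reading and printing the file twice:

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for sales_dataset.csv; the real file's
# column names may differ, so adjust "product category" and "Sale Price" to match
CSV_TEXT = """product category,Sale Price
Books,12.00
Toys,20.00
Books,8.00
Toys,30.00
Games,15.00
"""


def read_data(source):
    # pd.read_csv accepts a file path or any file-like object
    return pd.read_csv(source)


def average_price(df):
    # Group on the category column, then average only the 'Sale Price' column
    return df.groupby("product category")["Sale Price"].mean()


df = read_data(io.StringIO(CSV_TEXT))
average = average_price(df)
print(average)
```

The result is a Series indexed by category. If you'd rather have a DataFrame, calling .reset_index() on it turns the category labels back into a regular column.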
Cameron Riddell