-1

I have a dataframe which contains orders from a restaurant, order ids and prices of each item from the order. One row is a name of the product with its price and order id. I would like to calculate the average of all orders, but hence the order may consist of multiple products they are in separate rows.

I was trying to define what an order is in python but since I'm a newbie, with no success.

for order in df:
  n = 1
  order = df.order_id == n
  for order in orders:
    orders.median()
    n = n + 1

This is what the table looks like - as you can see the order id for the forst two items is the same:

order_id    quantity    item_name   choice_description                  item_price
0           1           1           Chips and Fresh Tomato Salsa    NaN 2.39
1           1           1           Izze    [Clementine]                3.39

I have a problem with the syntax, I don't know how to express it in python language. Could you please help?

Thanks a lot!

Michael Ruth
  • 2,938
  • 1
  • 20
  • 27
Joanna
  • 79
  • 6
  • Welcome to stack overflow! I took the liberty of doing an [edit] to make your table more readable, please check that it matches you actual data. It looks like what you want is a [pandas groupby](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) with `.sum()`, then the `mean()` of that output, but it would help if you include a [mcve] with your expected output based on the given input. See also: [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – G. Anderson Apr 16 '21 at 17:56

2 Answers2

1

Since items may have been ordered more than once, I would first calculate the price for all the same items in each order:

import pandas as pd

# mock data
df = pd.DataFrame({'order_id': [0, 0, 1, 1],
                   'quantity': [1, 2, 1, 3], 
                   'item_price': [2.39, 9.99, 3.39, 19.99]})

df['price'] = df.quantity * df.item_price
df
    order_id    quantity    item_price  price
0   0           1            2.39        2.39
1   0           2            9.99       19.98
2   1           1            3.39        3.39
3   1           3           19.99       59.97

Then you can use df.groupby() to sum these prices for each order:

df_orders = df.groupby('order_id').sum()
df_orders
order_id  quantity  item_price  price       
0         3         12.38       22.37
1         4         23.38       63.36

And finally calculate the average price over orders from this new dataframe:

df_orders.price.mean()
42.865
Arne
  • 9,990
  • 2
  • 18
  • 28
  • I was adding my answer the very same moment you posted yours! Thanks a lot, that's exactly what I managed to do. Thank you ever so much! – Joanna Apr 17 '21 at 11:38
  • Thanks for your feedback @Joanna! Yes, that timing was unexpected, given that you posted the question 18 hours ago. But it's good to have confirmation that this is indeed the right answer and works for you. Note that it can be useful for others to include some sample data in executable form as I did above, so that the code is self-contained. – Arne Apr 17 '21 at 11:45
0

As suggested in the comments I used pandas groupby. This is how I dealt with the problem:

First I changed the type of the column to numeric. They were strings so calculating didn't work. In order to calculate the mean() or median() of the order I decided to add one more column to my dataframe containing the value of the order. I multiplied quantity and price of each item from one order to make sure I take into account that a given product was bought in bigger quantity.

item_price = pd.to_numeric(data1['item_price'])
quantity = pd.to_numeric(data1['quantity'])
data1['order_value'] = quantity* item_price

[![The table after adding the new column][1]][1]

Then I sorted the dataframe by the order_id and summed up the value of each item to get the total value of the order.

grouped = data1.groupby(['order_id']).sum()

        quantity  order_value
order_id                       
1                4        11.56
2                2        33.96
3                2        12.67
4                2        21.00
5                2        13.70

Then I called the mean() function on the column order_value:

data1.order_value.mean()
Joanna
  • 79
  • 6