Python - Pandas DataFrame manipulation

Question

I've got a DataFrame called product with a list of orders, products, and quantities for each product. Here's a screenshot:

I need to make a new DataFrame that has a row for each product name and two more columns with the sum of products ordered (basically a sum on the column quantity per product) and the total sales for each product (sum on column total per product).

I made this function:

products_unique = products['product_id'].unique()

names = [
    products.loc[
        products['product_id'] == elem
    ]['name'].unique()
    for elem in products_unique
]

orders = [
    len(products.loc[
        products['product_id'] == elem
    ])
    for elem in products_unique
]

totals = [
    products.loc[
        products['product_id'] == elem
    ]['total'].sum()
    for elem in products_unique
]

chart_data = pd.DataFrame({
    'Prodotti': products_unique,
    'Nome': names,
    'Ordini': orders,
    'Totale': totals
})

Now, this function works for the purpose I had, but there is something I don't understand. When I run it, I got this:

As you can see, values in the column names are of the type list. Why does this happen?

And moreover, is there a cleaner way to achieve what I'm building?

Thanks to everyone who gonna help me!

[How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/4046632). Screenshot of data is virtually useless.That said, probably you need something like `groupby()` or pivot — buran, Feb 21 '23 at 15:41
Check https://stackoverflow.com/questions/39922986/how-do-i-pandas-group-by-to-get-sum — buran, Feb 21 '23 at 15:42
`.unique()` returns a list of unique values in the pd.Series you are using it on. In your case, that list only contains one value, but it is still a list — Florent Monin, Feb 21 '23 at 15:45
@FlorentMonin ok, this was my bad in getting the `.unique()` function. Thanks man — Davide, Feb 21 '23 at 17:13

score 0 · Accepted Answer · answered Feb 21 '23 at 15:47

0

chart_data = products.groupby('product_id').agg({
    'name': lambda x: ', '.join(x.unique()),
    'total': ['sum', 'count']
})
chart_data.columns = ['Nome', 'Totale', 'Ordini']
chart_data.reset_index(inplace=True)
chart_data.rename(columns={'product_id': 'Prodotti'}, inplace=True)

answered Feb 21 '23 at 15:47

saurabh srivastava

31

one possible approach is to use the groupby method to group the products by product_id and then aggregate the data using the agg method. – saurabh srivastava Feb 21 '23 at 15:48

score 0 · Answer 2 · answered Feb 21 '23 at 15:49

Use groupby_agg:

out = (df.groupby('name') .agg(Prodotti=('product_id', 'first'),
                               Nome=('name', 'first'),
                               Ordini=('total', 'size'),
                               Totale=('total', 'sum'))
         .reset_index(drop=True))

Output:

>>> out
   Prodotti    Nome  Ordini  Totale
0      7980  Prod A       2      22
1      8603  Prod B       1      14

>>> df
   product_id    name  total
0        7980  Prod A     10
1        7980  Prod A     12
2        8603  Prod B     14

Python - Pandas DataFrame manipulation

2 Answers2