-2

I'm working on a dataset that has the following simplified structure:

SKU Store code sales
1 A432 123
2 B123 456
1 C234 098

So far I created another dataframe that groups the records by SKU and the sum of sales, but there are other fields aggregated using other methods.

How could I add columns for Sales at A, Sales at B and Sales at C.

So far I have tried working with the standard creation of columns using loc to filter the data, but i haven't being successful iterating over the SKUs

The output that I'm looking for is:

SKU sales A B C
1 123 201 15 0
2 456 456 7 150
Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
tomas
  • 17
  • 5
  • so you want to ignore the number in `A432`? What would be the exact matching output? Maybe add a few more rows in the input to make the desired logic clear. Looks like you need a `pivot_table` – mozway May 25 '23 at 12:43
  • yes, the number is just the identifier to the actual store, i'm grouping by chain – tomas May 25 '23 at 12:45
  • Please make an illustration table for your desired result. – Dinh Quang Tuan May 25 '23 at 12:45
  • 2
    How are the numbers in the output related to the input? SKU 2 has sales only in store B123 but in the output there are 456 sales coming from chain A... Also the sum of the chains does not add up to the number in sales – Matteo Zanoni May 25 '23 at 13:04
  • these are only a reference, not true data by any means but the results should be additions – tomas May 25 '23 at 13:05
  • Please make the minimal example **complete**, with all exact values that are matching in the input and output. I shouldn't be too hard with 6 values. – mozway May 25 '23 at 13:35

2 Answers2

0

I don't really understand where some numbers such as 15, 150, 201... come from but you may try using pivot table

import pandas as pd
import numpy as np

dict_={'sku':[1,2,1],'store_code':['A3','B2','C1'],'sales':[123,456,98]}

df= pd.DataFrame(dict_)

df['code']=df.store_code.str[0]
df_pvt=pd.pivot_table(df,values='sales',index='sku',columns='code', aggfunc=np.sum)

enter image description here

Dinh Quang Tuan
  • 468
  • 3
  • 10
0

I would suggest firstly to group your data. If you are only intrested on aggregate data of each store chain for each SKU you can group by those two columns (the store chain can be calculated using .str on the store name series and getting the first character through slicing). After grouping you can get a series by aggregating (in this case using the sum() aggregation).

To restore the SKU and store chain columns you can reset_index().

The last step would be to pivot() your table.

import pandas as pd

df = pd.DataFrame({"SKU": [1,2,1], "Store code": ["A432", "B123", "C234"], "sales": [123, 456, 98]})  # or your actual df

output = (
    df.groupby(["SKU", "Store chain"])["sales"].sum()
    .reset_index()
    .pivot(index="SKU", columns="Store chain", values="sales")
    .fillna(0)
)
output["sales"] = output.sum(axis=0)

The groupby and pivot steps can be put together using the pivot_table() method like this:

# imports and df creation omitted

output = (
    df.pivot_table(index="SKU", columns="Store chain", values="sales", aggfunc=np.sum)
    .fillna(0)
)
output["sales"] = output.sum(axis=0)
Matteo Zanoni
  • 3,429
  • 9
  • 27
  • While this might answer the authors question, it lacks some explaining words and/or links to documentation. Raw code snippets are not very helpful without some phrases around them. You may also find [how to write a good answer](https://stackoverflow.com/help/how-to-answer) very helpful. Please [edit] your answer. – Nick May 25 '23 at 13:19
  • Thank you. It looks a lot better however there are some errors in the code that need to be addressed. – Nick May 25 '23 at 23:54