Create a column based on the data of another dataframe

Question

I'm working on a dataset that has the following simplified structure:

SKU	Store code	sales
1	A432	123
2	B123	456
1	C234	098

So far I created another dataframe that groups the records by SKU and the sum of sales, but there are other fields aggregated using other methods.

How could I add columns for Sales at A, Sales at B and Sales at C.

So far I have tried working with the standard creation of columns using loc to filter the data, but i haven't being successful iterating over the SKUs

The output that I'm looking for is:

SKU	sales	A	B	C
1	123	201	15	0
2	456	456	7	150

so you want to ignore the number in `A432`? What would be the exact matching output? Maybe add a few more rows in the input to make the desired logic clear. Looks like you need a `pivot_table` — mozway, May 25 '23 at 12:43
yes, the number is just the identifier to the actual store, i'm grouping by chain — tomas, May 25 '23 at 12:45
How are the numbers in the output related to the input? SKU 2 has sales only in store B123 but in the output there are 456 sales coming from chain A... Also the sum of the chains does not add up to the number in sales — Matteo Zanoni, May 25 '23 at 13:04
these are only a reference, not true data by any means but the results should be additions — tomas, May 25 '23 at 13:05
Please make the minimal example **complete**, with all exact values that are matching in the input and output. I shouldn't be too hard with 6 values. — mozway, May 25 '23 at 13:35

score 0 · Answer 1 · answered May 25 '23 at 13:06

I don't really understand where some numbers such as 15, 150, 201... come from but you may try using pivot table

import pandas as pd
import numpy as np

dict_={'sku':[1,2,1],'store_code':['A3','B2','C1'],'sales':[123,456,98]}

df= pd.DataFrame(dict_)

df['code']=df.store_code.str[0]
df_pvt=pd.pivot_table(df,values='sales',index='sku',columns='code', aggfunc=np.sum)

Matteo Zanoni · Answer 2 · 2023-05-25T14:18:45.297

I would suggest firstly to group your data. If you are only intrested on aggregate data of each store chain for each SKU you can group by those two columns (the store chain can be calculated using .str on the store name series and getting the first character through slicing). After grouping you can get a series by aggregating (in this case using the sum() aggregation).

To restore the SKU and store chain columns you can reset_index().

The last step would be to pivot() your table.

import pandas as pd

df = pd.DataFrame({"SKU": [1,2,1], "Store code": ["A432", "B123", "C234"], "sales": [123, 456, 98]})  # or your actual df

output = (
    df.groupby(["SKU", "Store chain"])["sales"].sum()
    .reset_index()
    .pivot(index="SKU", columns="Store chain", values="sales")
    .fillna(0)
)
output["sales"] = output.sum(axis=0)

The groupby and pivot steps can be put together using the pivot_table() method like this:

# imports and df creation omitted

output = (
    df.pivot_table(index="SKU", columns="Store chain", values="sales", aggfunc=np.sum)
    .fillna(0)
)
output["sales"] = output.sum(axis=0)

While this might answer the authors question, it lacks some explaining words and/or links to documentation. Raw code snippets are not very helpful without some phrases around them. You may also find [how to write a good answer](https://stackoverflow.com/help/how-to-answer) very helpful. Please [edit] your answer. — Nick, May 25 '23 at 13:19
Thank you. It looks a lot better however there are some errors in the code that need to be addressed. — Nick, May 25 '23 at 23:54

Create a column based on the data of another dataframe

2 Answers2