-1

I need to create a df_2 with a column based on a couple of formulae. I could create multiple df's with group and merge. But is there an efficient way for achieve?

given df_1 is as below,

df_1 = pd.DataFrame([['A', '1/1/2021','SKU_1','Customer Backhaul','34,848','$-51,100'],
              ['A', '1/1/2021','SKU_1','FOB','75,357','$12,407,112'],
              ['A', '1/1/2021','SKU_1','Price','75,357','$12,407,112'],
              ['A', '1/1/2021','SKU_1','Vendor Freight - Delivered','40,511','$65,470'],
              ['B', '1/1/2021','SKU_1','Customer Backhaul','197,904','$-157,487'],
              ['B', '1/1/2021','SKU_1','FOB','931,866','$50,059,515'],
              ['B', '1/1/2021','SKU_1','Price','931,866','$62,333,500'],
              ['B', '1/1/2021','SKU_1','Vendor Freight - Delivered','740,355','$1,220,927']], 
              columns=['Group', 'Month','ID','Cost Type','Volume','Order Cost'])

Formulae for columns in df_2 below are:

  • Freight = absolute value of (Customer Backhaul) + Vendor Delivered
  • FOB = FOB
  • Price = Price - Customer Backhaul
  • Volume = Volume of FOB

df_2 is the expected result,

Out[df_2]
  Group   Month     ID      Cost Type   Volume    Cost
0   A   1/1/2021    SKU_1   Freight     75,357  $116,570
1   A   1/1/2021    SKU_1   FOB         75,357  $12,407,112
2   A   1/1/2021    SKU_1   Price       75,357  $12,434,063
3   B   1/1/2021    SKU_1   Freight     931,866 $1,378,414
4   B   1/1/2021    SKU_1   FOB         931,866 $50,059,515
5   B   1/1/2021    SKU_1   Price       931,866 $62,490,988
  • refer to the df_1 (which is a given df) and df_2 is an expected df, where 'value' is calculated by given formula. I hope this should suffice?? –  Oct 01 '21 at 21:09
  • It would be better if the dataframe was minimal. Ie it should contain the least amount of info which still reproduces your problem. – zabop Oct 01 '21 at 21:11
  • For example, this is a rather minimal reproducible df: df = pd.DataFrame([[1, 2], [1, 3], [4, 6]], columns=['A', 'B']) – zabop Oct 01 '21 at 21:11
  • You can find more info on how to create such examples [here](https://stackoverflow.com/a/20159305/8565438). – zabop Oct 01 '21 at 21:12
  • updated the question, please take a look... –  Oct 01 '21 at 21:33

1 Answers1

1

If we want to do additions and subtractions we first need to transform data to numerical data:

>>> df_1[['Volume', 'Order Cost']] = df_1[['Volume', 'Order Cost']].transform(
...      lambda s: pd.to_numeric(s.str.replace('^\$|,', '', regex=True))
... )
>>> df_1
  Group     Month     ID                   Cost Type  Volume  Order Cost
0     A  1/1/2021  SKU_1           Customer Backhaul   34848      -51100
1     A  1/1/2021  SKU_1                         FOB   75357    12407112
2     A  1/1/2021  SKU_1                       Price   75357    12407112
3     A  1/1/2021  SKU_1  Vendor Freight - Delivered   40511       65470
4     B  1/1/2021  SKU_1           Customer Backhaul  197904     -157487
5     B  1/1/2021  SKU_1                         FOB  931866    50059515
6     B  1/1/2021  SKU_1                       Price  931866    62333500
7     B  1/1/2021  SKU_1  Vendor Freight - Delivered  740355     1220927

Then I find the most elegant way to solve your problem would be to pivot your dataframe to have one row per order, and cost types in columns:

>>> cost = df_1.pivot(['Group', 'Month', 'ID'], 'Cost Type', 'Order Cost')
>>> cost
Cost Type             Customer Backhaul       FOB     Price  Vendor Freight - Delivered
Group Month    ID                                                                      
A     1/1/2021 SKU_1             -51100  12407112  12407112                       65470
B     1/1/2021 SKU_1            -157487  50059515  62333500                     1220927

Then you can simply write your operations as desired:

>>> cost['Customer Backhaul'] + cost['Vendor Freight - Delivered']
Group  Month     ID   
A      1/1/2021  SKU_1      14370
B      1/1/2021  SKU_1    1063440
dtype: int64

And so we can combine all your desired formulas for df_2 in a concise and readable manner, and then un-pivot to get your desired format with .unstack():

>>> df_2 = pd.concat({
...     'Freight': cost['Customer Backhaul'].abs() + cost['Vendor Freight - Delivered'],
...     'FOB': cost['FOB'],
...     'Price': cost['Price'] - cost['Customer Backhaul'],
... }, axis='columns', names=['Order Type']).stack('Order Type').to_frame('Cost').reset_index()
>>> df_2
  Group     Month     ID Order Type      Cost
0     A  1/1/2021  SKU_1    Freight    116570
1     A  1/1/2021  SKU_1        FOB  12407112
2     A  1/1/2021  SKU_1      Price  12458212
3     B  1/1/2021  SKU_1    Freight   1378414
4     B  1/1/2021  SKU_1        FOB  50059515
5     B  1/1/2021  SKU_1      Price  62490987

Then the Volume can be gotten using a merge after selecting the FOB values:

>>> fob_volume = df_1.loc[df_1['Cost Type'].eq('FOB'), ['Group', 'Month', 'ID', 'Volume']]
>>> fob_volume
  Group     Month     ID  Volume
1     A  1/1/2021  SKU_1   75357
5     B  1/1/2021  SKU_1  931866
>>> df_2 = df_2.merge(fob_volume)
>>> df_2
  Group     Month     ID Order Type      Cost  Volume
0     A  1/1/2021  SKU_1    Freight    116570   75357
1     A  1/1/2021  SKU_1        FOB  12407112   75357
2     A  1/1/2021  SKU_1      Price  12458212   75357
3     B  1/1/2021  SKU_1    Freight   1378414  931866
4     B  1/1/2021  SKU_1        FOB  50059515  931866
5     B  1/1/2021  SKU_1      Price  62490987  931866

Finally if you want to reapply styling you can use .apply():

>>> df_2['Cost'] = df_2['Cost'].apply('${:,.0f}'.format)
>>> df_2['Volume'] = df_2['Volume'].apply('{:,.0f}'.format)
>>> df_2
  Group     Month     ID Order Type         Cost   Volume
0     A  1/1/2021  SKU_1    Freight     $116,570   75,357
1     A  1/1/2021  SKU_1        FOB  $12,407,112   75,357
2     A  1/1/2021  SKU_1      Price  $12,458,212   75,357
3     B  1/1/2021  SKU_1    Freight   $1,378,414  931,866
4     B  1/1/2021  SKU_1        FOB  $50,059,515  931,866
5     B  1/1/2021  SKU_1      Price  $62,490,987  931,866
Cimbali
  • 11,012
  • 1
  • 39
  • 68