How to add new column to a dataframe and fill its values based on condition in python

Question

I have a table with company names and the value of each order they placed:

Order Id	Company Id	Company Name	Date	Order Value
3455	80EYLOKP9E762WKG	Chimera-Chasing	18-02-2017	2345
4875	TLEXR1HZWTUTBHPB	Mellow Ezra	30-07-2015	3245
8425	839FKFW2LLX4LMBB	Chimera-Chasing	27-05-2016	4566
4837	97OX39BGVMHODLJM	Worst Mali	27-09-2018	5674
3434	5T4LGH4XGBWOD49Z	Indonesian Grigory	14-01-2016	7654

I need to add a new column containing each company's segment, based on its total order value.

I decided to divide the companies into 4 segments: Prime, Platinum, Gold and Silver.

My approach was to first aggregate this table into a new table with the total order value for each company, using this code:

seg = orders.loc[:,['Company Name', 'Order Value']].groupby('Company Name').sum()

Outcome:

Company Name	Order Value
'48 Wills	65325
10-Day Causes	85473
10-Hour Leak	83021
Youngish Mark'S	120343
10-Year-Old Alba	97968
...	...
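As a side note, groupby(...).sum() collapses the data to one row per company, which is why a second step is needed to bring the totals back to the orders. A minimal sketch of an alternative (a different technique from the one used here) is groupby(...).transform('sum'), which returns the company total aligned to every original row. The sample frame only reuses the five rows from the table above.

```python
import pandas as pd

# Sample rows taken from the orders table above
orders = pd.DataFrame({
    'Company Name': ['Chimera-Chasing', 'Mellow Ezra', 'Chimera-Chasing',
                     'Worst Mali', 'Indonesian Grigory'],
    'Order Value': [2345, 3245, 4566, 5674, 7654],
})

# Aggregated view: one row per company, with the company name as the index
seg = orders.groupby('Company Name')[['Order Value']].sum()
print(seg)

# Row-aligned view: each order gets its company's total, no merge needed
orders['Total Order Value'] = (
    orders.groupby('Company Name')['Order Value'].transform('sum')
)
print(orders)
```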

Then I used conditions on the total order value to create a segment column and added it to the aggregated dataframe "seg", with this code:

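import numpy as np

# np.select returns, for each row, the value of the first condition that is True
conditions = [
    (seg['Order Value'] >= 124485),
    (seg['Order Value'] >= 105503) & (seg['Order Value'] < 124485),
    (seg['Order Value'] >= 88174) & (seg['Order Value'] < 105503),
    (seg['Order Value'] < 88174)
]

values = ['Prime', 'Platinum', 'Gold', 'Silver']

# A string default matches the string labels (the built-in default is the number 0)
seg['Segment'] = np.select(conditions, values, default='Unknown')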
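Because np.select takes the first condition that holds, the upper bounds in the conditions above are not strictly needed. Another common way to bin a numeric column is pd.cut (a swapped-in technique, not what the question used). A minimal sketch, assuming the same thresholds as above and some hypothetical company totals, two of which sit exactly on a threshold to show which side they fall on:

```python
import numpy as np
import pandas as pd

# Hypothetical totals per company (names A-D are made up for illustration)
seg = pd.DataFrame(
    {'Order Value': [65325, 88174, 110000, 124485]},
    index=pd.Index(['A', 'B', 'C', 'D'], name='Company Name'),
)

# Bin edges from the question's thresholds; right=False closes each bin on
# the left, e.g. [88174, 105503) is 'Gold' and [124485, inf) is 'Prime'
bins = [-np.inf, 88174, 105503, 124485, np.inf]
labels = ['Silver', 'Gold', 'Platinum', 'Prime']
seg['Segment'] = pd.cut(seg['Order Value'], bins=bins, labels=labels, right=False)

# Same result with np.select using only lower bounds, checked from the top down
by_select = np.select(
    [seg['Order Value'] >= 124485,
     seg['Order Value'] >= 105503,
     seg['Order Value'] >= 88174],
    ['Prime', 'Platinum', 'Gold'],
    default='Silver',
)
print(seg)
print(by_select)
```

Note that pd.cut produces a categorical column, which keeps the segment order (Silver < Gold < Platinum < Prime) if you later sort by it.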

Now I need to add this segment column to the original dataframe (orders), matching rows where the company name in seg equals the company name in orders, but I don't know how to do that.

Try looking at [Creating a new column based on if-elif-else condition](https://stackoverflow.com/a/21711869/16653700). — Alias Cartellano, Feb 23 '23 at 18:57

Michael Castle · Accepted Answer · 2023-02-23T21:08:43.570

I believe what you want is DataFrame.merge (see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html):

orders = orders.merge(seg, on=['Company Name'], how='left')
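Since seg comes out of groupby, 'Company Name' is its index rather than a column; the on= parameter of merge accepts index level names as well as column names, so the join still lines up. A minimal sketch on a cut-down version of the data, which also passes validate='many_to_one' so pandas raises if a company appears more than once in seg (the constant 'Silver' segment is a placeholder for illustration only):

```python
import pandas as pd

orders = pd.DataFrame({
    'Order ID': ['3455', '4875', '8425'],
    'Company Name': ['Chimera-Chasing', 'Mellow Ezra', 'Chimera-Chasing'],
    'Order Value': [2345, 3245, 4566],
})

# 'Company Name' is the index of seg here, not a column
seg = orders.groupby('Company Name')[['Order Value']].sum()
seg = seg.rename(columns={'Order Value': 'Total Order Value'})
seg['Segment'] = 'Silver'  # placeholder segment for illustration

# Left join keeps every order; validate checks the key is unique in seg
merged = orders.merge(seg, on='Company Name', how='left', validate='many_to_one')
print(merged)
```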

Note that both dataframes have an 'Order Value' column, so the merge would give you two columns named 'Order Value_x' and 'Order Value_y'. To avoid this, I would add the following line before the merge:

seg = seg.rename(columns={'Order Value': 'Total Order Value'})
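To see what the rename prevents, here is a minimal sketch with two rows: without it, the overlapping column gets the default suffixes. The suffixes parameter of merge is another way to control the names; the ('', ' (total)') pair below is just an illustrative choice.

```python
import pandas as pd

orders = pd.DataFrame({
    'Company Name': ['Mellow Ezra', 'Worst Mali'],
    'Order Value': [3245, 5674],
})
seg = orders.groupby('Company Name')[['Order Value']].sum()

# Without renaming, both 'Order Value' columns get the default _x/_y suffixes
clash = orders.merge(seg, on='Company Name', how='left')
print(list(clash.columns))

# An empty left suffix keeps the original name; only the total is tagged
named = orders.merge(seg, on='Company Name', how='left', suffixes=('', ' (total)'))
print(list(named.columns))
```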

Full example:

import pandas as pd
import numpy as np

data = {
    'Order ID': ['3455', '4875', '8425', '4837', '3434'],
    'Company ID': ['80EYLOKP9E762WKG', 'TLEXR1HZWTUTBHPB', '839FKFW2LLX4LMBB', '97OX39BGVMHODLJM', '5T4LGH4XGBWOD49Z'],
    'Company Name': ['Chimera-Chasing', 'Mellow Ezra', 'Chimera-Chasing', 'Worst Mali', 'Indonesian Grigory'],
    'Date': ['18-02-2017', '30-07-2015', '27-05-2016', '27-09-2018', '14-01-2016'],
    'Order Value': [2345, 3245, 4566, 5674, 7654]
}

orders = pd.DataFrame(data=data)
seg = orders.loc[:, ['Company Name', 'Order Value']].groupby('Company Name').sum()

conditions = [
    (seg['Order Value'] >= 124485),
    (seg['Order Value'] >= 105503) & (seg['Order Value'] < 124485),
    (seg['Order Value'] >= 88174) & (seg['Order Value'] < 105503),
    (seg['Order Value'] < 88174)
]

values = ['Prime', 'Platinum', 'Gold', 'Silver']

# A string default matches the string labels (the built-in default is the number 0)
seg['Segment'] = np.select(conditions, values, default='Unknown')
# Rename so the merge doesn't produce 'Order Value_x' and 'Order Value_y'
seg = seg.rename(columns={'Order Value': 'Total Order Value'})

orders = orders.merge(seg, on=['Company Name'], how='left')

print(orders)

Output:
  Order ID        Company ID        Company Name        Date  Order Value  Total Order Value Segment
0     3455  80EYLOKP9E762WKG     Chimera-Chasing  18-02-2017         2345               6911  Silver
1     4875  TLEXR1HZWTUTBHPB         Mellow Ezra  30-07-2015         3245               3245  Silver
2     8425  839FKFW2LLX4LMBB     Chimera-Chasing  27-05-2016         4566               6911  Silver
3     4837  97OX39BGVMHODLJM          Worst Mali  27-09-2018         5674               5674  Silver
4     3434  5T4LGH4XGBWOD49Z  Indonesian Grigory  14-01-2016         7654               7654  Silver

If you don't need the 'Total Order Value' column, you can drop it with the following line:

orders = orders.drop(columns=['Total Order Value'])
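If only the segment label is needed, a lighter alternative to the merge (not part of the answer above) is Series.map, which looks up each company name in the index of seg['Segment']. A minimal sketch with hypothetical segment labels:

```python
import pandas as pd

orders = pd.DataFrame({
    'Company Name': ['Chimera-Chasing', 'Mellow Ezra', 'Chimera-Chasing'],
    'Order Value': [2345, 3245, 4566],
})

# Hypothetical segment table, indexed by company name as groupby produces it
seg = pd.DataFrame(
    {'Segment': ['Gold', 'Silver']},
    index=pd.Index(['Chimera-Chasing', 'Mellow Ezra'], name='Company Name'),
)

# map looks each name up in the Series index; names missing from seg become NaN
orders['Segment'] = orders['Company Name'].map(seg['Segment'])
print(orders)
```

This adds just the one column, so there is no 'Order Value' clash to rename or drop afterwards.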
