Pandas: Create a column based on applying string conditions on other column

Question

I have a dataframe df as follows:

    KPI             Tata    JSW 
Gross Margin %      0.582   0.476   
EBITDA Margin %     0.191   0.23    
EBIT Margin %       0.145   0.183   
SG&A/Revenue        0.141   0.03    
COGS/Revenue        0.418   0.524   
CapE/Revenue        0.0577  0.1204      
ROA                 0.064   0.093   
ROE                 0.138   0.243       
Revenue/Employee $K 290.9   934.4   
Inventory Turnover  2.2     3.27    
AR Turnover         13.02   14.29   
Tot Asset Turnover  0.68    0.74    
Current Ratio       0.9     0.8 
Quick Ratio         0.3     0.4

I am trying to add a column, say, scope based on the following criterion:

if df[df['KPI'].str.contains('Margin|Revenue|ROE|ROA')].shape[0] > 0:
  z = 'Max'
elif df[df['KPI'].str.contains('Quick|Current|Turnover')].shape[0] > 0:
  z = 'Min'

In other words, If the field KPI contains any word like Revenue or Margin then the column scope should take Max else Min. Now there is an exception in KPI == COGS/Revenue or KPI == CapEx/Revenue. In this case the scope should take Min despite the string Revenue is present.

So the resultant df should look like below:

In order to achieve the same I am trying to apply a function on the field KPI.

def scope_superlative(col_name):
  df_test = df[df[col_name].str.contains('Margin| Revenue|ROA|ROE')]
  if df_test.shape[0] > 0:
    z = 'Max'
  else:
    df_test = df[df[col_name].str.contains('/Revenue|Current|Quick|Turnover')] ##<-- I want to check if string 'Revenue' is in denominator.##
    if df_test.shape[0] > 0:
      z='Min'
  return z
##Applying this function##
df['scope'] = df.KPI.apply(lambda x : scope_superlative(x))

The above code is generating an Error as KeyError: 'Gross Margin %

If I use df['scope']=df.apply(scope_superlative('KPI'), axis=1) I get an Error as AttributeError: 'DataFrame' object has no attribute 'Max'.

Can anybody please help on this? Is there anything wrong in function or applying technique?

Instead of posting photo of your data. Post it as text. So its easy to copy for others. — Poojan, Oct 30 '19 at 17:33

score 2 · Answer 1 · edited Oct 30 '19 at 18:07

2

Use np.select for multiple conditions and values

conditions = [
    df['KPI'].str.contains('Margin| Revenue|Revenue/|ROE|ROA'),
    df['KPI'].str.contains('/Revenue|Current|Quick|Turnover')
]
values = ['Max', 'Min']
df['scope'] = pd.np.select(conditions, values, default='Min/Max')

Keep the default parameter to the value you desire when all conditions don't match.

OR

If you have only one condition, then,

condition = df['KPI'].str.contains('Margin| Revenue|ROE|ROA')
df['scope'] = pd.np.where(condition, 'Max', 'Min')

The first parameter to np.where is the condition, second is value to put when True and third is value to put when False

edited Oct 30 '19 at 18:07

pythondumb

1,187
1
15
30

answered Oct 30 '19 at 17:38

Vishnudev Krishnadas

10,679
2
23
55

your code worked! I just edited a little and it worked. I wonder why can't we use simple `apply` method here? – pythondumb Oct 30 '19 at 18:01
@pythondumb Read https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code. The above code is vectorized. – Vishnudev Krishnadas Oct 30 '19 at 18:05
@pythondumb check my answer it will work using apply method. – Poojan Oct 30 '19 at 18:09
@Vishnudev +1 for the reference you provided for apply. Thanks! – Kaushal28 Oct 30 '19 at 18:09

Kaushal28 · Accepted Answer · 2019-10-30T18:00:41.903

1

I think you are looking for something like this:

import pandas as pd
import re

def fn(row):
    if re.search('/Revenue|Current|Quick|Turnover', row['KPI']):
        return 'Min'
    elif re.search('Margin|Revenue|ROA|ROE', row['KPI']):
        return 'Max'

df = pd.read_csv('so.csv')

df['scope'] = df.apply (lambda row: fn(row), axis=1)
print (df)

This simply uses df.apply() function which takes each row and applies the provided function on it.

This gives following result on given data:

0        Gross Margin %    0.5820    0.4760   Max
1       EBITDA Margin %    0.1910    0.2300   Max
2         EBIT Margin %    0.1450    0.1830   Max
3          SG&A/Revenue    0.1410    0.0300   Min
4          COGS/Revenue    0.4180    0.5240   Min
5          CapE/Revenue    0.0577    0.1204   Min
6                   ROA    0.0640    0.0930   Max
7                   ROE    0.1380    0.2430   Max
8   Revenue/Employee $K  290.9000  934.4000   Max
9    Inventory Turnover    2.2000    3.2700   Min
10          AR Turnover   13.0200   14.2900   Min
11   Tot Asset Turnover    0.6800    0.7400   Min
12        Current Ratio    0.9000    0.8000   Min
13          Quick Ratio    0.3000    0.4000   Min

Hope this helps!

edited Oct 30 '19 at 18:00

answered Oct 30 '19 at 17:40

Kaushal28

5,377
5
41
72

I have tried this and it did not work. `AttributeError: ("'str' object has no attribute 'contains'", 'occurred at index 0')` – pythondumb Oct 30 '19 at 17:45
The OP doesn't say everything other than `'Margin| Revenue|ROA|ROE'` has to be min. @Kaushal28 and your way is not efficient because when you do `df.apply` the entire row gets selected per iteration. But the operation is only needed in one column. – Vishnudev Krishnadas Oct 30 '19 at 17:47
Updated the code @pythondumb. Just change the conditions in `fn` function based on your requirements – Kaushal28 Oct 30 '19 at 17:50
@Vishnudev Yes I know but condition is not that important in this case. I'm just suggesting a possible approach to OP. – Kaushal28 Oct 30 '19 at 17:50
@Kaushal28: I am getting `TypeError: string indices must be integers` – pythondumb Oct 30 '19 at 18:08
@pythondumb there might be something wrong with input...Code is working in my environment with given data...Can you tell me the line number of error? – Kaushal28 Oct 30 '19 at 18:10
@Kaushal28: Yes I have rectified the code in my notebook. It is working fine. Thanks. – pythondumb Oct 30 '19 at 18:14
@pythondumb After having some years of experience with using pandas, you'll realise that using apply was the worst mistake that you made. Keep in mind that not everything that gets work done is good. – Vishnudev Krishnadas Oct 30 '19 at 18:22

Poojan · Answer 3 · 2019-10-30T18:02:10.917

0

you can use apply function on column you want.

import pandas as pd
import re
d = pd.DataFrame({'a':['a b c','b c d','p q r','d e f','c b a'],'b':[1,2,3,4,5]})

d['scope'] = d['a'].apply(lambda x: 'MAX' if re.search('a|b|e', x) else 'MIN')

d

Output:

      a     b   scope
0   a b c   1   MAX
1   b c d   2   MAX
2   p q r   3   MIN
3   d e f   4   MAX
4   c b a   5   MAX

for your data this should work.

df['Scope'] = df['KPI'].apply(lambda x: 'MAX' if re.search('Margin| Revenue|ROE|ROA', x) else 'MIN')

edited Oct 30 '19 at 18:02

answered Oct 30 '19 at 17:40

Poojan

3,366
2
17
33

How to tackle the exception? I.e. `COGS/Revenue` as it should take 'Min'. As per your logic it would take 'Max' – pythondumb Oct 30 '19 at 17:53
@pythondumb Check now. – Poojan Oct 30 '19 at 18:02

Pandas: Create a column based on applying string conditions on other column

3 Answers3

Linked