3

I have the following dataframe in pandas:

 job_desig             salary
 senior analyst        12
 junior researcher     5
 scientist             20
 sr analyst            12

Now I want to add a column that holds a flag, set to 1 when the designation contains any of the words in a list, as below:

 sr = ['senior','sr']
 job_desig             salary     senior_profile
 senior analyst        12         1  
 junior researcher     5          0
 scientist             20         0 
 sr analyst            12         1

I am doing the following in pandas:

 df['senior_profile'] = [1 if x.str.contains(sr) else 0 for x in 
                        df['job_desig']]
Neil
  • Possible duplicate of [Using str.contains() in pandas with dataframes](https://stackoverflow.com/questions/19169649/using-str-contains-in-pandas-with-dataframes) – ayorgo May 25 '19 at 07:38

2 Answers

5

You can join the list values with | (regex OR), pass the resulting pattern to Series.str.contains, and finally cast the boolean result to integer to map True/False to 1/0:

df['senior_profile'] = df['job_desig'].str.contains('|'.join(sr)).astype(int)

If the keywords should match whole words only, add word boundaries:

pat = '|'.join(r"\b{}\b".format(x) for x in sr)
df['senior_profile'] = df['job_desig'].str.contains(pat).astype(int)

print (df)
           job_desig  salary  senior_profile
0     senior analyst      12               1
1  junior researcher       5               0
2          scientist      20               0
3         sr analyst      12               1
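
For reference, here is a self-contained sketch of the word-boundary approach that can be run as-is. The `re.escape` call and the extra title `"israel desk analyst"` are not part of the question; they are assumptions added to show how keywords with regex metacharacters could be guarded and why the boundaries can matter.

```python
import re

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "job_desig": ["senior analyst", "junior researcher", "scientist", "sr analyst"],
    "salary": [12, 5, 20, 12],
})
sr = ["senior", "sr"]

# Plain alternation: matches the keywords anywhere, even inside longer words
plain_pat = "|".join(sr)

# Word-bounded alternation; re.escape is an added precaution (assumption)
# in case a keyword ever contains regex metacharacters
bounded_pat = "|".join(r"\b{}\b".format(re.escape(x)) for x in sr)

df["senior_profile"] = df["job_desig"].str.contains(bounded_pat).astype(int)
print(df)

# Hypothetical title in which "sr" appears only inside another word
extra = pd.Series(["israel desk analyst"])
plain_flags = extra.str.contains(plain_pat).astype(int).tolist()
bounded_flags = extra.str.contains(bounded_pat).astype(int).tolist()
print(plain_flags, bounded_flags)
```

With the plain pattern the hypothetical title is flagged because "sr" occurs inside "israel"; with word boundaries it is not.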

Solution with sets, if the list contains only single-word values:

df['senior_profile'] = [int(bool(set(sr).intersection(x.split()))) for x in df['job_desig']]
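
The set-based variant compares whole whitespace-separated tokens, so for single-word keywords it behaves much like a word-boundary match. A minimal runnable sketch follows; the helper name `has_keyword` and the lowercase normalization are assumptions added for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "job_desig": ["senior analyst", "junior researcher", "scientist", "sr analyst"],
    "salary": [12, 5, 20, 12],
})
sr = {"senior", "sr"}


def has_keyword(title, keywords):
    """Return 1 if any whitespace-separated token of title is in keywords, else 0.

    Hypothetical helper; lowercasing the title is an added assumption.
    """
    return int(not keywords.isdisjoint(title.lower().split()))


df["senior_profile"] = [has_keyword(x, sr) for x in df["job_desig"]]
print(df)
```

Because only whole tokens are compared, a title such as "israel desk analyst" would not be flagged, while "Sr Analyst" would be thanks to the lowercasing step.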
jezrael
3

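You can do this simply by combining two str.contains calls with |. The result is boolean, so cast it with astype(int) to get 1/0 instead of True/False:

df['senior_profile'] = (df['job_desig'].str.contains('senior') | df['job_desig'].str.contains('sr')).astype(int)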
Ashok Rayal