I have a data frame with two columns, 'Location' and 'Job Title'. I need to check which rows in 'Job Title' contain any of the names listed in 'Location'.

        Location                Job Title
0       New York New York       Regional Manager Las Vegas and San Diego
1       New York City           Full Stack Engineer
2       San Francisco Bay Area  Director of Guitar Studies
3       Greater Los Angeles     New England Institute of Technology
4       Greater Chicago         New England Institute of Technology
...     ...                     ...
984710  NaN                     Catering Sales Manager
984711  NaN                     Director, Research & Development and
984712  NaN                     HR Manager
984713  NaN                     Director of Development
984714  NaN                     Development Officer

There are 625 rows with a value in Location and close to a million in Job Title.

I tried df['exist1'] = df['Location'].isin(df['Job Title']). After that, I tried filtering on the True values, but it shows every value under 625 as True. There are no values under 625 in the Location column.

Where am I going wrong? Any help would be greatly appreciated.

sirishp
  • This need to be done with for loop or numpy cherry char – BENY Jun 12 '20 at 02:00
  • If possible can you show me that – sirishp Jun 12 '20 at 02:05
  • Does this answer your question? [How to test if a string contains one of the substrings in a list, in pandas?](https://stackoverflow.com/questions/26577516/how-to-test-if-a-string-contains-one-of-the-substrings-in-a-list-in-pandas) – DJK Jun 12 '20 at 02:12

2 Answers

Does this answer your question?

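df['exist1'] = df.apply(lambda x: isinstance(x['Location'], str) and isinstance(x['Job Title'], str) and x['Location'] in x['Job Title'], axis=1)  # isinstance guards skip NaN rows, which would otherwise raise a TypeError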

This is a row-wise substring check (i.e. the location in each row is checked against the job title in the same row). If you want to check ALL job titles against ALL locations, please let us know and I would be happy to edit it accordingly.
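
As a rough illustration of the difference between the `isin` attempt in the question and the row-wise substring check, here is a minimal sketch. The sample frame and the helper name `location_in_title` are made up for the example and are not the asker's real data.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real frame
df = pd.DataFrame({
    "Location": ["Las Vegas", "New York City", "Chicago", None],
    "Job Title": [
        "Regional Manager Las Vegas and San Diego",
        "Full Stack Engineer",
        "Chicago Sales Lead",
        "HR Manager",
    ],
})

# isin asks whether each Location equals a whole Job Title (exact match),
# so it never finds a location inside a longer title
exact = df["Location"].isin(df["Job Title"])

# Row-wise substring test; rows with a missing value count as False
def location_in_title(row):
    loc, title = row["Location"], row["Job Title"]
    return isinstance(loc, str) and isinstance(title, str) and loc in title

df["exist1"] = df.apply(location_in_title, axis=1)
print(df)
```

On close to a million rows, `apply` with `axis=1` can be slow; a list comprehension over `zip` of the two columns performs the same check with less overhead.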

Ehsan

You can do this with str.contains, searching every job title for any of the locations:

pattern = '|'.join(map(re.escape, df['Location'].dropna().unique()))  # requires `import re`
df['exist1'] = df['Job Title'].str.contains(pattern, na=False)
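
For checking every job title against every location, which is the direction the question asks for, one possible sketch is below. It assumes the location names should be matched literally, so each one is passed through `re.escape`. The sample data and the `matched` column are invented for the example.

```python
import re
import pandas as pd

# Hypothetical sample data, not the asker's real frame
df = pd.DataFrame({
    "Location": ["Las Vegas", "San Diego", "New England", None],
    "Job Title": [
        "Regional Manager Las Vegas and San Diego",
        "Full Stack Engineer",
        "New England Institute of Technology",
        "HR Manager",
    ],
})

# One alternation over the distinct, non-null locations, escaped so that
# characters such as "(" or "." are treated literally
locations = df["Location"].dropna().unique()
alternation = "|".join(re.escape(loc) for loc in locations)

# True where the job title contains any location; missing titles give False
df["exist1"] = df["Job Title"].str.contains(alternation, na=False)

# Optionally record the first location found in each title (NaN if none)
df["matched"] = df["Job Title"].str.extract(f"({alternation})", expand=False)
print(df)
```

With only 625 distinct locations the pattern stays small, whereas joining close to a million job titles into one regular expression would be far more expensive.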

If you would like to match within each row instead:

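df1 = df.dropna(subset=['Location', 'Job Title']).copy()  # copy avoids SettingWithCopyWarning
df1['exist1'] = [x in y for x, y in zip(df1['Location'], df1['Job Title'])]
df['exist1'] = df1['exist1']  # aligned on index; dropped rows become NaN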
BENY