pandas search for substring over multiple columns

Question

I have a df such that

       c_name  f_name 
0      abc     abc12  
1      xyz     abc1  
2      mnq     mnq2

The goal is to find a substring across the two columns an know which column it belongs to. Preference should be to c_name, as in if the substring is in both the columns then c_name gets precedence For eg: if I search for abc in the above dataframe I should somehow get row 0 abc for c_name and row 1 abc1 for f_name.

To solve this I started with df[df['c_name'].str.contains('abc', case=False)] which will give me the results for c_name. The question now is to how to exclude the rows where I already have the results from performing the same operation on f_name. Any help is greatly appreciated!

Does this answer your question? [pandas dataframe str.contains() AND operation](https://stackoverflow.com/questions/37011734/pandas-dataframe-str-contains-and-operation) — Abu Shoeb, Apr 26 '21 at 18:25

score 2 · Accepted Answer · answered Jan 17 '17 at 18:21

import pandas as pd
row  =[['abcx','abcy'],
       ['efg','abcz'],
       ['higj','UK']]
df= pd.DataFrame(row)
df.columns = ['c_name', 'f_name']

print df[df['c_name'].str.contains('abc', case=False)]

delta_df =df[~df['c_name'].str.contains('abc', case=False)]
print delta_df[delta_df['f_name'].str.contains('abc', case=False)]

output

  c_name f_name
0   abcx   abcy
  c_name f_name
1    efg   abcz

score 2 · Answer 2 · answered Jan 18 '17 at 01:44

stack into a series
str.contains to get truth value of sub string
unstack to get back dataframe
subset results to ensure at least one match
idxmax(1) gets first True in columns

def find_subtext(df, txt):
    contains = df.stack().str.contains(txt).unstack()
    return contains[contains.any(1)].idxmax(1)

find_subtext(df, 'abc')

0    c_name
1    f_name
dtype: object

df.assign(abc=find_subtext(df, 'abc'))

  c_name f_name     abc
0    abc  abc12  c_name
1    xyz   abc1  f_name
2    mnq   mnq2     NaN

I can always rely on you to come through with a response :) Thank you very much. Its an interesting approach — Fizi, Jan 18 '17 at 23:28

score 0 · Answer 3 · answered Jan 17 '17 at 18:16

0

Mark your first search result as something like 2. This will be overridden(by 1) if the new search result meets.

answered Jan 17 '17 at 18:16

yuewu008

23
4

how is the question :) I know what to do, I just dont know how to do it since I am not proficient at pandas – Fizi Jan 17 '17 at 18:17

Aziz Alto · Answer 4 · 2019-02-08T22:11:57.380

Here is another simple way:

concatenate the target "string" columns into a new single column, e.g.
```
df['new_col'] = df['c_name'] + ' ' + df['f_name']
```

search the new_col for the substring, e.g.

result = df[df['new_col'].str.contains('abc')]

delete new_col after it has completed its mission:
```
del results['new_col']
```

Here is an example:

>>> df= pd.DataFrame(row, columns=['c_name', 'f_name'])
>>> df
  c_name f_name
0   abcx   abcy
1    efg   abcz
2   higj     UK
>>> df['new_col'] = df['c_name'] + ' ' + df['f_name']
>>> results = df[df['new_col'].str.contains('abc')]
>>> del df['new_col'], results['new_col']
>>> results
  c_name f_name
0   abcx   abcy
1    efg   abcz

pandas search for substring over multiple columns

4 Answers4

Linked