3

I have a df such that

       c_name  f_name 
0      abc     abc12  
1      xyz     abc1  
2      mnq     mnq2

The goal is to find a substring across the two columns an know which column it belongs to. Preference should be to c_name, as in if the substring is in both the columns then c_name gets precedence For eg: if I search for abc in the above dataframe I should somehow get row 0 abc for c_name and row 1 abc1 for f_name.

To solve this I started with df[df['c_name'].str.contains('abc', case=False)] which will give me the results for c_name. The question now is to how to exclude the rows where I already have the results from performing the same operation on f_name. Any help is greatly appreciated!

Fizi
  • 1,749
  • 4
  • 29
  • 55
  • Does this answer your question? [pandas dataframe str.contains() AND operation](https://stackoverflow.com/questions/37011734/pandas-dataframe-str-contains-and-operation) – Abu Shoeb Apr 26 '21 at 18:25

4 Answers4

2
import pandas as pd
row  =[['abcx','abcy'],
       ['efg','abcz'],
       ['higj','UK']]
df= pd.DataFrame(row)
df.columns = ['c_name', 'f_name']

print df[df['c_name'].str.contains('abc', case=False)]

delta_df =df[~df['c_name'].str.contains('abc', case=False)]
print delta_df[delta_df['f_name'].str.contains('abc', case=False)]

output

  c_name f_name
0   abcx   abcy
  c_name f_name
1    efg   abcz
Shijo
  • 9,313
  • 3
  • 19
  • 31
2
  • stack into a series
  • str.contains to get truth value of sub string
  • unstack to get back dataframe
  • subset results to ensure at least one match
  • idxmax(1) gets first True in columns

def find_subtext(df, txt):
    contains = df.stack().str.contains(txt).unstack()
    return contains[contains.any(1)].idxmax(1)

find_subtext(df, 'abc')

0    c_name
1    f_name
dtype: object

df.assign(abc=find_subtext(df, 'abc'))

  c_name f_name     abc
0    abc  abc12  c_name
1    xyz   abc1  f_name
2    mnq   mnq2     NaN
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • I can always rely on you to come through with a response :) Thank you very much. Its an interesting approach – Fizi Jan 18 '17 at 23:28
0

Mark your first search result as something like 2. This will be overridden(by 1) if the new search result meets.

yuewu008
  • 23
  • 4
  • how is the question :) I know what to do, I just dont know how to do it since I am not proficient at pandas – Fizi Jan 17 '17 at 18:17
0

Here is another simple way:

  1. concatenate the target "string" columns into a new single column, e.g.

    df['new_col'] = df['c_name'] + ' ' + df['f_name']
    
  2. search the new_col for the substring, e.g.

    result = df[df['new_col'].str.contains('abc')]
    
  3. delete new_col after it has completed its mission:

    del results['new_col']
    


Here is an example:
>>> df= pd.DataFrame(row, columns=['c_name', 'f_name'])
>>> df
  c_name f_name
0   abcx   abcy
1    efg   abcz
2   higj     UK
>>> df['new_col'] = df['c_name'] + ' ' + df['f_name']
>>> results = df[df['new_col'].str.contains('abc')]
>>> del df['new_col'], results['new_col']
>>> results
  c_name f_name
0   abcx   abcy
1    efg   abcz
Aziz Alto
  • 19,057
  • 5
  • 77
  • 60