1

There is a slight variant to this question Check if column value is in other columns in pandas

I have a dataframe called test

name_0        name_1    overall_name
Asda          Nan       Tesco
Asda          Nan       ASDA
LIDL 1        Asda      Lidl
AAA           Asda      ASDA
AAA           Asda      ASDA
Sainsbury     Nan       Lidl

How do I check whether test.overall_name is in any of the other columns ['name_0', 'name_1' etc] ignoring the size of characters (lowercase/uppercase) and any special characters.

So my ideal dataframe should look like:

name_0        name_1    overall_name   namematch 
Asda          Nan       Tesco          no match 
Asda          Nan       ASDA           match
LIDL 1        Asda      Lidl           match
AAA           Asda      ASDA           match
AAA           Asda      ASDA           match
Sainsbury     Nan       Lidl           no match
Deluxe88888
  • 113
  • 6

3 Answers3

1

Check this out:

This method converts and compares the values:

import pandas as pd 
import re

def match (first, second, overall):
    f = re.sub(r"[^a-zA-Z]"," ", first.lower()).strip()
    s = re.sub(r"[^a-zA-Z]"," ", second.lower()).strip()
    o = re.sub(r"[^a-zA-Z]"," ", overal.lower()).strip()
    if f == o:
        return 1
    elif s == o:
        return 1
    else:
        return 0

This line of code adds match column and applys the function to each row:

df['match'] = df.apply(lambda x: match(x['name_0'],x['name_1'],x['overall_name']),axis=1)

the result is something like this:

    name_0  name_1  overall_name    match
  0 Asda    Nan     Tesco             0
  1 Asda    Nan     ASDA              1
  2 LIDL 1  Asda    Lidl              1
  3 AAA     Asda    ASDA              1
  4 AAA     Asda    ASDA              1
  5 Sainsbury   Nan Lidl              0

let me know if it works for you.

Hamid
  • 612
  • 1
  • 8
  • 20
0

IICU

Make df have a common case. Then use boolean indexing combined with np.where to check and attribute

Using boolean indexing combined with np.where check and attribute

Dataframe Used

enter image description here

df["namematch "] = np.where((df.drop("overall_name", 1).apply(lambda x:x.str.lower())).isin(df["overall_name"].str.lower()).any(1),'match','nomatch')

#Make df have a common case. In this situation made it lower case
#df=df.apply(lambda x:x.str.lower())
# df["namematch "] = np.where(df.drop("overall_name", 1).isin(df["overall_name"]).any(1),'match','nomatch')

Outcome

enter image description here

wwnde
  • 26,119
  • 6
  • 18
  • 32
  • Hi, thanks for the answer. 2 questions; I get an error: AttributeError: Can only use .str accessor with string values!. Also, what if I wanted to check if part of a string in overall_name is in any other columns. For example, an overall_name could be As da is great. In X column it would be asda @wwnde – Deluxe88888 May 11 '20 at 11:47
  • Sorry @nic, I have been on urgent project.Why not set the field as string then? Happy to help – wwnde May 12 '20 at 04:27
0

Re-create the sample data frame:

df=pd.DataFrame({'name_0':['Asda','AS-DA','Asda','LIDL1','AAA','AAA','Sainsbury'],
                 'name_1':[np.nan,np.nan,'Asda','As da','Asda','Asda',np.nan],
                 'overall_name':['Tesco','ASDA','Lidl1','ASDA','ASDA','Lid1','As da']})

Convert float to string:

df=df.fillna('nan')

Remove special characters: '-' and ' ', note: need to import 'regex' lib

import re

df = df.applymap(lambda x: re.sub(r'-','', x))
df=df.applymap(lambda x: re.sub(r' ','',x))

Create a list:

name_0=df['name_0'].tolist()
name_1=df['name_1'].tolist()
name_concat=name_0+name_1

Obtain results:

df['namematch']=df['overall_name'].str.lower().isin([x.lower() for x in name_concat])
df['namematch']=np.where(df['namematch']==True,'match','nomatch')
Nev1111
  • 1,039
  • 9
  • 13
  • Thanks for this, what if there were special characters and white space involved such as AS-DA and in overall_name it was As da? @Nev1111 – Deluxe88888 May 11 '20 at 15:52
  • I changed the sample data frame to include examples with special characters. Please see the edited answer. Hope it works for you! – Nev1111 May 11 '20 at 17:45