13

How can I filter rows which column contain another column? For example, if we have DT with two columns A, B, can we filter rows with B.contains(A)? Not just if B contains some A values from all A from DT, but just in one row.

A      B
'lol'  'lolec'
'ram'  'rambo'
'ki'   'pio'

Result:
A     B
'lol'  'lolec'
'ram'  'rambo'
Fabio Lamanna
  • 20,504
  • 24
  • 90
  • 122
wowbrowser search
  • 345
  • 1
  • 3
  • 11

2 Answers2

12

You can use boolean indexing with mask created by apply and in if need filter columns A and B per rows:

#if necessary strip ' in all values
df = df.apply(lambda x: x.str.strip("'"))
#df = df.applymap(lambda x: x.strip("'"))

print (df.apply(lambda x: x.A in x.B, axis=1))
0     True
1     True
2    False
dtype: bool

df = df[df.apply(lambda x: x.A in x.B, axis=1)]
print (df)
     A      B
0  lol  lolec
1  ram  rambo

Difference of solutions - input DataFrame is changed:

print (df)
     A      B
0  lol    pio
1  ram  rambo
2   ki  lolec

print (df[df.apply(lambda x: x.A in x.B, axis=1)])
     A      B
1  ram  rambo

print (df[df['B'].str.contains("|".join(df['A']))])
    A      B
1  ram  rambo
2   ki  lolec

for improve performance use list comprehension:

df = df[[a in b for a, b in zip(df.A, df.B)]]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks, this is usefull. I have little problem with encoding. Encoding was always big problem in python 2.7 for me. – wowbrowser search Jan 24 '17 at 14:01
  • 1
    Is possible switched to python 3? I switched 6 months ago and it was good decision. – jezrael Jan 24 '17 at 14:02
  • I think it's time:) – wowbrowser search Jan 24 '17 at 14:07
  • @jezrael How can I do this if 1st column is int64 and 2nd column is float? I get an error saying 'int' is not iterable when I run df[df.apply(lambda x: x.A in x.B, axis=1)]. – Sikander Feb 25 '22 at 02:05
  • I tried df1 = df[df['parent'].isin(df['child'])] as well as df1 = df[df['parent'].astype('int64').isin(df['child'])] but both do not filter out records that I expect to be filtered out, i.e., 'parent' values that are not present in 'child' column. – Sikander Feb 25 '22 at 02:37
  • 1
    @Sikander - Hard to know what is problem without data - both columns are integers? – jezrael Feb 25 '22 at 05:58
  • One column (child) is int64 and the other is float (Parent). I need to filter out any rows that do not have Parent in Child column. For example, if 123 is present in Parent column, say, in row #10 but not present in any rows of the Child column, then I need to delete this row. – Sikander Feb 25 '22 at 06:25
  • 1
    @Sikander - Is possible see data sample? – jezrael Feb 25 '22 at 06:28
  • @jezrael I just asked a question here: https://stackoverflow.com/questions/71273144/python-pandas-delete-rows-where-value-in-one-column-is-not-present-in-another-c – Sikander Feb 26 '22 at 00:45
7

You can use str.contains to match each of the substrings by using the regex | character which implies an OR selection from the contents of the other series:

df[df['B'].str.contains("|".join(df['A']))]

enter image description here

Nickil Maveli
  • 29,155
  • 8
  • 82
  • 85