2

Drop rows from a DataFrame if a substring is present in a particular column's row.

df:

Parent  Child   score
1stqw   Whoert      0.305125
tWowe   Tasert      0.308132
Worert  Picert      0.315145

substrings = ['Wor', 'Tas']

Drop the rows where Parent or Child contains any of these substrings.

Updated df:

Parent  Child   score
1stqw   Whoert      0.305125

thanks!!

vijay athithya

2 Answers

3

You can concatenate and then use pd.Series.str.contains:

L = ['Wor', 'Tas']

df = df[~(df['Parent'] + df['Child']).str.contains('|'.join(L))]

print(df)

  Parent   Child     score
0  1stqw  Whoert  0.305125

For efficiency and performance considerations, see the related question "Pandas filtering for multiple substrings in series".
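A self-contained sketch of the concatenation approach, rebuilding the question's sample data. Two details here are assumptions rather than part of the answer above: the substrings are passed through `re.escape` in case they contain regex metacharacters, and a space is placed between the two columns so that a match cannot span the boundary between Parent and Child (this assumes no substring itself contains a space).

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Parent": ["1stqw", "tWowe", "Worert"],
    "Child": ["Whoert", "Tasert", "Picert"],
    "score": [0.305125, 0.308132, 0.315145],
})
substrings = ["Wor", "Tas"]

# Escape each substring so it is matched literally, then join into one regex.
pattern = "|".join(re.escape(s) for s in substrings)

# Assumption: a separator keeps a match from straddling the two columns.
combined = df["Parent"] + " " + df["Child"]

# Keep only the rows whose combined text matches none of the substrings.
filtered = df[~combined.str.contains(pattern)]
print(filtered)
```

Only the row with Parent `1stqw` should remain, matching the expected output in the question.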

jpp
2

Use str.contains with apply on a subset of the DataFrame's columns, then add any(axis=1) to test for at least one True per row:

cols = ['Parent', 'Child']
mask = df[cols].apply(lambda x: x.str.contains('|'.join(substrings))).any(axis=1)

Or chain the boolean masks together with | (bitwise OR):

mask = (df['Parent'].str.contains('|'.join(substrings)) | 
        df['Child'].str.contains('|'.join(substrings)))

df = df[~mask]
print(df)
  Parent   Child     score
0  1stqw  Whoert  0.305125
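
The column-wise mask above can be put together into a runnable sketch using the question's sample data. The `re.escape` call and the `na=False` argument are additions, not part of the original snippet: the first treats the substrings as literal text, and the second makes missing values count as "no match" instead of propagating NaN into the mask.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Parent": ["1stqw", "tWowe", "Worert"],
    "Child": ["Whoert", "Tasert", "Picert"],
    "score": [0.305125, 0.308132, 0.315145],
})
substrings = ["Wor", "Tas"]
cols = ["Parent", "Child"]

# One regex that matches any of the (literally escaped) substrings.
pattern = "|".join(re.escape(s) for s in substrings)

# Test each selected column separately, then flag rows with at least one hit.
mask = df[cols].apply(lambda col: col.str.contains(pattern, na=False)).any(axis=1)

result = df[~mask]
print(result)
```

Checking each column on its own avoids accidental matches across the Parent/Child boundary, at the cost of one `str.contains` pass per column.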
jezrael
  • Thanks mate!! I got some repeated rows after doing these steps, and I don't know why. – vijay athithya Oct 23 '18 at 09:18
  • @vijayathithya - Maybe there are some duplicates in the data; try `df = df.drop_duplicates()` and, if necessary, check only some columns for duplicates with `df = df.drop_duplicates(subset=['Parent', 'Child'])` – jezrael Oct 23 '18 at 09:20
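
A minimal sketch of the two `drop_duplicates` calls suggested in the comment above. The rows here are hypothetical, made up only to show the difference between the two calls; they are not the asker's actual data.

```python
import pandas as pd

# Hypothetical data: row 1 repeats row 0 exactly, row 2 differs only in score.
df = pd.DataFrame({
    "Parent": ["1stqw", "1stqw", "1stqw"],
    "Child": ["Whoert", "Whoert", "Whoert"],
    "score": [0.305125, 0.305125, 0.4],
})

# Drops rows that are identical in every column (keeps the first occurrence).
full = df.drop_duplicates()

# Drops rows that repeat the same Parent/Child pair, whatever the score.
by_pair = df.drop_duplicates(subset=["Parent", "Child"])

print(full)
print(by_pair)
```

With this data, `full` keeps two rows and `by_pair` keeps one, so the `subset` form is the stricter of the two.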