Drop rows with sub-string value given

Question

Drop rows from a DataFrame when any of the given substrings is present in a particular column's value.

df:

Parent  Child   score
1stqw   Whoert      0.305125
tWowe   Tasert      0.308132
Worert  Picert      0.315145

substrings = ['Wor', 'Tas']

Drop rows having the substrings.

Updated df:

Parent  Child   score
1stqw   Whoert      0.305125

thanks!!

score 3 · Accepted Answer · answered Oct 23 '18 at 09:05

You can concatenate and then use pd.Series.str.contains:

L = ['Wor', 'Tas']

df = df[~(df['Parent'] + df['Child']).str.contains('|'.join(L))]

print(df)

  Parent   Child     score
0  1stqw  Whoert  0.305125

For efficiency / performance, see Pandas filtering for multiple substrings in series.
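A self-contained sketch of the approach above, using the sample data from the question. Two details here are assumptions that go beyond the original answer: each substring is passed through `re.escape` so it is matched literally even if it contains regex metacharacters, and a separator (a space, assumed not to occur in any substring) is placed between the two columns so that a match cannot straddle the boundary between `Parent` and `Child`.

```python
import re
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Parent': ['1stqw', 'tWowe', 'Worert'],
    'Child': ['Whoert', 'Tasert', 'Picert'],
    'score': [0.305125, 0.308132, 0.315145],
})
L = ['Wor', 'Tas']

# Escape each substring so it is matched literally, then join into one regex.
pattern = '|'.join(map(re.escape, L))

# Join the columns with a separator so a match cannot span both columns.
combined = df['Parent'] + ' ' + df['Child']

# Keep only the rows where no substring was found.
result = df[~combined.str.contains(pattern)]
print(result)
```

Without the separator, a `Parent` ending in "W" followed by a `Child` starting with "or" would be dropped even though neither column contains "Wor" on its own.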

score 2 · Answer 2 · answered Oct 23 '18 at 09:00

Use str.contains with apply on a subset of the DataFrame's columns, then add any(axis=1) to test for at least one True per row:

cols = ['Parent', 'Child']
mask = df[cols].apply(lambda x: x.str.contains('|'.join(substrings))).any(axis=1)

Or chain the boolean masks together with | (bitwise OR):

mask = (df['Parent'].str.contains('|'.join(substrings)) | 
        df['Child'].str.contains('|'.join(substrings)))

df = df[~mask]
print(df)
  Parent   Child     score
0  1stqw  Whoert  0.305125
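The mask approach can be wrapped into a small reusable function. This is only a sketch: the helper name `drop_rows_containing` is hypothetical, and the use of `re.escape` and `na=False` are assumptions added here (the first makes substrings match literally, the second treats missing values as "no match" so the mask stays boolean).

```python
import re
import pandas as pd

def drop_rows_containing(df, cols, substrings):
    """Hypothetical helper: drop rows where any column in `cols`
    contains any of `substrings`."""
    pattern = '|'.join(re.escape(s) for s in substrings)
    # Test each selected column, then reduce to one boolean per row.
    mask = df[cols].apply(lambda s: s.str.contains(pattern, na=False)).any(axis=1)
    return df[~mask]

# Sample data copied from the question.
df = pd.DataFrame({
    'Parent': ['1stqw', 'tWowe', 'Worert'],
    'Child': ['Whoert', 'Tasert', 'Picert'],
    'score': [0.305125, 0.308132, 0.315145],
})

out = drop_rows_containing(df, ['Parent', 'Child'], ['Wor', 'Tas'])
print(out)
```

Passing the column list as an argument keeps the same function usable when only one column, or more than two, should be checked.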

jezrael


Thanks mate!! I got some repeated rows after doing these steps, don't know why. – vijay athithya Oct 23 '18 at 09:18
@vijayathithya - Maybe there are some duplicates in the data; try `df = df.drop_duplicates()` and, if necessary, test only some columns for dupes: `df = df.drop_duplicates(subset=['Parent', 'Child'])` – jezrael Oct 23 '18 at 09:20
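To illustrate the difference between the two `drop_duplicates` calls in the comment above, here is a minimal sketch. The data is hypothetical, made up only to show a repeated `Parent`/`Child` pair; filtering with a boolean mask does not itself create rows, so any repeats would already be present in the input.

```python
import pandas as pd

# Hypothetical data: the first two rows share Parent and Child but differ in score.
df = pd.DataFrame({
    'Parent': ['1stqw', '1stqw', 'abcde'],
    'Child': ['Whoert', 'Whoert', 'Xyzert'],
    'score': [0.305125, 0.400000, 0.500000],
})

# Compares all columns: no row is an exact copy of another, so nothing is dropped.
full = df.drop_duplicates()

# Compares only Parent and Child: the first row of each pair is kept.
by_key = df.drop_duplicates(subset=['Parent', 'Child'])

print(len(full), len(by_key))
```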
