I have to delete the rows of a DataFrame whose value in one column contains a certain string. The problem is that the column is very long and contains text.

A loop does not work, and collecting the indices in a list and then calling .drop on them does not work either.

column1
8
8
8
8 total       <-------- This must be deleted
8
8 
8 
8
8
...

Thanks

  • Check the pandas method [`series.str.contains()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html): `series[~series.str.contains('total')]` – anky Apr 25 '19 at 09:50

2 Answers

Suppose your dataframe is called df. Then use:

df_filtered = df[~df['column1'].str.contains('total')]

Explanation:

df['column1'].str.contains('total') returns a boolean Series with the same length as the column, which is True wherever df['column1'] contains 'total'. The ~ operator swaps the True and False values of this Series. Finally, df_filtered = df[...] keeps only the rows in which 'total' does not appear.
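
The three steps can be sketched separately as below. This is only a minimal illustration: the sample values are made up to resemble the column in the question, and the intermediate name `mask` is not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data modelled on the column shown in the question.
df = pd.DataFrame({"column1": ["8", "8", "8 total", "8"]})

# Step 1: boolean Series, True where the value contains 'total'.
mask = df["column1"].str.contains("total")

# Step 2: ~ inverts the mask.
# Step 3: boolean indexing keeps only the rows where the inverted mask is True.
df_filtered = df[~mask]

print(mask.tolist())
print(df_filtered)
```

Note that the filtered frame keeps the original index labels, so the label of the dropped row is simply missing afterwards; `reset_index(drop=True)` renumbers them if that matters.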

markuscosinus
  • You can add `case=False` to the call if case does not matter. You may also need `na=False` in case the column has missing values: `df[~df['column1'].str.contains('total', case=False, na=False)]` – Jorge Apr 25 '19 at 10:15
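
A small sketch of what those two options change, assuming a column that mixes upper and lower case and has one missing value (the sample data is invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: mixed case plus one missing value.
df = pd.DataFrame({"column1": ["8", "8 Total", np.nan, "8 total", "8"]})

# case=False matches 'Total' as well as 'total'.
# na=False makes missing values count as "does not contain",
# so the mask is purely boolean and safe to invert with ~.
mask = df["column1"].str.contains("total", case=False, na=False)
df_filtered = df[~mask]

print(df_filtered)
```

With `na=False` the row holding the missing value is kept. Depending on the pandas version and column dtype, leaving `na` out can put missing values into the mask itself, which then cannot be inverted or used for indexing reliably.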

If I understood correctly, here is a small example where the DataFrame is called df and I want to find and remove the mixfruit row.

>>> df
       name  num
0     apple    5
1    banana    3
2  mixfruit    5
3    carret    6

One way, as others have mentioned, is to use str.contains:

>>> df[~df.name.str.contains("mix")]
     name  num
0   apple    5
1  banana    3
3  carret    6

You can also use isin, which drops every row whose value exactly equals one of the listed strings (it does not match substrings):

>>> df[~df['name'].isin(['mixfruit'])]
     name  num
0   apple    5
1  banana    3
3  carret    6
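
The difference between the two approaches shows up when only part of the string is given. A short sketch using the same example frame (the variable names `exact` and `substring` are only for illustration):

```python
import pandas as pd

# The example frame from this answer.
df = pd.DataFrame({"name": ["apple", "banana", "mixfruit", "carret"],
                   "num": [5, 3, 5, 6]})

# isin compares whole values: no name equals 'mix', so nothing is dropped.
exact = df[~df["name"].isin(["mix"])]

# str.contains looks for a substring: 'mixfruit' contains 'mix' and is dropped.
substring = df[~df["name"].str.contains("mix")]

print(len(exact), len(substring))
```

For the question's case, where the cell reads something like '8 total', a substring match with str.contains is therefore likely the one that fits, unless the full cell text is known in advance.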

For a single exact value, a plain inequality comparison gives the same result:

>>> df[df['name'] != 'mixfruit']
     name  num
0   apple    5
1  banana    3
3  carret    6
Karn Kumar