Replace the content of all cells matching a condition

Question

How can I loop through an entire dataframe to remove data in cells that contain a particular string when the column names are unknown?

Here’s what I have so far:

for (i in colnames(df)){
   df2 = df[~df[i].str.contains('found')]

My data:

  Getting links from: https://www.bar.com/ Getting links from: https://www.boo.com/ Getting links from: https://www.foo.com/
0           ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/
1             ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/
2         ├─BROKEN─ http://www.broken.com/     2 links found. 0 excluded. 0 broken.         ├─BROKEN─ http://www.broken.com/
3                                      NaN                                      NaN            ├───OK─── http://www.set.com/
4                                      NaN                                      NaN            ├───OK─── http://www.one.com/

How can I delete the entire contents of a cell if it contains a string, for instance, 'found'? I want to delete everything in the cell (including before and after the string.)

Can you be more specific regarding _delete the entire contents_ , do you want it to contain a None NaN or perhaps an empty string ? — Oleg, Dec 31 '18 at 11:19

YOLO · Accepted Answer · 2018-12-31T11:37:03.427

4

You can use applymap here:

import re    

# just make sure all values are string type
df = df.astype(str)    

# remove the text which has word found
df.applymap(lambda x: re.sub('.*found.*','',x ))

                                          0
0  Getting links from: https://www.boo.com/
1            ├───OK─── http://www.this.com/
2              ├───OK─── http://www.is.com/
3                                          
4                                       nan
5                                       nan

edited Dec 31 '18 at 11:37

answered Dec 31 '18 at 11:11

YOLO

20,181
5
20
40

this is crisp. :) +1 – anky Dec 31 '18 at 11:16
@YOLO thank you for the suggestion! I'm getting an error: "TypeError: ('expected string or buffer', u'occured at index Getting links from: https://www.bar.com/') Any insight? – lane Dec 31 '18 at 11:32
@lane how does the header look like? is that the header ? – YOLO Dec 31 '18 at 11:34
@lane when you read the data, make sure you do `header=None`, i edited a bit in the answer. – YOLO Dec 31 '18 at 11:36

Karn Kumar · Answer 2 · 2018-12-31T12:28:30.460

As you are looking for a string/value to check and take the action against the whole DataFrame hence DataFrame.replace method fits the caveats here ..

Example DataFrame:

>>> df
      a
0  foo1
1  foo2
2   bar
3   bar
4   bar

Replacing bar with blank or you can replace it with NaN Value if desired:

>>> df.replace("bar", "", regex=True)
      a
0  foo1
1  foo2
2
3
4

Or replace bar with NaN

>>> df.replace("bar", np.nan, regex=True)
 # df.replace("bar", np.nan, regex=True, inplace=True)
      a
0  foo1
1  foo2
2   NaN
3   NaN
4   NaN

You can choose to have inplace=True if you like to replace to take place into the actual dataframe:

Simulated the example as given:

>>> df
                                                                                                                         col1
0  Getting links from: https://www.bar.com/ Getting links from: https://www.boo.com/ Getting links from: https://www.foo.com/
1            ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/
2              ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/
3          ├─BROKEN─ http://www.broken.com/     2 links found. 0 excluded. 0 broken.         ├─BROKEN─ http://www.broken.com/
4                                       NaN                                      NaN            ├───OK─── http://www.set.com/
5                                       NaN                                      NaN            ├───OK─── http://www.one.com/

Result with str.contains:

>>> df[~df["col1"].str.contains("found")]
                                                                                                                         col1
0  Getting links from: https://www.bar.com/ Getting links from: https://www.boo.com/ Getting links from: https://www.foo.com/
1            ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/           ├───OK─── http://www.this.com/
2              ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/             ├───OK─── http://www.is.com/
4                                       NaN                                      NaN            ├───OK─── http://www.set.com/
5                                       NaN                                      NaN            ├───OK─── http://www.one.com/

OR as Stated make sure values are string type to apply the operation

>>> df[~df["col1"].astype(str).str.contains("found")]

thanks for the suggestion! After running that line, the data is unchanged. I added my data to my post for you to look at. — lane, Dec 31 '18 at 11:03
As the documentation states it's using `re.sub` under the hood, yet in a more convenient interface in my opinion. @lane have you added `inplace=True` ? or `df = df.replace("bar", "", regex=True)` — Oleg, Dec 31 '18 at 11:35
Also, I think the string should be replaced with the regex as mentioned in @YOLO's answer `.*found.*` in order to replace the entire str and not just the exact part that matched — Oleg, Dec 31 '18 at 11:40
Sorry, was awy from the system, @Oleg rightly said replace method i placed in my answer before the post was re-updated, However, Yolo's answer should do the Job for the updated post. — Karn Kumar, Dec 31 '18 at 11:49
Even `df[~df["col1"].str.contains("found")]` is working fine for me . — Karn Kumar, Dec 31 '18 at 11:50

score 0 · Answer 3 · answered Dec 31 '18 at 04:08

0

You can replace with nan with df.replace({'test':np.nan}) and that should replace all instances

answered Dec 31 '18 at 04:08

RyanL

156
1
5

Replace the content of all cells matching a condition

3 Answers3