python: better way to handle case sensitivities with df.replace

Question

I have the following code to replace text in my dataframe - dfMSR.

oldtxts = ['NA', 'na']
newtxt = 'N/A'
for oldtxt in oldtxts:
    if oldtxt in dfMSR.values: 
        dfMSR = dfMSR.replace(oldtxt, newtxt, regex=True)
    else: 
        print("\nNo {oldtxt} in Dataframe")

Is there a better way to replace all cases scenarios without spelling them out or changing the case of all text in the dataframe to upper or lower? In the above code if the user wrote 'Na', it wouldn't be replaced as I haven't included it in oldtxts.

edit: sample data and desired output added

dfMSR = pd.DataFrame({'A':['NA','na','O', '', 'N/A'],
          'B':['Anna','E','NA', 'Z', 'Na']})

desired output:

    A   B
0   N/A Anna
1   N/A E
2   O   N/A
3       Z
4   N/A N/A



Thanks

can you add a sample of your dataframe? – Umar.H Dec 30 '20 at 13:34 — Umar.H, Dec 30 '20 at 13:34

Arun Palanisamy · Accepted Answer · 2020-12-30T16:06:19.070

1

You can use the case parameter of str.replace since you have mentioned regex=True

dfMSR.apply(lambda x: x.astype(str).str.replace(r'\bna\b', 'N/A', regex=True,case=False))

Please note that it will not work if it is not Regex-based

Output:

import pandas as pd
dfMSR = pd.DataFrame({'A':['NA','na','O', '', 'N/A'],
          'B':['Anna','E','NA', 'Z', 'Na']})
dfMSR
     A     B
0   NA  Anna
1   na     E
2    O    NA
3          Z
4  N/A    Na
dfMSR.apply(lambda x: x.astype(str).str.replace(r'\bna\b', 'N/A', regex=True,case=False))
     A     B
0  N/A  Anna
1  N/A     E
2    O   N/A
3          Z
4  N/A   N/A

edited Dec 30 '20 at 16:06

answered Dec 30 '20 at 13:42

Arun Palanisamy

5,281
6
28
53

1

Arun, I'd read about setting case=False, but it didn't work (*TypeError: replace() got an unexpected keyword argument 'case'*). When I look at the .replace documentation here https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html, case is not an option. Also, can you explain what you mean if by "if it is not Regex-based". Thanks – SModi Dec 30 '20 at 14:55
1

It's not `pandas.DataFrame.replace` it is [pandas.Series.str.replace](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.replace.html). By not regex-based I meant if `regex=False`, you cannot use this option. Please add sample data and expected result. @SModi – Arun Palanisamy Dec 30 '20 at 15:03
data + result added. To use series.str.replace do I need to do this one column at a time? – SModi Dec 30 '20 at 15:17
I have updated the answer. No,you dont need to do for one column at a time. It will replaced for all values.@SModi – Arun Palanisamy Dec 30 '20 at 15:41
Okay. Then you have to first convert it to string with `.astype(str)`. Updated. Please check now @SModi – Arun Palanisamy Dec 30 '20 at 15:50
unfortunately didn't work. no errors. 'na' didn't get replaced with 'N/A'. – SModi Dec 30 '20 at 16:04
For me it is working :) @SModi. I ran the same exact code. Updated my output as well – Arun Palanisamy Dec 30 '20 at 16:06
This worked, but I had to say dfMSR = dfMSR.apply... I noticed that it's replaced what was originally written into excel as 0 with nans. Will confirm that true 0s haven't been replaced before accepting answer. – SModi Dec 30 '20 at 16:21
1

No. `0`s will not be replaced with `NaN`s. We are just checking for `na`(_case insensitive_) and replacing the same. Even in your sample data here, there is one `0` and I could see it is not replaced with `NaN`. @SModi – Arun Palanisamy Dec 31 '20 at 05:01
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/226653/discussion-between-smodi-and-arun-palanisamy). – SModi Dec 31 '20 at 11:04

score 0 · Answer 2 · answered Dec 30 '20 at 13:32

0

You can chain str.lower() with .replace and also, you needn't test the if condition as it implicitly done in the replace:

dfMSR = dfMSR.apply(lambda x: x.str.lower()).replace(oldtxt, newtxt, regex=True)

In an example case it would look like this:

pd.DataFrame({'A':['NA','na','O'],
          'B':['X','E','NA']}).apply(lambda x: x.str.lower()).replace('na','N/A',regex=True)

answered Dec 30 '20 at 13:32

Celius Stingher

17,835
6
23
53

Celius, I adopted your code to: ```dfMSR.apply(lambda x: x.str.lower()).replace('na','N/A',regex=True) ```. If that's correct I got the error: *AttributeError: Can only use .str accessor with string values!*. I don't want to convert all columns to dtype string as there's a mix of string, boolean, floats and dates in the data. – SModi Dec 30 '20 at 15:03
Ah, alright, then if you could tell me the list of columns you'd like to apply I can fix your code to make it work :) – Celius Stingher Dec 30 '20 at 15:04
I will not have a list of columns. The data creator may add new columns and I essentially want to check the entire data set. – SModi Dec 30 '20 at 15:14
(Also agree, if statement is not needed. I'd inserted it to see if there were any occurrances of the oldtxt - as the data was partially cleaned in earlier stages, processed by someone else. Will be removing from final code.) – SModi Dec 30 '20 at 15:19

python: better way to handle case sensitivities with df.replace

2 Answers2

Linked