For a dataframe named rawfile, we have a ? in the "workclass" column. We use the code rawfile.replace("?", "NaN") to replace every ? with NaN. But what if there are other abnormal values besides ?, such as - or @? How can I detect and replace them?
-
Welcome to SO, and thanks for showing your efforts in the form of code. Please also add your sample data as text so that we can test our solutions on it, thank you. – RavinderSingh13 May 17 '21 at 13:21
2 Answers
0
Use regex in the replace method. Set the value to whatever you want; the code below uses np.nan, but you could use the string "NaN" if desired.
df['workclass'] = df['workclass'].replace(to_replace=r'@|\?', value=np.nan, regex=True)
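As a rough, self-contained sketch of the regex approach above: the sample values are made up, since the question does not show the real data, and the pattern is anchored with ^ and $ on the assumption that only cells consisting solely of ? or @ should be cleared. It uses np.nan, since the np.NaN alias was removed in NumPy 2.0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real "workclass" values are not shown in the question.
df = pd.DataFrame(
    {"workclass": ["Private", "?", "State-gov", "@", "Self-emp"]},
    dtype=object,
)

# The character class [?@] matches a single "?" or "@".
# Anchoring with ^ and $ means only cells that are exactly one of those
# characters are replaced; a longer value that merely contains "@" is kept.
cleaned = df["workclass"].replace(to_replace=r"^[?@]$", value=np.nan, regex=True)

print(cleaned.tolist())
```

Without the anchors, any cell in which the pattern is found anywhere would be replaced as a whole when the replacement value is np.nan, which may or may not be what you want.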

Joe Thor
-
What if there are many other unknown abnormal values besides \@|\?? Is there a function that summarizes them all? – foyeyefo May 17 '21 at 14:18
-
You could use regex to exclude values that do not fit the pattern. Here is an answer that goes into the details: https://stackoverflow.com/a/2930209/4352317 – Joe Thor May 17 '21 at 14:21
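One way to read the suggestion above is to describe what a valid value looks like and flag everything else. The sketch below assumes, purely for illustration, that a valid "workclass" value starts with a letter and contains only letters, hyphens, or spaces; the sample data and that rule are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values.
s = pd.Series(["Private", "?", "State-gov", "-", "@", "Self-emp-inc"], dtype=object)

# Assumed rule for a "normal" value: a letter followed by letters, spaces or hyphens.
# na=False treats cells that are already missing as not matching.
valid = s.str.fullmatch(r"[A-Za-z][A-Za-z -]*", na=False)

# Detect: list the distinct values that fail the rule so they can be inspected.
suspicious = s[~valid].unique().tolist()
print(suspicious)

# Replace: keep valid values, set everything else to a missing value.
cleaned = s.where(valid, np.nan)
```

Inspecting the suspicious list before replacing helps catch a pattern that is too strict, for example one that rejects legitimate values containing digits.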
0
This depends on which values you consider normal and abnormal. If there is no predefined set of valid answers, you should use regex. On the other hand, if the only valid answers are specified in advance, you can use this construction:
df.loc[~df["workclass"].isin(allowed_answers), "workclass"] = np.nan
I also recommend using np.nan instead of "NaN", because np.nan is a real missing value rather than a string.
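A minimal sketch of the allow-list approach, with a detection step added in front. The sample data and the contents of allowed_answers are assumptions for illustration; in practice the list would come from the dataset's documentation or from inspecting the column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"workclass": ["Private", "?", "Private", "-", "State-gov", "@"]},
    dtype=object,
)

# Detect: distinct values with their counts make odd entries easy to spot.
print(df["workclass"].value_counts(dropna=False))

# Hypothetical allow-list of valid answers.
allowed_answers = ["Private", "State-gov"]

# Replace: anything not in the allow-list becomes a missing value.
df.loc[~df["workclass"].isin(allowed_answers), "workclass"] = np.nan

print(df["workclass"].isna().sum())
```

Because the replacement is a real missing value, methods such as isna(), dropna(), and fillna() work on the column afterwards, which would not be the case with the string "NaN".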

VKRW