
I'm trying to create a function in Python that replaces every placeholder for a missing value with NaN.

import pandas as pd
import numpy as np

data=pd.read_csv("diabetes.csv")

def proc_all_NaN(data):
    nan_sym=["_","-","?","","na","n/a"]
    for i in nan_sym:
        data.replace(i,np.nan)

proc_all_NaN(data)

I expect the output of my function to be a dataframe with NaN wherever the original dataframe had any of these placeholders: "_","-","?","","na","n/a".

The output when I call the function is just my data, without any change.

Could you help me? I can't find the mistake in my code.

Willem Van Onsem
Hapanas

1 Answer


You can define the type of null values when you read the file using pd.read_csv(). Per the docs:

na_values : scalar, str, list-like, or dict, optional Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.

In your case, you can try:

data=pd.read_csv("diabetes.csv", na_values=["_","-","?","","na","n/a"])
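
As for why the original function seemed to do nothing: by default, DataFrame.replace returns a new DataFrame instead of modifying data in place, and the function neither stored nor returned that result. Below is a minimal sketch of one way to fix the function itself, if you prefer to clean the data after loading. The name proc_all_nan and the small sample frame are made up for illustration and are not from your file.

```python
import numpy as np
import pandas as pd


def proc_all_nan(data):
    """Return a copy of `data` with common missing-value placeholders set to NaN."""
    nan_sym = ["_", "-", "?", "", "na", "n/a"]
    # replace() accepts a list of values to replace and, by default,
    # returns a new DataFrame, so the result has to be returned (or assigned).
    return data.replace(nan_sym, np.nan)


# Hypothetical sample data standing in for diabetes.csv
df = pd.DataFrame({"a": ["1", "?", "3"], "b": ["-", "x", "n/a"]})

# Keep the returned frame; `df` itself is left unchanged.
cleaned = proc_all_nan(df)
print(cleaned)
```

The important part is assigning the return value, for example data = proc_all_NaN(data) in your script. Calling the function without keeping its result leaves the original dataframe untouched.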
realr