I was trying to remove b'NA' from the default pandas na_values, which are defined in pandas._libs.parsers. I did it by importing the list and filtering it:

import pandas as pd
from pandas._libs.parsers import _NA_VALUES

disable_na_values = [b"NA"]
my_default_na_values = [
    item.decode("UTF-8") for item in _NA_VALUES if item not in disable_na_values
]

df = pd.read_excel(filepath, keep_default_na=False, na_values=my_default_na_values)

It works: after the Excel/CSV file is imported, cells containing "NA" (the country code for Namibia) are kept as strings instead of being converted to NaN.

However, what I don't understand is why these na_values are stored as bytes. And where are they used?

Thank you.

KObb
  • Does this answer your question? https://stackoverflow.com/questions/44624404/does-the-np-nan-in-numpy-array-occupy-memory – Movilla May 30 '22 at 14:48

1 Answer

You see a list of byte strings because you are importing the wrong variable. Use STR_NA_VALUES instead, which gives you a set of regular strings. You can then remove the items you don't want with set subtraction:

from pandas._libs.parsers import STR_NA_VALUES

disable_na_values = {"NA"}
my_default_na_values = STR_NA_VALUES - disable_na_values
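
To check that the resulting set behaves as intended, here is a minimal sketch that passes it to the reader. It uses pd.read_csv on an in-memory string so it runs without a file; the sample data and column names are made up for illustration, and pandas._libs.parsers is a private module, so the import may not be stable across pandas versions.

```python
import io

import pandas as pd
# Private module: this import path is an assumption that may break in other pandas versions.
from pandas._libs.parsers import STR_NA_VALUES

# Keep every default NA marker except the literal string "NA".
my_default_na_values = STR_NA_VALUES - {"NA"}

# Hypothetical sample data: "NA" is Namibia's country code,
# while "N/A" should still be read as a missing value.
csv_text = "country,code\nNamibia,NA\nUnknown,N/A\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,                 # turn off the built-in defaults
    na_values=list(my_default_na_values),  # supply the reduced list instead
)

print(df)
```

The same keep_default_na and na_values arguments can be passed to pd.read_excel, as in the question.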

adr