2

This is probably a trivial problem but I cannot see how to proceed:

I have a dataframe called data that is taken from an excel file:

data = pd.read_excel("C:\\Users\\firstname.lastname\\Desktop\\variable+".xlsx", sheet_name=variable2, na_values='', usecols="A,B", skiprows=range(2))

This works fine to load the file however in column B there is a string that is precisely equal to "NA", in the resulting dataframe this doesn't show as NA but just appears blank.

I am assuming that pandas is interpreting this NA as a blank and as such leaving it empty.

But this is a legitimate NA and I want it to say NA not blank.

I have tried replacing:

na_values='NA'

and also using regex to replace the column values that are blank with "NA" (not the best solution I admit but would work in this case as column B has only one NA that becomes blank.

df.replace(r'^\s+$', np.nan, regex=True)

What am I doing wrong?

tobsob
  • 602
  • 9
  • 22
user9940344
  • 574
  • 1
  • 8
  • 26

1 Answers1

2

According to the pandas documentation, try to add keep_default_na=False to the parameters of pd.read_excel

Aryerez
  • 3,417
  • 2
  • 9
  • 17
  • IT works for me. As keep_default_na=False will not touch the data although the value is "N/A". – Oscar Oct 31 '22 at 11:12