I am trying to read a large number of .xls and .xlsx files containing predominantly numeric data into Python using pd.read_excel. However, the files use an em dash for missing values, and I want these em dashes replaced with NaN. I can't seem to find a way to get Python to even recognize the character, let alone replace it. I tried the following, which did not work:
df['var'].apply(lambda x: re.sub(u'\2014','',x))
I also tried simply:
df['var'].astype('float')
What would be the best way to convert all the em dashes in a dataframe to NaN while keeping the numeric data as floats?
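To make the goal concrete, here is a minimal sketch of the kind of conversion I am after. The sample dataframe and the column names are made up for illustration and stand in for a sheet loaded with pd.read_excel. It assumes the missing-value cells contain exactly the character U+2014, and it relies on pd.to_numeric with errors="coerce", which would also turn any other non-numeric text into NaN, so it may be too aggressive for some columns.

```python
import pandas as pd

# U+2014 needs the \u prefix; "\2014" is parsed as an octal escape
# followed by "4", so it never matches an em dash.
EM_DASH = "\u2014"

# Hypothetical sample standing in for a sheet read with pd.read_excel:
# numeric cells arrive as floats, missing cells as em-dash strings.
df = pd.DataFrame({
    "var": [1.5, EM_DASH, 3.0],
    "other": [EM_DASH, 2.0, 4.25],
})

# Convert each column to numeric; anything unparseable (including the
# em dash) becomes NaN.
cleaned = df.apply(pd.to_numeric, errors="coerce")

# Possible alternative at load time (the path is a placeholder):
# pd.read_excel("some_file.xlsx", na_values=[EM_DASH])
```

If only the em dash should become NaN and other stray text should raise an error, handling it at read time through na_values is probably the safer choice.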