Let's say I have the following Excel file to be read:
What I want is a simple solution (preferably a one-liner) that reads the Excel file so that the dates are converted to str (or at least int), and the blank values become nan or NaT or whatever else pd.isnull can detect.
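By "detected by pd.isnull" I mean the check should actually return True on those cells, e.g. (a minimal illustration, nothing specific to my file):

import numpy as np
import pandas as pd

# np.nan and pd.NaT both count as "missing" for my purposes
print(pd.isnull(np.nan))  # True
print(pd.isnull(pd.NaT))  # True

# a literal string does not, which becomes relevant further down
print(pd.isnull('nan'))   # False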
If I use df = pd.read_excel(file_path), what I get is:
df
Out[8]:
              001002.XY  600123.AB  123456.YZ   555555.GO
ipo_date     20100203.0   20150605        NaN  20090501.0
delist_date         NaN   20170801        NaN         NaN
So pandas recognised the blank cells as NaN, which is fine, but the pet peeve is that all the other values are forced to float64, even though they are meant to be plain str or int. (Edit: it seems that if a column has no nans, e.g. column [1] above, its values won't be forced to float. In my case, however, most columns have delist_date blank, since most stocks have an IPO date but have not been delisted yet.)
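If I understand it correctly, this happens simply because NaN is itself a float, so any integer column containing a blank gets upcast; a minimal illustration of what I mean:

import numpy as np
import pandas as pd

# a column with a missing value is upcast to float64 ...
with_blank = pd.Series([20100203, np.nan])
print(with_blank.dtype)   # float64

# ... while an all-integer column keeps an integer dtype
no_blank = pd.Series([20150605, 20170801])
print(no_blank.dtype)     # int64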
For what it's worth, I did try the dtype=str keyword argument, and it gives me:
df
Out[10]:
             001002.XY  600123.AB  123456.YZ  555555.GO
ipo_date      20100203   20150605        nan   20090501
delist_date        nan   20170801        nan        nan
Looks good? True, the dates are now str, but the ridiculous part is that the nans have become literal strings! E.g.

df.iloc[1, 0]
Out[12]:
'nan'

which would force me to tack on something awkward like df.replace later on.
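That is, the workaround I'd rather avoid looks roughly like this (just a sketch; file_path below is a placeholder for my actual file):

import numpy as np
import pandas as pd

file_path = "dates.xlsx"  # placeholder

# read everything as str, then turn the literal 'nan' strings back into real NaN
df = pd.read_excel(file_path, dtype=str).replace('nan', np.nan)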
I didn't try converters, because that would require specifying the datatype column by column, and the actual Excel file I'm working with is a very wide spreadsheet (roughly 3k columns). I don't want to transpose the spreadsheet in Excel itself either.
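(I guess a converters dict could at least be built programmatically rather than typed out by hand, something like the sketch below, but that still feels like working around the problem rather than solving it, and I'm not sure it handles the blanks any better.)

import pandas as pd

file_path = "dates.xlsx"  # placeholder for my actual file

# read only the header row to collect the ~3k column labels,
# then ask for every column to be parsed through str
header = pd.read_excel(file_path, nrows=0)
converters = {col: str for col in header.columns}
df = pd.read_excel(file_path, converters=converters)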
Could anybody help? Thanks in advance.