3

Let's say I have the following Excel file to be read:

enter image description here

What I want is a simple solution (preferrably one-line) that can read the excel so that the dates are converted to str (or at least int), and the blank values are to nan or nat or whatever can be detected by pd.isnull.

If I use df = pd.read_excel(file_path), what I get is

df
Out[8]: 
              001002.XY  600123.AB  123456.YZ   555555.GO
ipo_date     20100203.0   20150605        NaN  20090501.0
delist_date         NaN   20170801        NaN         NaN

So pandas recognised blank cells as NaN, which is fine, but the pet peeve is that all the other values are forced to float64, even if they are intended to be just str or ints. (edit: it seems that if a column, e.g. the column [1], has no nans, then the other values won't be forced to float. However, in my case most columns have delist_date blank, since most stocks have an ipo date but are not delisted yet.)

For what I know though, I tried the dtype=str keyword arg, and it gives me

df
Out[10]: 
            001002.XY 600123.AB 123456.YZ 555555.GO
ipo_date     20100203  20150605       nan  20090501
delist_date       nan  20170801       nan       nan

Looks good? True, the dates are now str, but one thing ridiculous is that the nans now become literal strings! E.g.

df.iloc[1, 0]
Out[12]: 
'nan'

which would make me have to add something weird like df.replace later on.

I didn't try the converters because it would require specifying datatype column by column, and the actual excel file I'm working with is a very long spreadsheet (3k columns approx). I don't want to transpose the spreadsheet in excel itself either.

Could anybody help? Thanks in advance.

Vim
  • 1,436
  • 2
  • 19
  • 32
  • I've found similar unexpected behaviour. I don't know if the excellent answer from Evert to my question [here](https://stackoverflow.com/a/45814200/4799172) gives any insight. – roganjosh Nov 11 '17 at 06:23
  • @roganjosh thanks Im gonna check it out. – Vim Nov 11 '17 at 06:28
  • It definitely is a tangent but I do find the datatype conversion odd (from csv in my case) so perhaps it gives you another angle. – roganjosh Nov 11 '17 at 06:29
  • 1
    Can you try dtype=object ? – amanbirs Nov 11 '17 at 07:34
  • @amanbirs yes it works! It reads all dates as ints while leaving blanks as nan. – Vim Nov 11 '17 at 07:37
  • @roganjosh please see the comment by amanbirs. Setting `dtype=object` works in my case anyway even though I don't understand the deep reasons behind it. Hopefully it can work in your case too. – Vim Nov 11 '17 at 07:43
  • Note that setting dtype as object means you can't vectorize any operations. It kinda defeats the purpose of pandas. – roganjosh Nov 11 '17 at 09:21
  • @roganjosh oh... didn't try it. Perhaps you're right. – Vim Nov 11 '17 at 09:25
  • Implicit conversion to a particular type was my question and I thought it was related to yours. If you cast as `object` then you have no hope of vectorized calculations. But, it could be a step to getting your input into the right format to vectorize after that. – roganjosh Nov 11 '17 at 09:29
  • @roganjosh actually after setting dtype=object it will convert dates to ints and blanks to NaN. – Vim Nov 11 '17 at 09:39

1 Answers1

3

Use dtype=object as the parameter.

Great explanation here: pandas distinction between str and object types

amanbirs
  • 1,078
  • 6
  • 11