I am using Python 3.8 and Pandas 1.3. Here is some sample code:

    import pandas as pd

    data_dc = {'Dates': ['10212021', '11152021', '01142022', '02122022']}
    df1 = pd.DataFrame(data_dc)
    print(df1['Dates'].astype(int))

Results:

    0    10212021
    1    11152021
    2     1142022
    3     2122022
    Name: Dates, dtype: int32

I passed the Python type `int` as the argument to the `astype` method and expected the dtype of the Dates column to be int64. Instead, I got int32. Is this a bug, or am I doing something wrong? This is easy to work around, but I like to make sure I understand what to expect from the software.

eshirvana
  • I get int64, so maybe it is something with your config?!! idk – eshirvana Jan 22 '22 at 17:28
  • AFAIK `dtype(int)` is the default integer type of `numpy`. This is determined by the size of the `long int` type of your system's c compiler. This is usually 32-bit on [windows 64-bit OS](https://stackoverflow.com/q/384502/14277722). – Michael Szczesny Jan 22 '22 at 17:30
  • OK. The documentation says you have to put numpy types in quotes but not the Python types, which are float, int and str. Every example I see online that uses int gets a dtype of int64. – Bill Fredericks Jan 22 '22 at 17:45
  • I just tried the same code on Linux Mint running on Virtual Box and got the int64 result. Running on Windows 11, I get int32. Can it be a Windows issue? – Bill Fredericks Jan 22 '22 at 17:56
  • My Windows OS is 64 bit and I have confirmed that my Python is 64 bit as well. – Bill Fredericks Jan 22 '22 at 19:42

1 Answer


Pandas uses NumPy datatypes under the hood. From the NumPy documentation:

> The default NumPy behavior is to create arrays in either 32 or 64-bit signed integers (platform dependent and matches C int size) or double precision floating point numbers, int32/int64 and float, respectively. If you expect your integer arrays to be a specific type, then you need to specify the dtype while you create the array.

It is not a bug. If you need a specific integer width, or want your code to behave the same on every platform, specify the dtype explicitly. To rephrase your question: what is `np.dtype(int)` on my platform?
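
As a minimal sketch of specifying the dtype explicitly, the snippet below reuses the sample data from the question and asks for `'int64'` by name instead of passing Python's `int`. The variable name `dates_int64` is only illustrative, and what the last line prints depends on your platform and NumPy version.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Dates': ['10212021', '11152021', '01142022', '02122022']})

# Ask for a fixed-width integer type by name; this does not depend on
# what the platform maps Python's int to.
dates_int64 = df1['Dates'].astype('int64')
print(dates_int64.dtype)  # int64

# For comparison: the dtype that plain `int` resolves to on this machine.
# The result is platform dependent, so no expected output is shown.
print(np.dtype(int))
```

Passing `np.int64` instead of the string `'int64'` requests the same dtype.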

On Windows, as some of the comments suggest, it appears to be a C signed long (32 bits). You can even get NumPy to raise an overflow error to confirm this:

    >>> import numpy as np
    >>> np.array([2_147_483_648], dtype=int)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C long
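
A gentler way to check the same thing, without provoking an exception, is to inspect the default integer dtype directly. This is only a sketch: the names `default_int`, `info`, and `arr` are illustrative, and the printed width depends on your platform and NumPy version.

```python
import numpy as np

# The dtype that Python's int maps to on this platform.
default_int = np.dtype(int)
print(default_int.name, default_int.itemsize * 8, "bits")

# Largest value that default integer type can hold.
info = np.iinfo(int)
print(info.max)

# An explicit 64-bit dtype holds the value that overflowed above,
# regardless of the platform default.
arr = np.array([2_147_483_648], dtype=np.int64)
print(arr.dtype)  # int64
```

If the first line reports 32 bits, the overflow error shown above is expected for any value beyond 2,147,483,647.
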
tgpz
  • OK. I tried that code and got the result you showed. I was under the impression that, if you specify Python int, you would always get int64. Thanks for your detailed answer. That was a big help in understanding more about Python. – Bill Fredericks Jan 23 '22 at 01:22