pandas borrows its dtypes from numpy. The following demonstrates this:
import pandas as pd
df = pd.DataFrame({'A': [1,'C',2.]})
df['A'].dtype
>>> dtype('O')
type(df['A'].dtype)
>>> numpy.dtype
You can find the list of valid numpy.dtype codes in the documentation:
'?' boolean
'b' (signed) byte
'B' unsigned byte
'i' (signed) integer
'u' unsigned integer
'f' floating-point
'c' complex-floating point
'm' timedelta
'M' datetime
'O' (Python) objects
'S', 'a' zero-terminated bytes (not recommended)
'U' Unicode string
'V' raw data (void)
pandas should support these types. Calling the astype method of a pandas.Series object with any of the above codes as the argument makes pandas try to convert the Series to that type (or, at the very least, fall back to object type). 'u' is the only one that I see pandas not understanding at all:
df['A'].astype('u')
>>> TypeError: data type "u" not understood
This is a numpy error: the 'u' needs to be followed by a number specifying the number of bytes per item, and that number must be a valid size:
import numpy as np
np.dtype('u')
>>> TypeError: data type "u" not understood
np.dtype('u1')
>>> dtype('uint8')
np.dtype('u2')
>>> dtype('uint16')
np.dtype('u4')
>>> dtype('uint32')
np.dtype('u8')
>>> dtype('uint64')
# testing another invalid argument
np.dtype('u3')
>>> TypeError: data type "u3" not understood
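The same size-qualified codes can be passed straight to pandas. A minimal sketch, assuming a small integer Series (the name s and its values are only for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative Series; the values fit in a single unsigned byte.
s = pd.Series([1, 2, 3])

# A size-qualified code is accepted by astype, just as by np.dtype.
u1 = s.astype('u1')
u4 = s.astype('u4')

print(u1.dtype)  # uint8
print(u4.dtype)  # uint32
```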
To summarise, the astype methods of pandas objects will try to do something sensible with any argument that is valid for numpy.dtype. Note that numpy.dtype('f') is the same as numpy.dtype('float32'), numpy.dtype('f8') is the same as numpy.dtype('float64'), and so on. The same goes for passing these arguments to pandas astype methods.
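The equivalence can be checked directly. This is only a sketch; the Series s is a made-up example, not something from the documentation:

```python
import numpy as np
import pandas as pd

# Short codes and full names resolve to the same NumPy dtype.
same_f4 = np.dtype('f') == np.dtype('float32')
same_f8 = np.dtype('f8') == np.dtype('float64')

# Illustrative Series: both spellings give the same dtype through astype.
s = pd.Series([1, 2, 3])
short = s.astype('f8')
full = s.astype('float64')

print(same_f4, same_f8)            # True True
print(short.dtype == full.dtype)   # True
```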
To locate the respective data type classes in NumPy, the pandas docs recommend this:
def subdtypes(dtype):
    # Recursively collect the subclass tree of a NumPy scalar type.
    subs = dtype.__subclasses__()
    if not subs:
        return dtype
    return [dtype, [subdtypes(dt) for dt in subs]]

subdtypes(np.generic)
Output:
[numpy.generic,
[[numpy.number,
[[numpy.integer,
[[numpy.signedinteger,
[numpy.int8,
numpy.int16,
numpy.int32,
numpy.int64,
numpy.int64,
numpy.timedelta64]],
[numpy.unsignedinteger,
[numpy.uint8,
numpy.uint16,
numpy.uint32,
numpy.uint64,
numpy.uint64]]]],
[numpy.inexact,
[[numpy.floating,
[numpy.float16, numpy.float32, numpy.float64, numpy.float128]],
[numpy.complexfloating,
[numpy.complex64, numpy.complex128, numpy.complex256]]]]]],
[numpy.flexible,
[[numpy.character, [numpy.bytes_, numpy.str_]],
[numpy.void, [numpy.record]]]],
numpy.bool_,
numpy.datetime64,
numpy.object_]]
pandas accepts these classes as valid types. For example, dtype={'A': np.float64}. (The older alias np.float, which was just the Python builtin float, was removed in NumPy 1.24.)
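As a rough illustration of passing a NumPy class as a dtype, the sketch below reads a tiny in-memory CSV. The CSV text and the column names A and B are invented for the example:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-column CSV held in memory.
csv_text = "A,B\n1,x\n2,y\n"

# A NumPy scalar class is accepted as a per-column dtype by read_csv.
df = pd.read_csv(io.StringIO(csv_text), dtype={'A': np.float64})
print(df['A'].dtype)  # float64

# The same kind of mapping works with DataFrame.astype.
converted = df.astype({'A': np.int16})
print(converted['A'].dtype)  # int16
```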
NumPy docs contain more details and a chart:
