pd.DataFrame.select_dtypes() includes timedelta dtype

Question

Why is it expected behavior that this test code:

import numpy as np
import pandas as pd

test = pd.DataFrame({'bool' :[False, True], 'int':[-1,2], 'float': [-2.5, 3.4],
                     'compl':np.array([1-1j, 5]),
                     'dt'   :[pd.Timestamp('2013-01-02'), pd.Timestamp('2016-10-20')],
                     'td'   :[pd.Timestamp('2012-03-02')- pd.Timestamp('2016-10-20'),
                              pd.Timestamp('2010-07-12')- pd.Timestamp('2000-11-10')]})
print(test.dtypes)
print(test.select_dtypes(np.number))

produces a DataFrame that includes the timedelta (td) column?

bool                bool
int                int64
float            float64
compl         complex128
dt        datetime64[ns]
td       timedelta64[ns]
dtype: object

    int     float   compl   td
0    -1     -2.5    (1-1j)  -1693 days
1     2      3.4    (5+0j)   3531 days

EDIT:

The following may be helpful to others (it was to me):

I've also found why this behavior surprised me at first. I was used to a different way of checking whether a column's dtype is numeric, namely pd.api.types.is_numeric_dtype:

for col in test.columns:
    if pd.api.types.is_numeric_dtype(test[col]):
        print(test[col].dtype)

bool
int64
float64
complex128

This produces output closer to what I intuitively expected: bool counts as numeric, while the datetime and timedelta columns do not.
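A minimal sketch of using that same check to subset the DataFrame directly instead of printing dtypes. It assumes a frame equivalent to test above (the dt and td columns are rebuilt with pd.to_datetime and pd.to_timedelta for brevity), and numeric_columns is a hypothetical helper name. Note that, unlike select_dtypes(np.number), is_numeric_dtype keeps the bool column.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Same columns as the question's test frame (dt/td rebuilt for brevity)
test = pd.DataFrame({
    "bool": [False, True],
    "int": [-1, 2],
    "float": [-2.5, 3.4],
    "compl": np.array([1 - 1j, 5]),
    "dt": pd.to_datetime(["2013-01-02", "2016-10-20"]),
    "td": pd.to_timedelta([-1693, 3531], unit="D"),
})

def numeric_columns(df):
    """Hypothetical helper: names of the columns that is_numeric_dtype accepts."""
    return [col for col in df.columns if is_numeric_dtype(df[col])]

cols = numeric_columns(test)
print(cols)        # ['bool', 'int', 'float', 'compl']
print(test[cols])  # datetime and timedelta columns are left out
```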

score 4 · Accepted Answer · answered May 31 '19 at 00:58

Because that's how NumPy's type hierarchy defines it:

np.issubdtype(np.timedelta64, np.number)
# True

More specifically,

np.issubdtype(np.timedelta64, np.integer)
# True
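A quick check, using only NumPy, of where the two date/time scalar types sit in the type hierarchy. It shows that np.timedelta64 inherits from the integer types, while np.datetime64 does not descend from np.number at all, which is why td was selected and dt was not.

```python
import numpy as np

# timedelta64 -> signedinteger -> integer -> number -> generic -> object
print(np.timedelta64.__mro__)
# datetime64 derives directly from generic, so it is not a np.number
print(np.datetime64.__mro__)

# Compare a few scalar types against np.number
for scalar_type in (np.timedelta64, np.datetime64, np.bool_, np.int64, np.complex128):
    print(scalar_type, np.issubdtype(scalar_type, np.number))
```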

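timedelta64 and datetime64 values in NumPy are both stored internally as 64-bit integers (a count of time units since the epoch for datetime64, a count of time units for timedelta64). This keeps them compact in memory and makes date arithmetic fast. However, only np.timedelta64 is placed under np.number in NumPy's type hierarchy; np.datetime64 is not, which is why td was selected while dt was not.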
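To see that integer representation, one can reinterpret the raw buffers with ndarray.view. This sketch uses day-resolution arrays chosen for readability (an assumption); the pandas columns in the question are timedelta64[ns] and datetime64[ns], so their underlying integers would be nanosecond counts instead.

```python
import numpy as np

td = np.array([-1693, 3531], dtype="timedelta64[D]")
dt = np.array(["1970-01-01", "1970-01-11"], dtype="datetime64[D]")

# Reinterpret the same 8-byte values as plain int64 counts
print(td.view(np.int64))  # [-1693  3531]
print(dt.view(np.int64))  # [ 0 10]  (days since the Unix epoch)

# Date arithmetic boils down to integer arithmetic on those counts
print((dt[1] - dt[0]).astype(np.int64))  # 10
```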

If you want to exclude these types from your checks, you can specify an exclude argument:

test.select_dtypes(include=['number'], exclude=['datetime', 'timedelta'])

   int  float   compl
0   -1   -2.5  (1-1j)
1    2    3.4  (5+0j)
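Since np.datetime64 is not a subclass of np.number to begin with, excluding 'timedelta' alone should be enough; listing 'datetime' as well is harmless but redundant. A minimal sketch on a reduced frame (an assumption: one column of each relevant kind):

```python
import pandas as pd

df = pd.DataFrame({
    "int": [-1, 2],
    "dt": pd.to_datetime(["2013-01-02", "2016-10-20"]),
    "td": pd.to_timedelta([-1693, 3531], unit="D"),
})

# 'number' maps to np.number and 'timedelta' to np.timedelta64;
# dt is never matched by 'number', so only td needs excluding
numeric = df.select_dtypes(include="number", exclude="timedelta")
print(list(numeric.columns))  # ['int']
```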

score 1 · Answer 2 · answered May 31 '19 at 01:02

Since numpy.timedelta64 is a subtype of numpy.number, if you only want the truly numeric columns returned, list the dtypes you want explicitly:

num= ['int16', 'int32', 'int64', 'float16', 'float32', 'float64','complex128']
test.select_dtypes(include=num)
Out[715]: 
    compl  float  int
0  (1-1j)   -2.5   -1
1  (5+0j)    3.4    2
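A hard-coded list like this misses other numeric dtypes such as int8, the unsigned integers, or complex64, and swapping in np.integer would not help, because np.timedelta64 is itself a subclass of np.integer. One alternative (a different technique than this answer's, sketched under the assumption that the columns have plain NumPy dtypes) is to filter on dtype.kind: 'i', 'u', 'f' and 'c' mark signed integers, unsigned integers, floats and complex numbers, while bool is 'b', datetime64 is 'M' and timedelta64 is 'm'. The helper name strictly_numeric and the extra uint8 column are hypothetical.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({
    "bool": [False, True],
    "int": [-1, 2],
    "u8": np.array([1, 2], dtype=np.uint8),  # hypothetical extra column
    "float": [-2.5, 3.4],
    "compl": np.array([1 - 1j, 5]),
    "dt": pd.to_datetime(["2013-01-02", "2016-10-20"]),
    "td": pd.to_timedelta([-1693, 3531], unit="D"),
})

NUMERIC_KINDS = "iufc"  # signed int, unsigned int, float, complex

def strictly_numeric(df):
    """Hypothetical helper: keep columns whose dtype kind is numeric (no bool, datetime or timedelta)."""
    cols = [col for col, dtype in df.dtypes.items() if dtype.kind in NUMERIC_KINDS]
    return df[cols]

result = strictly_numeric(test)
print(result)
```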
