224

Let's say df is a pandas DataFrame. I would like to find all columns of numeric type. Something like:

isNumeric = is_numeric(df)
Martin Thoma
Hanan Shteingart
  • You should specify whether a column that has `dtype` being `object`, but all elements being numeric, counts as numeric or not. If no, take Hanan's answer, as it is also faster. Otherwise, take mine. – FooBar Jul 30 '14 at 15:07
  • 1
    What happens if you simply try df.describe().columns. Then assign it to a variable. – coldy Feb 06 '19 at 09:51
  • Related: [Get list of pandas dataframe columns based on data type](https://stackoverflow.com/questions/22470690/get-list-of-pandas-dataframe-columns-based-on-data-type). Then you just need to list the integer and float types to `df.select_dtypes(include=[...])`. – smci Dec 22 '21 at 08:50

15 Answers

274

You could use the select_dtypes method of DataFrame. It takes two parameters, include and exclude. So isNumeric would look like:

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']

newdf = df.select_dtypes(include=numerics)
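A hand-written list only matches the dtypes it names. As a small sketch (the frame and its column names are made up for illustration), the list above misses `int8` and unsigned integer columns, which `np.number` does cover:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with dtypes that the hand-written list does not name
df = pd.DataFrame({
    "small_int": np.array([1, 2], dtype="int8"),
    "unsigned": np.array([1, 2], dtype="uint8"),
    "ratio": np.array([0.5, 1.5], dtype="float32"),
    "label": ["x", "y"],
})

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']

# Only columns whose dtype is named in the list are kept
explicit_cols = df.select_dtypes(include=numerics).columns.tolist()
# np.number is the common parent of all NumPy integer and float types
generic_cols = df.select_dtypes(include=np.number).columns.tolist()

print(explicit_cols)  # ['ratio']
print(generic_cols)   # ['small_int', 'unsigned', 'ratio']
```

If you do want to restrict the result to specific widths, the explicit list is still the right tool; otherwise `np.number` is less likely to drop columns silently.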
Anand
  • 175
    You could use df.select_dtypes(include=[np.number]) if you don't need to specify a 'numerics' list – KieranPC Mar 19 '15 at 16:38
  • 48
    Building on the tip in the previous comment (+1), you could just use `list(df.select_dtypes(include=[np.number]).columns.values)` to get a list of names of the numeric columns – user799188 Mar 24 '16 at 23:48
  • 1
    Conversely, a classical filter with string matching is even more performant (measured): `list(filter(lambda x: 'float' not in str(df.dtypes[x]) and 'int' not in str(df.dtypes[x]), df.columns))` – philvec Oct 14 '21 at 07:54
  • 4
    One-liner if you want column names `list(df.select_dtypes('number'))` (from pandas v1.0.0) – Cristobal Oct 19 '21 at 10:04
  • 8
    This answer looks obsolete. In 2022, "To select all numeric types, use `np.number` or `'number'`", from https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.select_dtypes.html – PatrickT Jan 09 '22 at 04:25
  • 1
    This did not work for me. I was losing the int64 columns. The solution that worked was df.select_dtypes(include=[np.number]) – Sergio Polimante Apr 29 '22 at 22:20
172

Simple one-line answer to create a new dataframe with only numeric columns:

df.select_dtypes(include=np.number)

If you want the names of numeric columns:

df.select_dtypes(include=np.number).columns.tolist()

Complete code:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': range(7, 10),
                   'B': np.random.rand(3),
                   'C': ['foo','bar','baz'],
                   'D': ['who','what','when']})
df
#    A         B    C     D
# 0  7  0.704021  foo   who
# 1  8  0.264025  bar  what
# 2  9  0.230671  baz  when

df_numerics_only = df.select_dtypes(include=np.number)
df_numerics_only
#    A         B
# 0  7  0.704021
# 1  8  0.264025
# 2  9  0.230671

colnames_numerics_only = df.select_dtypes(include=np.number).columns.tolist()
colnames_numerics_only
# ['A', 'B']
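One detail worth knowing: `np.number` does not cover boolean columns. The sketch below (with made-up column names) shows one way to include them, assuming you want flags to count as numeric:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "count": [1, 2, 3],
    "score": [0.1, 0.2, 0.3],
    "flag": [True, False, True],
    "name": ["foo", "bar", "baz"],
})

# NumPy's boolean type is not a subclass of np.number, so 'flag' is dropped
numeric_only = df.select_dtypes(include=np.number).columns.tolist()

# List 'bool' explicitly if boolean columns should be kept as well
numeric_and_bool = df.select_dtypes(include=["number", "bool"]).columns.tolist()

print(numeric_only)      # ['count', 'score']
print(numeric_and_bool)  # ['count', 'score', 'flag']
```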
yatu
stackoverflowuser2010
93

You can use the undocumented function _get_numeric_data() to filter only numeric columns:

df._get_numeric_data()

Example:

In [32]: data
Out[32]:
   A  B
0  1  s
1  2  s
2  3  s
3  4  s

In [33]: data._get_numeric_data()
Out[33]:
   A
0  1
1  2
2  3
3  4

Note that this is a "private method" (i.e., an implementation detail) and is subject to change or total removal in the future. Use with caution.

cs95
Kathirmani Sukumar
  • 2
    Super handy; is this documented anywhere? Concerned about it disappearing in future versions and/or instability, as [its prefix underscore indicates that it's meant to be private.](https://stackoverflow.com/a/1301369/588437) – ijoseph Apr 10 '18 at 18:23
  • 6
    No, this isn't documented anywhere. The implementation is [here](https://github.com/pandas-dev/pandas/blob/870b6a6d6415c76d051b287adcb180ac3020b6e8/pandas/core/generic.py#L3538-L3540), however, like @ijoseph mentioned I would be wary of using methods that begin with underscores as they are little more than implementation details. Use literally ANY other answer besides this. – cs95 May 20 '19 at 00:24
  • 1
    Exactly. As a best practice I try to use and convert to as many numpy methods as possible. This is due to pandas dynamism. The API changes frequently. For undocumented methods it's just plain reckless, no matter how useful it is. – mik Aug 21 '19 at 12:42
74
df.select_dtypes(exclude = ['object'])

Update:

df.select_dtypes(include= np.number)

or, with newer versions of pandas:

df.select_dtypes('number')
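The two forms are not equivalent. As a quick sketch (the frame below is invented for illustration), excluding `object` keeps every non-object column, including datetimes, while `'number'` keeps only numeric ones:

```python
import pandas as pd

df = pd.DataFrame({
    "amount": [1.5, 2.5],
    "when": pd.to_datetime(["2023-01-01", "2023-01-02"]),
    # dtype=object is set explicitly so the example does not depend on
    # how a given pandas version stores strings by default
    "note": pd.Series(["a", "b"], dtype=object),
})

# Everything that is not object dtype, so the datetime column survives
not_object = df.select_dtypes(exclude=["object"]).columns.tolist()

# Only columns whose dtype derives from np.number
numbers_only = df.select_dtypes("number").columns.tolist()

print(not_object)    # ['amount', 'when']
print(numbers_only)  # ['amount']
```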
Antoine Dubuis
BENY
  • 8
    datetime columns are a different type, `datetime`; they are not numeric types – Jeru Luke Oct 14 '17 at 12:48
  • I really liked the above: `df.select_dtypes(include= np.number)` for numeric columns and for non numeric cols: `df.select_dtypes(exclude= np.number)` ... it is helpful to me.. – LeMarque Nov 03 '22 at 04:21
  • In my case I also needed to exclude timedeltas: select_dtypes(include='number', exclude='timedelta'). Perhaps this is obvious to others! Leaving here just in case it is not! – Jake Drew May 29 '23 at 15:20
45

Simple one-liner:

df.select_dtypes('number').columns
nimbous
10

The following code returns a list of the names of the numeric columns of a data set.

cnames=list(marketing_train.select_dtypes(exclude=['object']).columns)

Here marketing_train is my data set, select_dtypes() is the method that selects columns by data type using the exclude and include arguments, and columns fetches the column names. The output of the above code is:

['custAge',
 'campaign',
 'pdays',
 'previous',
 'emp.var.rate',
 'cons.price.idx',
 'cons.conf.idx',
 'euribor3m',
 'nr.employed',
 'pmonths',
 'pastEmail']
desertnaut
Hukmaram
5

Here is another simple way to find the numeric columns in a pandas DataFrame (note that it treats every non-object column as numeric):

numeric_clmns = df.dtypes[df.dtypes != "object"].index 
Mykola Zotko
Anvesh_vs
4

We can include and exclude data types as per the requirement as below:

train.select_dtypes(include=None, exclude=None)
train.select_dtypes(include='number') #will include all the numeric types

The notes below are taken from the select_dtypes docstring, as displayed in Jupyter Notebook.

To select all numeric types, use np.number or 'number'

  • To select strings you must use the object dtype but note that this will return all object dtype columns

  • See the NumPy dtype hierarchy: http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html

  • To select datetimes, use np.datetime64, 'datetime' or 'datetime64'

  • To select timedeltas, use np.timedelta64, 'timedelta' or 'timedelta64'

  • To select Pandas categorical dtypes, use 'category'

  • To select Pandas datetimetz dtypes, use 'datetimetz' (new in 0.20.0) or 'datetime64[ns, tz]'

4

Although this is an old subject, I think the following expression is easier than all the other answers:

df[df.describe().columns]

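By default, describe() summarizes only the numeric columns of a DataFrame with mixed dtypes, so the columns of its output are the numeric ones. If the DataFrame has no numeric columns at all, describe() falls back to summarizing the remaining columns, so the result is then not numeric.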
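A short sketch of both cases, using two made-up frames, shows where this trick is safe and where it is not:

```python
import pandas as pd

# Mixed dtypes: describe() reports only the numeric column
mixed = pd.DataFrame({"n": [1, 2], "s": ["a", "b"]})

# No numeric columns: describe() falls back to the non-numeric ones
text_only = pd.DataFrame({"s": ["a", "b"], "t": ["c", "d"]})

mixed_cols = list(mixed.describe().columns)
text_only_cols = list(text_only.describe().columns)

print(mixed_cols)      # ['n']
print(text_only_cols)  # ['s', 't']
```

It also computes summary statistics just to read off the column labels, and I believe newer pandas versions may include datetime columns in the default output, so it is worth checking the result on your own version before relying on it.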

Adam
  • I can confirm this works, so thanks for that, but I also would love an explanation of WHY it works. It is not obvious to me. – Adam Mar 20 '22 at 07:11
  • 1
    @Adam As the function `describe()` only works for numeric columns, so the column of the output will only be numeric – ahmed sabri Mar 21 '22 at 08:44
  • Doesn't work in 2022. – Amarpreet Singh May 19 '22 at 06:03
1

Please see the below code:

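# Summarize the numeric columns, if there are any
if dataset.select_dtypes(include=[np.number]).shape[1] > 0:
    display(dataset.select_dtypes(include=[np.number]).describe())
# np.object was removed in NumPy 1.24; use 'object' for object (string) columns
if dataset.select_dtypes(include=['object']).shape[1] > 0:
    display(dataset.select_dtypes(include=['object']).describe())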

This way you can check whether the frame has numeric columns, such as float and int, or string columns. The second if statement handles the string values, which are stored with the object dtype.

mickey
0

Adapting this answer, you could do

df.loc[:, df.applymap(np.isreal).all(axis=0)]

Here, df.applymap(np.isreal) shows whether every cell in the data frame is numeric, and .all(axis=0) checks whether all values in a column are True, returning a Series of Booleans that can be used to index the desired columns. (The original used df.ix, which was removed in pandas 1.0; df.loc is the replacement.)
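This value-based approach matters when a column has `object` dtype but holds only numbers, the case raised in the comments on the question. The sketch below is one possible way to do it, not the method from the linked answer: it swaps `np.isreal` for an `isinstance` check against `numbers.Number`, and the helper names are hypothetical.

```python
import numbers

import numpy as np
import pandas as pd

def is_number(value):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, numbers.Number) and not isinstance(value, bool)

def columns_with_numeric_values(df):
    # Hypothetical helper: keeps a column only if every cell holds a number
    return [col for col in df.columns if df[col].map(is_number).all()]

df = pd.DataFrame({
    "typed": [1, 2],
    "obj_num": pd.Series([1, 2.5], dtype=object),  # numbers stored as objects
    "text": pd.Series(["x", "y"], dtype=object),
})

by_dtype = df.select_dtypes(include=np.number).columns.tolist()
by_value = columns_with_numeric_values(df)

print(by_dtype)  # ['typed']
print(by_value)  # ['typed', 'obj_num']
```

Checking every cell is much slower than checking dtypes, so it is only worth it when object columns may hide numeric data.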

Community
Garrett
0

A lot of the posted answers are inefficient. These answers either return/select a subset of the original dataframe (a needless copy) or perform needless computational statistics in the case of describe().

To just get the column names that are numeric, one can use a conditional list comprehension with the pd.api.types.is_numeric_dtype function:

numeric_cols = [col for col in df if pd.api.types.is_numeric_dtype(df[col])]

I'm not sure when this function was introduced.
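To get something shaped like the `is_numeric(df)` call in the question, the same function can build a Boolean Series indexed by column name. This is only a sketch with made-up data; note that `is_numeric_dtype` counts boolean columns as numeric, unlike `select_dtypes(include=np.number)`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    "i": [1, 2],
    "f": [1.0, 2.0],
    "b": [True, False],
    "s": ["a", "b"],
    "d": pd.to_datetime(["2023-01-01", "2023-01-02"]),
})

# One True/False entry per column, without copying any column data
is_numeric = pd.Series({col: is_numeric_dtype(df[col]) for col in df.columns})

# Column names where the flag is True
numeric_cols = is_numeric[is_numeric].index.tolist()

print(numeric_cols)  # ['i', 'f', 'b']
```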

Alexander
0

@Kathirmani Sukumar's answer, df._get_numeric_data(), takes the cake on speed:

from datetime import datetime

import numpy as np
import pandas as pd

xdf = pd.DataFrame({'Numeric': [20, 10, np.nan],
                    'String': ['foo', 'bar', 'daa'],
                    'Date': [datetime(2023, 1, 1), datetime(2023, 1, 2), np.nan]})
xdf.dtypes
Numeric           float64
String             object
Date       datetime64[ns]
dtype: object

%timeit xdf._get_numeric_data()
34.7 µs ± 870 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit xdf.select_dtypes(include=np.number)
797 µs ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
%timeit xdf.select_dtypes(include=numerics)
991 µs ± 24.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Mainland
-1
import numpy as np
import pandas as pd

def is_type(df, baseType):
    # One row per column: True if the column's scalar type derives from baseType
    test = [issubclass(np.dtype(d).type, baseType) for d in df.dtypes]
    return pd.DataFrame(data=test, index=df.columns, columns=["test"])

def is_float(df):
    # np.float was removed in NumPy 1.24; np.floating covers all float widths
    return is_type(df, np.floating)

def is_number(df):
    return is_type(df, np.number)

def is_integer(df):
    return is_type(df, np.integer)
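A more compact variant of the same idea can be sketched with `np.issubdtype`. The helper name `numpy_dtype_mask` is hypothetical, and the sketch assumes every column has a plain NumPy dtype (it is not meant for pandas extension dtypes such as `category`):

```python
import numpy as np
import pandas as pd

def numpy_dtype_mask(df, base_type):
    """Boolean Series: True where a column's NumPy dtype is a subtype of base_type."""
    return df.dtypes.map(lambda d: np.issubdtype(d, base_type))

df = pd.DataFrame({
    "i": np.array([1, 2], dtype="int32"),
    "f": np.array([1.0, 2.0], dtype="float32"),
    "s": pd.Series(["a", "b"], dtype=object),
})

mask_number = numpy_dtype_mask(df, np.number)
mask_float = numpy_dtype_mask(df, np.floating)

print(mask_number.tolist())  # [True, True, False]
print(mask_float.tolist())   # [False, True, False]
```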
Hanan Shteingart
-1

numerical_col = df.describe().columns.to_list()

This is what I normally use, since the describe method returns only numerical columns by default.