How do I find numeric columns in Pandas?

Question

Let's say df is a pandas DataFrame. I would like to find all columns of numeric type. Something like:

isNumeric = is_numeric(df)

You should specify whether a column that has `dtype` being `object`, but all elements being numeric, counts as numeric or not. If no, take Hanan's answer, as it is also faster. Otherwise, take mine. — FooBar, Jul 30 '14 at 15:07
What happens if you simply try df.describe().columns. Then assign it to a variable. — coldy, Feb 06 '19 at 09:51
Related: [Get list of pandas dataframe columns based on data type](https://stackoverflow.com/questions/22470690/get-list-of-pandas-dataframe-columns-based-on-data-type). Then you just need to list the integer and float types to `df.select_dtypes(include=[...])`. — smci, Dec 22 '21 at 08:50

score 274 · Answer 1 · answered Jan 26 '15 at 17:39

You could use the `select_dtypes` method of DataFrame. It takes two parameters, `include` and `exclude`. So `isNumeric` would look like:

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']

newdf = df.select_dtypes(include=numerics)
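A minimal sketch (the column names and values are made up for illustration) of why a hard-coded list can silently miss columns: dtypes such as `int8` or `uint8` are not in `numerics`, whereas `np.number` matches every NumPy integer and floating-point type, as later comments suggest.

```python
import numpy as np
import pandas as pd

# Toy frame with dtypes that the explicit list does not mention
df = pd.DataFrame({
    "small_int": np.array([1, 2, 3], dtype="int8"),
    "unsigned": np.array([1, 2, 3], dtype="uint8"),
    "price": np.array([1.5, 2.5, 3.5], dtype="float64"),
    "label": ["a", "b", "c"],
})

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
listed = df.select_dtypes(include=numerics).columns.tolist()
generic = df.select_dtypes(include=[np.number]).columns.tolist()

print(listed)   # only the float64 column matches the explicit list
print(generic)  # int8, uint8 and float64 columns all match np.number
```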

answered Jan 26 '15 at 17:39

Anand

You could use df.select_dtypes(include=[np.number]) if you don't need to specify a 'numerics' list – KieranPC Mar 19 '15 at 16:38
Building on the tip in the previous comment (+1), you could just use `list(df.select_dtypes(include=[np.number]).columns.values)` to get a list of names of the numeric columns – user799188 Mar 24 '16 at 23:48
Conversely, a classical filter with string matching is even more performant (measured): `list(filter(lambda x: 'float' not in str(df.dtypes[x]) and 'int' not in str(df.dtypes[x]), df.columns))` – philvec Oct 14 '21 at 07:54
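Along the same lines, a sketch that filters on each dtype's one-letter `kind` code instead of matching substrings of the dtype name. This is a swapped-in technique, not philvec's exact code, and it has not been benchmarked here; one benefit is that `'int' in str(dtype)` would also match names like `interval`, which the `kind` check avoids.

```python
import pandas as pd

# Toy frame covering a few common dtypes
df = pd.DataFrame({
    "i": [1, 2],
    "f": [0.5, 1.5],
    "b": [True, False],
    "s": ["x", "y"],
    "t": pd.to_datetime(["2021-01-01", "2021-01-02"]),
})

# dtype.kind codes: 'i' signed int, 'u' unsigned int, 'f' float, 'c' complex,
# 'b' bool, 'M' datetime64, 'O' object, among others
numeric_cols = [col for col, dt in df.dtypes.items() if dt.kind in "iufc"]
non_numeric_cols = [col for col, dt in df.dtypes.items() if dt.kind not in "iufc"]
print(numeric_cols, non_numeric_cols)
```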
One-liner if you want column names `list(df.select_dtypes('number'))` (from pandas v1.0.0) – Cristobal Oct 19 '21 at 10:04
This answer looks obsolete. In 2022, "To select all numeric types, use `np.number` or `'number'`", from https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.select_dtypes.html – PatrickT Jan 09 '22 at 04:25
This did not work for me. I was losing the int64 columns. The solution that worked was df.select_dtypes(include=[np.number]) – Sergio Polimante Apr 29 '22 at 22:20

score 172 · Answer 2 · edited Oct 22 '19 at 14:55

Simple one-line answer to create a new dataframe with only numeric columns:

df.select_dtypes(include=np.number)

If you want the names of numeric columns:

df.select_dtypes(include=np.number).columns.tolist()

Complete code:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': range(7, 10),
                   'B': np.random.rand(3),
                   'C': ['foo','bar','baz'],
                   'D': ['who','what','when']})
df
#    A         B    C     D
# 0  7  0.704021  foo   who
# 1  8  0.264025  bar  what
# 2  9  0.230671  baz  when

df_numerics_only = df.select_dtypes(include=np.number)
df_numerics_only
#    A         B
# 0  7  0.704021
# 1  8  0.264025
# 2  9  0.230671

colnames_numerics_only = df.select_dtypes(include=np.number).columns.tolist()
colnames_numerics_only
# ['A', 'B']

edited Oct 22 '19 at 14:55

yatu

answered Oct 10 '17 at 01:27

stackoverflowuser2010

`df.select_dtypes(include=['int64']).columns.tolist()` – Cherry Wu Jan 14 '18 at 06:51
If you only want one type, you don't need to store it in a list. Nor do you need to specify `include=`. `select_dtypes(np.number)` – BallpointBen Jun 04 '18 at 18:31
If your columns have numeric data but also have None, the dtype could be 'object'. This will coerce the columns to numeric: `df.fillna(value=0, inplace=True)` – vaughnkoch Jun 07 '18 at 03:15
also: `df.select_dtypes('number')`. It's even shorter and you don't have to import numpy – Boris Gorelik Mar 22 '21 at 09:59

score 93 · Answer 3 · edited May 20 '19 at 16:53

You can use the undocumented function _get_numeric_data() to filter only numeric columns:

df._get_numeric_data()

Example:

In [32]: data
Out[32]:
   A  B
0  1  s
1  2  s
2  3  s
3  4  s

In [33]: data._get_numeric_data()
Out[33]:
   A
0  1
1  2
2  3
3  4

Note that this is a "private method" (i.e., an implementation detail) and is subject to change or total removal in the future. Use with caution.
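One practical difference worth knowing, sketched below under the assumption of a recent pandas version: `_get_numeric_data()` keeps boolean columns, while the public `select_dtypes(include=np.number)` drops them because `np.bool_` is not a subclass of `np.number`. Since the method is private, this behaviour may change without notice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": ["s", "s", "s", "s"],
    "flag": [True, False, True, False],
})

# Private, undocumented method: behaviour may differ between pandas versions
private_cols = df._get_numeric_data().columns.tolist()
# Public API: np.number subclasses only, so the bool column is excluded
public_cols = df.select_dtypes(include=np.number).columns.tolist()
print(private_cols, public_cols)
```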

edited May 20 '19 at 16:53

cs95

answered Dec 30 '15 at 13:00

Kathirmani Sukumar

Super handy; is this documented anywhere? Concerned about it disappearing in future versions and/or instability, as [its prefix underscore indicates that it's meant to be private.](https://stackoverflow.com/a/1301369/588437) – ijoseph Apr 10 '18 at 18:23
No, this isn't documented anywhere. The implementation is [here](https://github.com/pandas-dev/pandas/blob/870b6a6d6415c76d051b287adcb180ac3020b6e8/pandas/core/generic.py#L3538-L3540), however, like @ijoseph mentioned I would be wary of using methods that begin with underscores as they are little more than implementation details. Use literally ANY other answer besides this. – cs95 May 20 '19 at 00:24
Exactly. As a best practice I try to use and convert to as many numpy methods as possible. This is due to pandas dynamism. The API changes frequently. For undocumented methods it's just plain reckless, no matter how useful it is. – mik Aug 21 '19 at 12:42

score 74 · Answer 4 · edited Nov 21 '20 at 14:44

df.select_dtypes(exclude=['object'])

Update:

df.select_dtypes(include=np.number)

or, with newer versions of pandas:

df.select_dtypes('number')
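A small sketch (made-up data) of how the two versions differ: `exclude=['object']` keeps every non-object column, including datetime and boolean ones, whereas `include=np.number` keeps only numeric dtypes. The string column is created with an explicit `object` dtype so the example does not depend on pandas' default string inference.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "n": [1, 2],
    "s": pd.Series(["a", "b"], dtype=object),
    "when": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "flag": [True, False],
})

by_exclusion = df.select_dtypes(exclude=["object"]).columns.tolist()  # keeps datetime and bool too
by_inclusion = df.select_dtypes(include=np.number).columns.tolist()   # numeric dtypes only
print(by_exclusion, by_inclusion)
```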

edited Nov 21 '20 at 14:44

Antoine Dubuis

answered May 15 '17 at 14:59

BENY

datetime columns are a different type, `datetime`; they are not numeric types – Jeru Luke Oct 14 '17 at 12:48
I really liked the above: `df.select_dtypes(include=np.number)` for numeric columns and, for non-numeric columns, `df.select_dtypes(exclude=np.number)`. It is helpful to me. – LeMarque Nov 03 '22 at 04:21
In my case I also needed to exclude timedeltas: select_dtypes(include='number', exclude='timedelta'). Perhaps this is obvious to others! Leaving here just in case it is not! – Jake Drew May 29 '23 at 15:20
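To illustrate the timedelta case with made-up data: `np.timedelta64` is a subclass of `np.signedinteger` in NumPy's type hierarchy, so `'number'` matches timedelta columns unless they are excluded explicitly.

```python
import pandas as pd

df = pd.DataFrame({
    "amount": [10.0, 20.0],
    "duration": pd.to_timedelta([1, 2], unit="D"),
})

with_td = df.select_dtypes(include="number").columns.tolist()
without_td = df.select_dtypes(include="number", exclude="timedelta").columns.tolist()
print(with_td, without_td)
```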

score 45 · Answer 5 · answered Oct 29 '19 at 11:07

Simple one-liner:

df.select_dtypes('number').columns
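For completeness, a sketch (toy data) showing that the spellings from this answer and the comments select the same columns; `.columns` returns a pandas `Index`, while `.tolist()` or `list(...)` give a plain list (iterating over a DataFrame yields its column labels).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [7, 8, 9], "B": [0.1, 0.2, 0.3], "C": ["foo", "bar", "baz"]})

cols_index = df.select_dtypes("number").columns                  # a pandas Index
cols_list = cols_index.tolist()                                    # plain Python list
cols_via_list = list(df.select_dtypes("number"))                   # iterating yields column labels
cols_np = df.select_dtypes(include=[np.number]).columns.tolist()   # numpy spelling
print(cols_list)
```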

answered Oct 29 '19 at 11:07

nimbous

By far the most Pythonic way, yes. – gosuto Apr 19 '20 at 08:40

score 10 · Answer 6 · edited Jun 15 '21 at 14:05

The following code returns a list of the names of the non-`object` columns of a dataset (for a typical dataset, these are the numeric ones):

cnames = list(marketing_train.select_dtypes(exclude=['object']).columns)

Here `marketing_train` is my dataset, `select_dtypes()` selects columns by data type using its `include` and `exclude` arguments, and `.columns` fetches the column names of the result. The output of the code above is:

['custAge',
 'campaign',
 'pdays',
 'previous',
 'emp.var.rate',
 'cons.price.idx',
 'cons.conf.idx',
 'euribor3m',
 'nr.employed',
 'pmonths',
 'pastEmail']

score 5 · Answer 7 · edited Nov 05 '19 at 13:44

Here is another simple way to find the numeric columns in a pandas DataFrame (it treats every non-`object` column as numeric):

numeric_clmns = df.dtypes[df.dtypes != "object"].index

edited Nov 05 '19 at 13:44

Mykola Zotko

answered Sep 27 '17 at 03:42

Anvesh_vs

score 4 · Answer 8 · answered May 18 '20 at 04:07

You can include and exclude data types as required:

train.select_dtypes(include=None, exclude=None)  # the signature; at least one of the two must be non-empty
train.select_dtypes(include='number')  # includes all the numeric types

From the `select_dtypes` docstring (as displayed in Jupyter Notebook):

* To select all numeric types, use `np.number` or `'number'`.
* To select strings you must use the `object` dtype, but note that this will return all object dtype columns.
* See the NumPy dtype hierarchy: http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html
* To select datetimes, use `np.datetime64`, `'datetime'` or `'datetime64'`.
* To select timedeltas, use `np.timedelta64`, `'timedelta'` or `'timedelta64'`.
* To select pandas categorical dtypes, use `'category'`.
* To select pandas datetimetz dtypes, use `'datetimetz'` (new in 0.20.0) or `'datetime64[ns, tz]'`.
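A sketch (made-up columns) exercising several of the selectors listed above. Note that `'number'` also picks up the timedelta column, because NumPy treats `timedelta64` as a signed integer type, and that `'datetime'` does not match the timezone-aware column, which needs `'datetimetz'`.

```python
import pandas as pd

df = pd.DataFrame({
    "qty": [1, 2],
    "name": pd.Series(["a", "b"], dtype=object),
    "day": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "stamp": pd.to_datetime(["2020-01-01", "2020-01-02"]).tz_localize("UTC"),
    "wait": pd.to_timedelta([1, 2], unit="h"),
    "grade": pd.Categorical(["x", "y"]),
})

# One column list per selector string from the docstring
selected = {
    sel: df.select_dtypes(include=sel).columns.tolist()
    for sel in ["number", "object", "datetime", "datetimetz", "timedelta", "category"]
}
for sel, cols in selected.items():
    print(sel, cols)
```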

score 4 · Answer 9 · edited Mar 22 '22 at 13:31

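Although this is an old question, I think the following is simpler than the other answers:

df[df.describe().columns]

By default, `describe()` summarizes only the numeric columns (as long as the DataFrame has at least one), so the columns of its output are the numeric ones. Since pandas 2.0, datetime columns are included in that default summary as well.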
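A quick check of the idea on toy data: in the mixed frame, `describe()` keeps only the int and float columns (the boolean column is dropped too), but when a DataFrame has no numeric columns at all, `describe()` falls back to summarizing the object columns, so `describe().columns` is no longer a list of numeric columns. It also computes statistics for every selected column, which is wasted work if only the names are needed.

```python
import pandas as pd

mixed = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [0.5, 1.5, 2.5],
    "C": pd.Series(["foo", "bar", "baz"], dtype=object),
    "flag": [True, False, True],
})
text_only = pd.DataFrame({"C": pd.Series(["foo", "bar"], dtype=object)})

mixed_cols = mixed.describe().columns.tolist()     # numeric columns only
text_cols = text_only.describe().columns.tolist()  # falls back to the object column
print(mixed_cols, text_cols)
```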

edited Mar 22 '22 at 13:31

Adam

answered Feb 19 '22 at 20:53

ahmed sabri

Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Feb 19 '22 at 21:51
I can confirm this works, so thanks for that, but I also would love an explanation of WHY it works. It is not obvious to me. – Adam Mar 20 '22 at 07:11
@Adam Because `describe()` only works on numeric columns, the columns of its output will only be numeric – ahmed sabri Mar 21 '22 at 08:44
Doesn't work in 2022. – Amarpreet Singh May 19 '22 at 06:03

score 1 · Answer 10 · answered Feb 18 '18 at 22:37

See the code below:

import numpy as np
from IPython.display import display

if dataset.select_dtypes(include=[np.number]).shape[1] > 0:  # any numeric columns?
    display(dataset.select_dtypes(include=[np.number]).describe())
if dataset.select_dtypes(include=[object]).shape[1] > 0:  # np.object was removed in NumPy 1.24
    display(dataset.select_dtypes(include=[object]).describe())

This way you can check which values are numeric (such as float and int) and which are strings. The second `if` statement checks the string values, which pandas stores with the `object` dtype.

score 0 · Answer 11 · edited May 23 '17 at 11:47

Adapting this answer, you could do

df.loc[:, df.applymap(np.isreal).all(axis=0)]

Here, `df.applymap(np.isreal)` shows whether every cell in the DataFrame is numeric, and `.all(axis=0)` checks whether all values in each column are True, returning a Series of Booleans that can be used to index the desired columns. (`.ix` was removed in pandas 1.0, hence `.loc`; since pandas 2.1, `applymap` is deprecated in favour of `DataFrame.map`.)
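A related sketch for the case raised in the question's comments, where an `object` column holds numbers. The helper `numeric_like_columns` is hypothetical (my own name, not a pandas API) and uses a swapped-in technique: a column counts as numeric if it already has a numeric dtype, or if it is an object/string column that `pd.to_numeric` can parse without errors. Unlike `np.isreal`, this also accepts numeric strings such as `'5'`, and `is_numeric_dtype` counts boolean columns as numeric.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_object_dtype, is_string_dtype

def numeric_like_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: columns with a numeric dtype, plus object/string
    columns whose values pd.to_numeric can parse without errors."""
    result = []
    for col in df.columns:
        s = df[col]
        if is_numeric_dtype(s):
            # Already stored with a numeric dtype (bool counts as numeric here)
            result.append(col)
        elif is_object_dtype(s) or is_string_dtype(s):
            try:
                pd.to_numeric(s)  # raises on values it cannot parse
            except (ValueError, TypeError):
                continue
            result.append(col)
    return result

example = pd.DataFrame({
    "a": [1, 2],                           # int64
    "b": pd.Series([3, 4], dtype=object),  # numbers stored as object
    "c": ["5", "6"],                       # numeric strings
    "d": ["x", "y"],                       # plain text
})
print(numeric_like_columns(example))
```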

edited May 23 '17 at 11:47

Community

answered Oct 10 '14 at 09:00

Garrett

Alexander · Answer 12 · 2023-02-09T04:41:05.107

Many of the posted answers are inefficient: they either select a subset of the original DataFrame (a needless copy) or, in the case of `describe()`, compute statistics that are never used.

To just get the column names that are numeric, one can use a conditional list comprehension with the pd.api.types.is_numeric_dtype function:

numeric_cols = [col for col in df if pd.api.types.is_numeric_dtype(df[col])]

I'm not sure when this function was introduced.
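A sketch (toy data) of how this predicate differs from `select_dtypes('number')` at the edges: `is_numeric_dtype` treats boolean columns as numeric but not timedelta columns, while `select_dtypes('number')` does the opposite.

```python
import pandas as pd

df = pd.DataFrame({
    "i": [1, 2],
    "f": [0.5, 1.5],
    "b": [True, False],
    "td": pd.to_timedelta([1, 2], unit="s"),
    "s": pd.Series(["x", "y"], dtype=object),
})

by_predicate = [col for col in df if pd.api.types.is_numeric_dtype(df[col])]
by_select = df.select_dtypes("number").columns.tolist()
print(by_predicate, by_select)
```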

score 0 · Answer 13 · answered Mar 20 '23 at 01:43

@Kathirmani Sukumar's answer, `df._get_numeric_data()`, takes the cake:

import numpy as np
import pandas as pd
from datetime import datetime

xdf = pd.DataFrame({'Numeric':[20,10,np.nan],'String':['foo','bar','daa'],'Date':[datetime(2023,1,1,0,0,0),datetime(2023,1,2,0,0,0),np.nan]})
xdf.dtypes
Numeric           float64
String             object
Date       datetime64[ns]
dtype: object

%timeit xdf._get_numeric_data()
34.7 µs ± 870 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit xdf.select_dtypes(include=np.number)
797 µs ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
%timeit xdf.select_dtypes(include=numerics)
991 µs ± 24.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

score -1 · Answer 14 · answered Jul 30 '14 at 14:36

import numpy as np
import pandas as pd

def is_type(df, baseType):
    # One boolean per column: is the column's scalar type a subclass of baseType?
    # d.type also works for pandas extension dtypes such as 'category'
    test = [issubclass(d.type, baseType) for d in df.dtypes]
    return pd.DataFrame(data=test, index=df.columns, columns=["test"])

def is_float(df):
    # np.float was removed in NumPy 1.24; np.floating is the abstract float type
    return is_type(df, np.floating)

def is_number(df):
    return is_type(df, np.number)

def is_integer(df):
    return is_type(df, np.integer)

score -1 · Answer 15 · answered May 20 '23 at 09:40

numerical_col = df.describe().columns.to_list()

This is what I normally use, since the `describe` method only returns numerical columns.

answered May 20 '23 at 09:40

Paritosh Sharma Ghimire
