Drop non-numeric columns from a pandas DataFrame

Question

In my application I load text files that are structured as follows:

First non-numeric column (ID)
A number of non-numeric columns (strings)
A number of numeric columns (floats)

The number of the non-numeric columns is variable. Currently I load the data into a DataFrame like this:

source = pandas.read_table(inputfile, index_col=0)

I would like to drop all non-numeric columns in one fell swoop, without knowing their names or indices; this should be doable by reading their dtype. Is this possible with pandas, or do I have to cook up something on my own?

Related: http://stackoverflow.com/q/25039626/5069869 – Bernhard Oct 14 '16 at 09:52

sapo_cosmico · Accepted Answer · 2017-04-27T11:21:27.257

69

To avoid using a private method you can also use select_dtypes, where you can either include or exclude the dtypes you want.

Ran into it on this post on the exact same thing.

Or in your case, specifically:
source.select_dtypes(['number']) or source.select_dtypes([np.number])
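A minimal sketch of both directions, include and exclude. The column names and values are made up for illustration; only select_dtypes and its include/exclude parameters come from pandas.

```python
import pandas as pd

# Hypothetical stand-in for the loaded file: an ID index,
# one string column and two numeric columns.
source = pd.DataFrame(
    {"name": ["foo", "bar"], "count": [1, 2], "value": [0.5, 1.5]},
    index=["id1", "id2"],
)

# Keep only the numeric columns (ints and floats).
numeric_only = source.select_dtypes(include=["number"])

# The complement: everything that is not numeric.
non_numeric = source.select_dtypes(exclude=["number"])

print(list(numeric_only.columns))
print(list(non_numeric.columns))
```

Both calls return new DataFrames and leave source untouched, so assign the result back if you want to replace it.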

edited Apr 27 '17 at 11:21

answered Sep 04 '15 at 13:55

sapo_cosmico


3

I think this is better than using the private method. Maybe you should add the direct answer to the question, which is: source.select_dtypes(['number']) or source.select_dtypes([numpy.number]) – hardsetting Feb 05 '17 at 00:22
1

This should be the accepted answer. Although the other one will work too, this is more correct, not to mention that the private method, not being part of the API, might change at any time – Juan Antonio Gomez Moriano Apr 22 '17 at 07:42
Doesn't this return booleans? Also, what is the difference between 'number' and np.number (just a numpy array of numbers?) – Worthy7 Aug 03 '17 at 01:21
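On the last comment: as far as I can tell, select_dtypes returns a DataFrame with the matching columns, not a boolean mask, and np.number is NumPy's abstract base class for numeric scalar types rather than an array. The string 'number' is the shorthand pandas accepts for the same thing. A quick check, using a made-up frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one string, one int and one float column.
frame = pd.DataFrame({"label": ["a", "b"], "n": [1, 2], "x": [0.1, 0.2]})

by_string = frame.select_dtypes(include=["number"])
by_numpy = frame.select_dtypes(include=[np.number])

# The result is a DataFrame holding the selected columns.
returns_frame = isinstance(by_string, pd.DataFrame)

# Both spellings select the same columns with the same values.
same_result = by_string.equals(by_numpy)

print(returns_frame, same_result)
```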

score 50 · Answer 2 · answered Oct 04 '12 at 11:41

50

It's a private method, but it will do the trick: source._get_numeric_data()

In [2]: import pandas as pd

In [3]: source = pd.DataFrame({'A': ['foo', 'bar'], 'B': [1, 2], 'C': [(1,2), (3,4)]})

In [4]: source
Out[4]:
     A  B       C
0  foo  1  (1, 2)
1  bar  2  (3, 4)

In [5]: source._get_numeric_data()
Out[5]:
   B
0  1
1  2

answered Oct 04 '12 at 11:41

Wouter Overmeire


2

Thanks! Are there any precautions in using "private methods" in pandas? Or, alternatively, why is this private? (I can open a new question, if you suggest.) – Richard Herron Oct 04 '12 at 16:13
2

In general, adding, removing, or changing the API of a private method is not considered a (class) API/behavior change. In other words, a new version of pandas that is considered backwards compatible could, e.g., remove a private method. I believe _get_numeric_data() is mainly used to support plotting functions/methods. If you feel this is a useful method, you can file a feature request on GitHub asking to make it part of the public API. – Wouter Overmeire Oct 04 '12 at 18:02
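If the private method is a concern, here is a sketch comparing it with the public select_dtypes on the same sample frame from this answer. This only shows they agree for this particular frame; I am not claiming they match for every dtype.

```python
import pandas as pd

# Same sample data as in the answer above.
source = pd.DataFrame({'A': ['foo', 'bar'], 'B': [1, 2], 'C': [(1, 2), (3, 4)]})

# Public API: select the numeric columns by dtype.
public_result = source.select_dtypes(include=['number'])

# Private method from the answer, called here only for comparison.
private_result = source._get_numeric_data()

same = public_result.equals(private_result)
print(list(public_result.columns), same)
```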

score 0 · Answer 3 · answered Mar 03 '19 at 11:43

0

This would remove each column which doesn't include float64 numerics.

df = pd.read_csv('sample.csv', index_col=0)
non_floats = []
for col in df:
    if df[col].dtypes != "float64":
        non_floats.append(col)
df = df.drop(columns=non_floats)

answered Mar 03 '19 at 11:43

Thomas Gotwig


3

You can also use ```pd.api.types.is_numeric_dtype(df[col])```. – Uzay Macar Jul 13 '19 at 21:19
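A sketch of the loop above with the suggestion from the comment applied, so that integer columns are kept as well as float64 ones. The sample frame is made up here in place of sample.csv.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical data standing in for the CSV file.
df = pd.DataFrame({"label": ["a", "b"], "n": [1, 2], "x": [0.1, 0.2]})

# Collect every column whose dtype pandas does not consider numeric.
non_numeric = [col for col in df.columns if not is_numeric_dtype(df[col])]

# Drop them all in a single call.
df = df.drop(columns=non_numeric)

print(non_numeric, list(df.columns))
```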

score -1 · Answer 4 · edited May 14 '19 at 15:19

-1

I also have another possible solution for dropping the columns with categorical values in two lines of code: the first line defines a list of the columns with categorical values, and the second line drops them. df is our DataFrame.

df before dropping:

  to_be_dropped=pd.DataFrame(df.categorical).columns
  df= df.drop(to_be_dropped,axis=1)

df after dropping:

edited May 14 '19 at 15:19

Community


answered Aug 03 '18 at 12:12

Luigi Bungaro


3

Doesn't work: `AttributeError: 'DataFrame' object has no attribute 'categorical'` – information_interchange Apr 02 '20 at 03:43
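The AttributeError is expected, since a DataFrame has no categorical attribute. A sketch of what the two lines were probably meant to do, assuming the columns to drop really have the pandas category dtype (the frame and its column names are invented for illustration):

```python
import pandas as pd

# Hypothetical frame with one column of dtype 'category'.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "grade": pd.Categorical(["a", "b", "a"]),
        "score": [0.5, 1.5, 2.5],
    }
)

# 1st line: names of the columns whose dtype is 'category'.
to_be_dropped = df.select_dtypes(include=["category"]).columns

# 2nd line: drop those columns.
df = df.drop(columns=to_be_dropped)

print(list(df.columns))
```

Plain string columns do not have the category dtype, so they would survive this; for the original question, selecting on 'number' as in the accepted answer is the more direct route.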
