
In my application I load text files that are structured as follows:

  • First column: non-numeric (ID)
  • A number of non-numeric columns (strings)
  • A number of numeric columns (floats)

The number of non-numeric columns varies. Currently I load the data into a DataFrame like this:

source = pandas.read_table(inputfile, index_col=0)

I would like to drop all non-numeric columns in one fell swoop, without knowing their names or indices, since this should be possible by checking their dtype. Is this possible with pandas, or do I have to cook up something on my own?
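To make the setup concrete, here is a minimal sketch that reproduces the described layout in memory instead of reading a file. The column names and values are hypothetical; the point is to see which dtypes pandas assigns, since that is what a dtype-based filter can key on.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample mirroring the layout described above:
# an ID column, two string columns, then two float columns.
sample = (
    "id\tname\tcity\theight\tweight\n"
    "a1\tAnn\tOslo\t1.62\t55.0\n"
    "b2\tBob\tRome\t1.80\t80.5\n"
)

# read_table uses a tab separator by default; index_col=0 turns the ID column into the index
source = pd.read_table(io.StringIO(sample), index_col=0)

# The per-column dtypes: the string columns are non-numeric, the rest are float64
print(source.dtypes)
```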

Einar

4 Answers


To avoid using a private method, you can use select_dtypes, which lets you either include or exclude the dtypes you want.

I ran into it in this post on the exact same thing.

In your case, specifically:
source.select_dtypes(['number']) or source.select_dtypes([np.number])
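A small sketch of both directions on a hypothetical stand-in frame: keeping the numeric columns with include, or finding the non-numeric ones with exclude and dropping them by name. The column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in frame: one string column, one int column, one float column
source = pd.DataFrame({
    "label": ["x", "y"],
    "count": [1, 2],
    "score": [0.5, 1.5],
})

# Keep only numeric columns; 'number' and np.number are interchangeable here
numeric = source.select_dtypes(include=["number"])
numeric_np = source.select_dtypes(include=[np.number])

# Same result via exclude: find the non-numeric columns and drop them by name
non_numeric_cols = source.select_dtypes(exclude=["number"]).columns
dropped = source.drop(columns=non_numeric_cols)

print(list(numeric.columns))  # ['count', 'score']
```

Either form works without knowing the column names in advance, which is what the question asks for.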

sapo_cosmico
  • I think this is better than using the private method. Maybe you should add the direct answer to the question, which is: source.select_dtypes(['number']) or source.select_dtypes([numpy.number]) – hardsetting Feb 05 '17 at 00:22
  • This should be the accepted answer. Although the other one will work too, this is more correct, not to mention that the private method, not being part of the API, might change at any time. – Juan Antonio Gomez Moriano Apr 22 '17 at 07:42
  • Doesn't this return booleans? Also, what is the difference between 'number' and np.number (just a numpy array of numbers?) – Worthy7 Aug 03 '17 at 01:21
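On the last comment: a quick check (on a hypothetical frame) suggests select_dtypes returns a DataFrame of the selected columns with their original values, not a boolean mask. np.number is NumPy's abstract base class for numeric scalar types rather than an array, so every integer and float width counts as a subclass of it; the pandas documentation lists 'number' and np.number as equivalent ways to select all numeric columns.

```python
import numpy as np
import pandas as pd

# Columns of several numeric widths plus one string column
df = pd.DataFrame({
    "name": ["a", "b"],
    "i8": np.array([1, 2], dtype=np.int8),
    "i64": np.array([3, 4], dtype=np.int64),
    "f32": np.array([0.5, 1.5], dtype=np.float32),
})

# select_dtypes returns a new DataFrame holding the selected columns with their
# original values, not a boolean mask
result = df.select_dtypes(include="number")
print(type(result).__name__)  # DataFrame
print(result)

# np.number is an abstract scalar type: all int/float widths subclass it
print(issubclass(np.int8, np.number), issubclass(np.float32, np.number))  # True True
```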

It's a private method, but it will do the trick: source._get_numeric_data()

In [2]: import pandas as pd

In [3]: source = pd.DataFrame({'A': ['foo', 'bar'], 'B': [1, 2], 'C': [(1,2), (3,4)]})

In [4]: source
Out[4]:
     A  B       C
0  foo  1  (1, 2)
1  bar  2  (3, 4)

In [5]: source._get_numeric_data()
Out[5]:
   B
0  1
1  2
Wouter Overmeire
  • Thanks! Are there any precautions in using "private methods" in pandas? Or, alternatively, why is this private? (I can open a new question, if you suggest.) – Richard Herron Oct 04 '12 at 16:13
  • In general, adding, removing, or changing the API of a private method is not considered an API/behavior change of the class. In other words, a new version of pandas that is considered backwards compatible could, e.g., remove a private method. I believe _get_numeric_data() is mainly used to support plotting functions/methods. If you feel this is a useful method, you can open a feature request on GitHub asking to make it part of the public API. – Wouter Overmeire Oct 04 '12 at 18:02

This removes every column whose dtype is not float64 (note that integer columns are dropped as well).

import pandas as pd

df = pd.read_csv('sample.csv', index_col=0)
non_floats = []
for col in df:
    # Collect every column whose dtype is not float64 (int columns included)
    if df[col].dtype != "float64":
        non_floats.append(col)
df = df.drop(columns=non_floats)
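The same float64-only filter can be written without a loop using a boolean mask over df.dtypes. The sketch below uses a hypothetical in-memory frame in place of sample.csv and also shows how a numeric filter differs by keeping the integer column.

```python
import pandas as pd

# Hypothetical stand-in for sample.csv: ID index, a string column,
# an int column and a float column
df = pd.DataFrame(
    {"name": ["a", "b"], "count": [1, 2], "score": [0.5, 1.5]},
    index=pd.Index(["id1", "id2"], name="id"),
)

# Boolean mask over columns: True where the column dtype is exactly float64
is_float64 = df.dtypes == "float64"

# Keep only those columns; same effect as collecting names in a loop and dropping the rest
floats_only = df.loc[:, is_float64]

# A numeric filter keeps the int column that a float64-only filter discards
numeric = df.select_dtypes(include="number")

print(list(floats_only.columns))  # ['score']
print(list(numeric.columns))      # ['count', 'score']
```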
Thomas Gotwig

Another possible solution drops the columns with categorical values in two lines of code: the first line collects the names of those columns, and the second line drops them. df is our DataFrame.

[Screenshot: df before dropping]

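  # df.categorical only refers to a column literally named "categorical";
  # selecting by dtype finds every non-numeric column instead
  to_be_dropped = df.select_dtypes(exclude='number').columns
  df = df.drop(to_be_dropped, axis=1)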

[Screenshot: df after dropping]
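In pandas, "categorical" usually means the dedicated 'category' dtype, which plain string columns do not have. A sketch on a hypothetical frame shows the difference between selecting only that dtype and selecting everything that is not numeric.

```python
import pandas as pd

# Hypothetical frame: a pandas 'category' column, a plain string column, a float column
df = pd.DataFrame({
    "color": pd.Categorical(["red", "blue", "red"]),
    "name": ["a", "b", "c"],
    "value": [1.0, 2.5, 3.0],
})

# Only columns with the pandas 'category' dtype; plain strings are not included
only_categorical = df.select_dtypes(include="category").columns

# Every column whose dtype is not numeric: categoricals and strings alike
non_numeric = df.select_dtypes(exclude="number").columns

print(list(only_categorical))  # ['color']
print(list(non_numeric))       # ['color', 'name']

# Dropping the non-numeric columns leaves only the float column
print(list(df.drop(non_numeric, axis=1).columns))  # ['value']
```

For the original question (drop everything that is not a number), the exclude form is the one that matches.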

Luigi Bungaro