40

I'm using df.columns.values to make a list of column names which I then iterate over and make charts, etc... but when I set this up I overlooked the non-numeric columns in the df. Now, I'd much rather not simply drop those columns from the df (or a copy of it). Instead, I would like to find a slick way to eliminate them from the list of column names.

Now I have:

names = df.columns.values 

what I'd like to get to is something that behaves like:

names = df.columns.values(column_type=float64) 

Is there any slick way to do this? I suppose I could make a copy of the df, and drop those non-numeric columns before doing columns.values, but that strikes me as clunky.

Welcome any inputs/suggestions. Thanks.

mariogarcc
  • 386
  • 4
  • 13
Charlie_M
  • 1,281
  • 2
  • 16
  • 20

5 Answers5

25

Someone will give you a better answe than this possibly, but one thing I tend to do is if all my numeric data are int64 or float64 objects, then you can create a dict of the column data types and then use the values to create your list of columns.

So for example, in a dataframe where I have columns of type float64, int64 and object firstly you can look at the data types as so:

DF.dtypes

and if they conform to the standard whereby the non-numeric columns of data are all object types (as they are in my dataframes), then you can do the following to get a list of the numeric columns:

[key for key in dict(DF.dtypes) if dict(DF.dtypes)[key] in ['float64', 'int64']]

Its just a simple list comprehension. Nothing fancy. Again, though whether this works for you will depend upon how you set up you dataframe...

Woody Pride
  • 13,539
  • 9
  • 48
  • 62
  • I ended up using this because it works and because I'm running 0.14.0 and didn't want to upgrade to 0.14.1 in the middle of my project. Thanks. – Charlie_M Jul 23 '14 at 16:52
24

dtypes is a Pandas Series. That means it contains index & values attributes. If you only need the column names:

headers = df.dtypes.index

it will return a list containing the column names of "df" dataframe.

Arthur Zennig
  • 2,058
  • 26
  • 20
20

There's a new feature in 0.14.1, select_dtypes to select columns by dtype, by providing a list of dtypes to include or exclude.

For example:

df = pd.DataFrame({'a': np.random.randn(1000),
                   'b': range(1000),
                   'c': ['a'] * 1000,
                   'd': pd.date_range('2000-1-1', periods=1000)})


df.select_dtypes(['float64','int64'])

Out[129]: 
            a    b
0    0.153070    0
1    0.887256    1
2   -1.456037    2
3   -1.147014    3
...
chrisb
  • 49,833
  • 8
  • 70
  • 70
  • `select_dtypes` now also allows selecting more general categories (`df.select_dtypes('number')`, `df.select_dtypes('object')` or `df.select_dtypes('datetime')`, for example). – ayhan Jan 02 '19 at 19:32
8

To get the column names from pandas dataframe in python3- Here I am creating a data frame from a fileName.csv file

>>> import pandas as pd
>>> df = pd.read_csv('fileName.csv')
>>> columnNames = list(df.head(0)) 
>>> print(columnNames)
J11
  • 455
  • 4
  • 8
0

You can also try to get the column names from panda data frame that returns columnn name as well dtype. here i'll read csv file from https://mlearn.ics.uci.edu/databases/autos/imports-85.data but you have define header that contain columns names.

import pandas as pd

url="https://mlearn.ics.uci.edu/databases/autos/imports-85.data"

df=pd.read_csv(url,header = None)

headers=["symboling","normalized-losses","make","fuel-type","aspiration","num-of-doors","body-style",
         "drive-wheels","engine-location","wheel-base","length","width","height","curb-weight","engine-type",
         "num-of-cylinders","engine-size","fuel-system","bore","stroke","compression-ratio","horsepower","peak-rpm"
         ,"city-mpg","highway-mpg","price"]

df.columns=headers

print df.columns
Meloman
  • 3,558
  • 3
  • 41
  • 51