2

I searched for a solution on the site, but I couldn't find anything relevant, only outdated code. I am new to the Pandas library and I have the following dataframe as an example:

  A     B      C    D      E
142   0.4    red  108  front
164   1.3  green   98   rear
 71  -1.0   blue  234  front
109   0.2  black  120  front

I would like to extract the names of the columns that contain numbers (integers and floats). It is completely fine to use the first row to achieve this. So the result should look like this: ['A', 'B', 'D']

I tried the following command to get some of the columns that contained numbers:

dataframe.loc[0, dataframe.dtypes == 'int64']

Out:
A 142
D 108

There are two problems with this. First, I need only the column names, not the values. Second, this captures only the integer columns. My next attempt just gave an error:

dataframe.loc[0, dataframe.dtypes == 'int64' or dataframe.dtypes == 'float64']
Adrian
  • Does this answer your question? [How to determine whether a column/variable is numeric or not in Pandas/NumPy?](https://stackoverflow.com/questions/19900202/how-to-determine-whether-a-column-variable-is-numeric-or-not-in-pandas-numpy) – Marcelo Paco Apr 02 '23 at 03:25
  • It should be `dataframe.loc[0, (dataframe.dtypes == 'int64') | (dataframe.dtypes == 'float64')]`. I don't know why [pandas uses these characters though](https://stackoverflow.com/a/54358361/11235205) – Minh-Long Luu Apr 02 '23 at 03:29
  • But Marcelo's suggestion is the better way, though – Minh-Long Luu Apr 02 '23 at 03:29
  • @MarceloPaco It is one step closer to the solution, but I still don't get the names of the columns that contain numeric values. – Adrian Apr 02 '23 at 03:30
  • @Minh-LongLuu Your code did work! However, I still need to retrieve just the column names without any data. – Adrian Apr 02 '23 at 03:38
  • @Adrian you should move your comment to here. Btw, you can read only the first row via the parameter nrows: pd.read_csv('your_file.csv', nrows=1) – Minh-Long Luu Apr 02 '23 at 03:59
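
To make the comment thread above concrete, here is a minimal sketch of why `or` fails while `|` works, and of how to get names only. The frame is rebuilt by hand from the table in the question, and `or_failed`, `mask` and `numeric_names` are illustrative names, not anything from the original post.

```python
import pandas as pd

# Sample data rebuilt from the question's table (an assumption about how it was loaded).
dataframe = pd.DataFrame({
    "A": [142, 164, 71, 109],
    "B": [0.4, 1.3, -1.0, 0.2],
    "C": ["red", "green", "blue", "black"],
    "D": [108, 98, 234, 120],
    "E": ["front", "rear", "front", "front"],
})

# `or` needs a single True/False from a whole Series, which pandas refuses to give.
try:
    dataframe.dtypes == "int64" or dataframe.dtypes == "float64"
    or_failed = False
except ValueError:
    or_failed = True

# `|` combines the two boolean Series element by element. The parentheses are
# required because `|` binds more tightly than `==`.
mask = (dataframe.dtypes == "int64") | (dataframe.dtypes == "float64")

# Indexing the column labels with the mask returns names only, no row values.
numeric_names = dataframe.columns[mask].tolist()
print(numeric_names)  # ['A', 'B', 'D']
```

This only matches the two dtypes spelled out in the mask, so columns stored as, say, int32 or float32 would be missed.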

5 Answers

3

Using select_dtypes:

dataframe.select_dtypes('number').columns.tolist()

Output:

['A', 'B', 'D']
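
For anyone who wants to try this without the original file, a small self-contained sketch follows. The frame is typed in by hand from the question's table, and the variable names `numeric` and `non_numeric` are only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question's table (an assumption about how it was loaded).
dataframe = pd.DataFrame({
    "A": [142, 164, 71, 109],
    "B": [0.4, 1.3, -1.0, 0.2],
    "C": ["red", "green", "blue", "black"],
    "D": [108, 98, 234, 120],
    "E": ["front", "rear", "front", "front"],
})

# Keep only numeric columns, then read off their labels as a plain list.
numeric = dataframe.select_dtypes("number").columns.tolist()

# The same method can do the opposite selection through `exclude`.
non_numeric = dataframe.select_dtypes(exclude="number").columns.tolist()

print(numeric)      # ['A', 'B', 'D']
print(non_numeric)  # ['C', 'E']
```

Note that the selection is driven by each column's dtype, not by inspecting the values in the first row.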
mozway
2

You can use .dtype and then .kind while filtering the column names with a list comprehension.

# import pandas as pd
# df = pd.read_html('https://stackoverflow.com/questions/75909965')[0] # scraped your q

[c for c in df.columns if df[c].dtype.kind in 'iufc']

This should return ['A', 'B', 'D']. Note that 'iufc' covers signed and unsigned integers as well as real and complex floating-point numbers. Add 'b' if you also want to cover Boolean columns, since bool is a subclass of int in Python.
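
As a rough illustration of the kind codes, the sketch below uses a cut-down frame with one extra Boolean column. The column name `flag` and the variables `kinds`, `numeric` and `with_bools` are hypothetical and not part of the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [142, 164],          # integers
    "B": [0.4, 1.3],          # floats
    "C": ["red", "green"],    # text
    "flag": [True, False],    # hypothetical Boolean column, added for illustration
})

# One-letter kind code per column, e.g. 'i' for signed int, 'f' for float, 'b' for bool.
kinds = {c: df[c].dtype.kind for c in df.columns}

# Numeric columns only: Booleans are left out because 'b' is not in the string.
numeric = [c for c in df.columns if df[c].dtype.kind in "iufc"]

# Adding 'b' pulls the Boolean column in as well.
with_bools = [c for c in df.columns if df[c].dtype.kind in "iufcb"]

print(numeric)     # ['A', 'B']
print(with_bools)  # ['A', 'B', 'flag']
```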

Driftr95
1

Based on Marcelo's comment, you can use:

from pandas.api.types import is_numeric_dtype

numeric_columns = []
for column in df.columns:
    if is_numeric_dtype(df[column]):
        numeric_columns.append(column)
print(numeric_columns)
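
One thing to watch with this approach: `is_numeric_dtype` also reports Boolean columns as numeric. The sketch below shows that on a small hand-made frame and one possible way to filter Booleans out. The `flag` column and the names `numeric_columns` and `strict_numeric` are made up for the example.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype

df = pd.DataFrame({
    "A": [142, 164],
    "B": [0.4, 1.3],
    "C": ["red", "green"],
    "flag": [True, False],  # hypothetical Boolean column, added for illustration
})

# Same check as the loop above, written as a list comprehension.
numeric_columns = [c for c in df.columns if is_numeric_dtype(df[c])]

# If Booleans should not count as numbers, exclude them explicitly.
strict_numeric = [
    c for c in df.columns
    if is_numeric_dtype(df[c]) and not is_bool_dtype(df[c])
]

print(numeric_columns)  # ['A', 'B', 'flag']
print(strict_numeric)   # ['A', 'B']
```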
Minh-Long Luu
1

Another possible solution:

import re

# Keep the columns whose dtype name starts with "int" or "float".
df.columns[
    [re.match(r'^(int|float)', x.name) is not None for x in df.dtypes]].to_list()

Output:

['A', 'B', 'D']
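
A caveat, sketched under the assumption that the data might contain other numeric dtypes: the pattern only matches names that begin with "int" or "float", so an unsigned column such as uint8 is skipped. The column `U` and the names `by_regex` and `by_select` are invented for this example.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "A": [142, 164],
    "B": [0.4, 1.3],
    "C": ["red", "green"],
    "U": pd.Series([1, 2], dtype="uint8"),  # hypothetical unsigned column
})

# Regex on the dtype name: "uint8" does not start with "int", so U is missed.
by_regex = df.columns[
    [re.match(r'^(int|float)', x.name) is not None for x in df.dtypes]
].to_list()

# select_dtypes('number') treats unsigned integers as numeric.
by_select = df.select_dtypes('number').columns.to_list()

print(by_regex)   # ['A', 'B']
print(by_select)  # ['A', 'B', 'U']
```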
PaulS
0

Use the one-liner below:

It first selects all the numeric columns, then takes their column labels, and finally converts those labels to a list.

df.select_dtypes(include="number").columns.to_list()