find numeric column names in Pandas

Question

I need to select the columns in a Pandas DataFrame whose names are numeric, for example:

df=
          0     1     2     3     4 window_label next_states       ids
0      17.0  18.0  16.0  15.0  15.0        ddddd           d      13.0
1      18.0  16.0  15.0  15.0  16.0        ddddd           d      13.0
2      16.0  15.0  15.0  16.0  15.0        ddddd           d      13.0
3      15.0  15.0  16.0  15.0  17.0        ddddd           d      13.0
4      15.0  16.0  15.0  17.0   NaN        ddddd           d      13.0

So I need to select only the first five columns, with something like:

df[df.columns.isnumeric()]

EDIT

I came up with the solution:

digit_column_names = [col for col in df.columns if isinstance(col, (int, float))]
df_new = df[digit_column_names]

It's not very Pythonic or pandas-idiomatic, but it works.
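One caveat with the `isinstance(col, (int, float))` check: NumPy integer scalars such as `np.int64` are not subclasses of Python's `int`, while `bool` is. A minimal sketch, assuming labels might come from NumPy, uses the standard `numbers.Number` ABC instead; `numeric_label_columns` is a hypothetical helper name, not a pandas API:

```python
import numbers

import numpy as np
import pandas as pd


def numeric_label_columns(df):
    """Return the column labels of df that are numbers (not strings).

    numbers.Number also matches NumPy scalars such as np.int64;
    bool is excluded explicitly because it is a subclass of int.
    """
    return [c for c in df.columns
            if isinstance(c, numbers.Number) and not isinstance(c, bool)]


# A frame mixing Python ints, a NumPy integer label and string labels
df = pd.DataFrame({0: [17.0], 1: [18.0], np.int64(2): [16.0],
                   'window_label': ['ddddd'], 'ids': [13.0]})
print(numeric_label_columns(df))
```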

@gobrewers14, I tried it, but it also returns the 'ids' column, which I don't want. — Arnold Klein, May 10 '17 at 17:03
Your question is unclear then. "I need to select columns in Pandas which contain only numeric values." `ids` is numeric. — o-90, May 10 '17 at 17:05
@gobrewers14, I agree, and I have amended the question. In my defense, though, the title of the question does say what I was after. Thanks. — Arnold Klein, May 10 '17 at 17:06
@ArnoldKlein, regarding your EDIT: are your column names actual numbers, rather than their string representations? — MaxU - stand with Ukraine, May 10 '17 at 17:20
@MaxU, I am not sure, but I think they are real numbers. `df.columns` gives `Index([0, 1, 2, 3, 4, u'window_label', u'next_states', u'ids'], dtype='object')` — Arnold Klein, May 10 '17 at 17:24
@ArnoldKlein, I've added an answer, please check it. Please do not un-accept the already accepted answer... — MaxU - stand with Ukraine, May 10 '17 at 17:27

Vaishali · Accepted Answer · 2017-05-10T20:42:37.933


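Try:

import numpy as np
# cast 'ids' to object so that select_dtypes no longer counts it as numeric
df['ids'] = df['ids'].astype('object')
new_df = df.select_dtypes([np.number])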


    0       1       2       3       4       
0   17.0    18.0    16.0    15.0    15.0    
1   18.0    16.0    15.0    15.0    16.0    
2   16.0    15.0    15.0    16.0    15.0    
3   15.0    15.0    16.0    15.0    17.0    
4   15.0    16.0    15.0    17.0    NaN
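Note that `select_dtypes` filters on the dtype of each column's *values*, not on its name, which is why `ids` (float values) is picked up until it is cast to `object`. A small sketch on a cut-down version of the question's frame shows the before and after:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [17.0, 18.0, 16.0], 1: [18.0, 16.0, 15.0],
                   'window_label': ['ddddd'] * 3, 'ids': [13.0] * 3})

# select_dtypes looks at the dtype of the values, so the float 'ids' column is included
before = df.select_dtypes([np.number]).columns.tolist()
print(before)  # [0, 1, 'ids']

# after casting 'ids' to object it is no longer considered numeric
df['ids'] = df['ids'].astype('object')
after = df.select_dtypes([np.number]).columns.tolist()
print(after)   # [0, 1]
```

This only works as a name filter when every numeric-named column, and no other column, has a numeric dtype.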

EDIT: If you are interested in selecting the columns whose names are numeric, here is something you can do:

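import pandas as pd
df = pd.DataFrame({0: [1,2], '1': [3,4], 'blah': [5,6], 2: [7,8]})
# non-numeric names become NaN; note this replaces the original labels with floats
df.columns = pd.to_numeric(df.columns, errors = 'coerce')
df[df.columns.dropna()]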

You get

    0.0 1.0 2.0
0   1   3   7
1   2   4   8
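Assigning the converted labels back to `df.columns` turns the kept names into floats (`0.0`, `1.0`, `2.0`) and drops the string form of `'1'`. If you would rather keep the original labels, one option (a sketch, not part of the answer above) is to use the converted index only as a mask:

```python
import pandas as pd

df = pd.DataFrame({0: [1, 2], '1': [3, 4], 'blah': [5, 6], 2: [7, 8]})

# Convert a copy of the labels; names that can't be parsed become NaN
converted = pd.to_numeric(df.columns, errors='coerce')

# Use the NaN positions as a boolean mask, leaving df.columns untouched
subset = df.loc[:, ~pd.isna(converted)]
print(subset.columns.tolist())  # [0, '1', 2]
```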

edited May 10 '17 at 20:42

answered May 10 '17 at 16:57

Thanks, but it also selects the very last column, 'ids', which it should not. – Arnold Klein May 10 '17 at 16:58

@ArnoldKlein, then you should rephrase your question (or, better, open a new one). This answer perfectly answers your question as asked: it is the most idiomatic way to select __all__ numeric columns – MaxU - stand with Ukraine May 10 '17 at 17:01

The only way to exclude ids would be to change the dtype of ids to object. Please see the edit. – Vaishali May 10 '17 at 17:02
@A-Za-z, elegant! Many thanks. Sorry for misleading. – Arnold Klein May 10 '17 at 17:07
I think the OP is referring to column names, not column values? He is interested in column names that are numeric (0, 1, 2, 3, 4)? – Moondra May 10 '17 at 19:03

score 2 · Answer 2 · answered Sep 02 '20 at 19:36


How about this solution?
It checks whether every character of the column name is a digit.

# str() converts each label first; iterating over an int label would raise TypeError
cols = [col for col in df.columns if all(char.isdigit() for char in str(col))]
df[cols]
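The digit test is strict: a float label such as `2.0` becomes the string `'2.0'` and a negative label becomes `'-1'`, neither of which consists only of digits. A quick check with a made-up list of labels (not taken from the question) shows what passes:

```python
# Hypothetical labels mixing ints, a float, numeric strings and text
labels = [0, 1, '4', 2.0, '-1', 'ids']

# Same test as the answer: every character of str(label) must be a digit
matches = [c for c in labels if all(ch.isdigit() for ch in str(c))]
print(matches)  # [0, 1, '4']
```

So this approach suits non-negative integer labels (or their string forms); the `pd.to_numeric` and `float()` approaches in the other answers are broader.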

igorkf


MaxU - stand with Ukraine · Answer 3 · 2017-05-10T17:29:33.990

Here is an answer for the EDIT part:

I've intentionally created a mixture of column names: real numbers and strings that can be converted to numbers:

In [44]: df.columns.tolist()
Out[44]: [0, 1, 2, 3, '4', 'window_label', 'next_states', 'ids']
# NOTE:                ^

We can use the pd.to_numeric(..., errors='coerce') function:

In [41]: df.columns[pd.to_numeric(df.columns, errors='coerce').to_series().notnull()]
Out[41]: Index([0, 1, 2, 3, '4'], dtype='object')

In [42]: cols = df.columns[pd.to_numeric(df.columns, errors='coerce').to_series().notnull()]

In [43]: df[cols]
Out[43]:
      0     1     2     3     4
0  17.0  18.0  16.0  15.0  15.0
1  18.0  16.0  15.0  15.0  16.0
2  16.0  15.0  15.0  16.0  15.0
3  15.0  15.0  16.0  15.0  17.0
4  15.0  16.0  15.0  17.0   NaN
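Since `pd.to_numeric` returns an `Index` when given an `Index`, the `.to_series()` step can likely be dropped and the mask taken with `Index.notna()` directly. A sketch on a reduced frame with the same mix of real-number and string labels:

```python
import pandas as pd

df = pd.DataFrame({0: [17.0], 1: [18.0], '4': [15.0],
                   'window_label': ['ddddd'], 'ids': [13.0]})

# Labels that can't be parsed as numbers become NaN in the converted Index
converted = pd.to_numeric(df.columns, errors='coerce')

# Index.notna() returns a boolean array that can index df.columns directly
cols = df.columns[converted.notna()]
print(cols.tolist())  # [0, 1, '4']
```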

score 1 · Answer 4 · edited May 23 '17 at 12:18

I found a closely related question on this site and applied its code to your problem. I also added a float (2.0) to the column names to make sure it works with both ints and floats. It looks like:

import pandas as pd

df = pd.DataFrame({0: [17.0, 18, 16, 15, 15],
                   1: [18.0, 16, 15, 15, 16],
                   2.0: [16.0, 15, 15, 16, 15],
                   3: [15.0, 15, 16, 15, 17],
                   4: [15.0, 16, 15, 17, None],
                   'window_label': ['ddddd' for i in range(5)],
                   'next_states': ['d' for i in range(5)],
                   'ids': [13.0 for i in range(5)]})

num_cols = []
for col in df.columns.values:
    try:
        float(col)
        num_cols.append(col)
    except ValueError:
        pass

print(df[num_cols])

The result looks like this:

      0     1   2.0     3     4
0  17.0  18.0  16.0  15.0  15.0
1  18.0  16.0  15.0  15.0  16.0
2  16.0  15.0  15.0  16.0  15.0
3  15.0  15.0  16.0  15.0  17.0
4  15.0  16.0  15.0  17.0   NaN

Edit 1: I just realized that you can also keep the numeric check in a generator function. Note that `df[...]` still needs the selected labels as a list, so for a handful of columns this is unlikely to be noticeably faster or lighter on memory.

import pandas as pd


def is_num(cols):
    for col in cols:
        try:
            float(col)
            yield col
        except ValueError:
            continue

df = pd.DataFrame({0: [17.0, 18, 16, 15, 15],
                   1: [18.0, 16, 15, 15, 16],
                   2.0: [16.0, 15, 15, 16, 15],
                   3: [15.0, 15, 16, 15, 17],
                   4: [15.0, 16, 15, 17, None],
                   'window_label': ['ddddd' for i in range(5)],
                   'next_states': ['d' for i in range(5)],
                   'ids': [13.0 for i in range(5)]})

print(df[[col for col in is_num(df.columns.values)]])

This yields exactly the same result as above, although it is somewhat less readable.

score 0 · Answer 5 · answered May 10 '17 at 19:21


If you are only looking for numeric column names, I think this should work:

df.columns[df.columns.str.isnumeric()]

or this

df.iloc[:,df.columns.str.isnumeric()]

Moondra


I am trying to get only the numeric column names. I applied your first snippet and got this error: ValueError: Cannot mask with non-boolean array containing NA / NaN values – Govinda Raju Nov 07 '20 at 05:25
The proper solution here would be `df.iloc[:, df.columns.map(str).str.isnumeric()]` – Peter Nov 17 '20 at 09:12
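The error above occurs because `.str.isnumeric()` returns NaN for labels that are not strings (here the integers `0` to `4`), and a mask containing NaN cannot be used for selection. Converting every label to `str` first, as in the last comment, avoids that. A sketch on a reduced frame; wrapping the mask in `np.asarray` is an extra precaution so it is a plain boolean array regardless of pandas version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [17.0, 18.0], 1: [18.0, 16.0], '4': [15.0, 16.0],
                   'window_label': ['ddddd', 'ddddd'], 'ids': [13.0, 13.0]})

# Cast every label to str so integer labels give True/False instead of NaN
mask = df.columns.map(str).str.isnumeric()

# Select the columns whose names consist only of numeric characters
numeric_name_df = df.loc[:, np.asarray(mask, dtype=bool)]
print(numeric_name_df.columns.tolist())  # [0, 1, '4']
```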
