Remove column based on header name type

Question

Usually, when you want to remove columns which are not of type float, you can write pd.DataFrame.select_dtypes(include='float64'). However, I would like to remove a column in cases where the header name itself is not a float.

df = pd.DataFrame({'a' : [1,2,3], 10 : [3,4,5]})
df.dtypes

will give the output

a     int64
10    int64
dtype: object

How can I remove the column a based on the fact that its header name is not a float or int?

What about a column named `'28'` (as string)? Do you want to keep it, or remove it? — Cainã Max Couto-Silva, Dec 01 '20 at 21:29
Hmm, good question. I want to remove it. Maybe by making sure that all header names that are applicable to be of type `float`, should be just that. And the rest should be removed. So for example: `28` should be kept, but `Unnamed: 28` should be removed. — armara, Dec 01 '20 at 21:31

score 2 · Accepted Answer · answered Dec 01 '20 at 21:03

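Try selecting columns by header name with a regex. Note that filter keeps the columns whose names match, so this keeps 10 (its name contains a digit) and drops a:

df.filter(regex=r'\d', axis=1)

# Conversely, this keeps only the columns whose names contain a non-digit character (here, a)

   # df.filter(regex=r'\D', axis=1)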

wwnde

Wow, this worked seamlessly for a column named `a`! However, if I change the name to a more complicated string like `Unnamed: 28`, it doesn't work. Can I change the regex input to make it work for all `string`-types? – armara Dec 01 '20 at 21:18
Yes, try excluding any letters with `df.filter(regex='^[^a-z]+$', axis=1)`, or excluding digits with `df.filter(regex='[^\d+]', axis=1)`. This should achieve the opposite outcome as well – wwnde Dec 01 '20 at 21:38
Big thanks, I'm actually using the solution `df.filter(regex='^[^a-z]+$', axis=1)` now, so I'll give you the green tick. I'll have to read up a bit on how to use regex as well, this seems powerful – armara Dec 01 '20 at 21:50
I upvoted this answer! It's very good! Just to note that `^[^a-z]+$` will keep columns with characters like `$^~\`/`, uppercase letters, etc. (as long as there is no [a-z]). – Cainã Max Couto-Silva Dec 01 '20 at 22:13
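
To make the behaviour of the patterns from this comment thread concrete, here is a minimal sketch. The frame and its extra columns (`Unnamed: 28`, `B2`) are made up for illustration, and the third pattern (`^\d+$`) is an added assumption about what "digits only" could look like, not something the answerer proposed.

```python
import pandas as pd

# Hypothetical frame: an integer header plus lowercase, mixed-case
# and uppercase string headers.
df = pd.DataFrame({
    "a": [1, 2, 3],
    10: [3, 4, 5],
    "Unnamed: 28": [6, 7, 8],
    "B2": [9, 10, 11],
})

# filter() KEEPS the columns whose label (converted to a string)
# matches the pattern somewhere; it does not drop them.
has_digit = df.filter(regex=r"\d", axis=1)
no_lowercase = df.filter(regex=r"^[^a-z]+$", axis=1)
digits_only = df.filter(regex=r"^\d+$", axis=1)

print(list(has_digit.columns))     # [10, 'Unnamed: 28', 'B2']
print(list(no_lowercase.columns))  # [10, 'B2']
print(list(digits_only.columns))   # [10]
```

The second result shows the caveat raised above: `B2` survives `^[^a-z]+$` because it contains no lowercase letter.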

Bill Huang · Answer 2 · 2020-12-01T21:52:57.660

A solution based on type enumeration:

Code

sr_dtype = df.dtypes
df = df.drop(columns=sr_dtype.index[
    sr_dtype.index.map(lambda el: not isinstance(el, (int, float)))  # add more if necessary
])

Note that df.dtypes is itself a Series, so regular Series operations apply to it. Its index holds the column labels, and index.map() is used in this example to apply the isinstance() check to each label.

Result

print(df)
   10
0   3
1   4
2   5
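
The same type check can also be written as a boolean mask over `df.columns`. This is only a sketch: the extra columns (`"28"`, `2.5`) and the helper name `is_numeric_label` are hypothetical, and the explicit `bool` exclusion is an added assumption (in Python, `bool` is a subclass of `int`).

```python
import pandas as pd

# Hypothetical frame with string, int and float headers.
df = pd.DataFrame({
    "a": [1, 2, 3],
    10: [3, 4, 5],
    "28": [6, 7, 8],
    2.5: [9, 10, 11],
})

def is_numeric_label(label):
    """True for int/float labels; bool is rejected (assumption, see lead-in)."""
    return isinstance(label, (int, float)) and not isinstance(label, bool)

# One True/False entry per column, in column order.
mask = [is_numeric_label(col) for col in df.columns]
numeric_only = df.loc[:, mask]

print(list(numeric_only.columns))  # [10, 2.5]
```

Note that the string header `"28"` is dropped here: an isinstance() check looks at the type of the label, not at whether its text looks like a number.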

edited Dec 01 '20 at 21:52

answered Dec 01 '20 at 21:09

i changed `isinstance(el, (int, float))` to `isinstance(el, (str))` and it worked, thank you! – armara Dec 01 '20 at 21:27
To delete non-int, float, etc., use this function: `lambda el: not isinstance(el, ...)`. The answer is revised. – Bill Huang Dec 01 '20 at 21:50

score 0 · Answer 3 · answered Dec 01 '20 at 21:02

Are you sure that's the right output? Your dataframe columns are 'a' and 10, so why is there a column named 'b'?

Anyway, to remove the column a regardless of its type, just by its header name, use the drop method:
df = df.drop(columns=['a'])
This also works with a list of several columns instead of the single-element list used here.
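
As a small sketch of dropping several columns by name at once: the frame and the label `not_there` are made up for illustration. The `errors="ignore"` argument of `drop` skips labels that are not present instead of raising.

```python
import pandas as pd

# Hypothetical frame with one integer header and two string headers.
df = pd.DataFrame({
    "a": [1, 2, 3],
    10: [3, 4, 5],
    "Unnamed: 28": [6, 7, 8],
})

# Drop a list of columns by name; "not_there" does not exist and is
# silently skipped because of errors="ignore".
trimmed = df.drop(columns=["a", "Unnamed: 28", "not_there"], errors="ignore")

print(list(trimmed.columns))  # [10]
```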

itaishz

Sorry, I had written it wrong; it was supposed to say 10, not b! – armara Dec 01 '20 at 21:08

score 0 · Answer 4 · answered Dec 01 '20 at 22:05

Based on the other answers, you can also try:

1) To make sure to keep only float and int types:

df[[col for col in df.columns if type(col) in [float,int]]]

2) To just exclude string-like columns:

df.loc[:, [not isinstance(col, str) for col in df.columns]] # select with a boolean mask
# or
df[[col for col in df.columns if not isinstance(col, str)]] # select with a list of column names

3) To exclude columns whose names are not float/int, based on regex:

df.filter(regex=r'^\d+$|^\d+?\.{1}\d+$')

where the first expression ^\d+$ matches integers (the name starts and ends with digits), and the second expression ^\d+?\.{1}\d+$ matches floats. We could use just ^[\d.]+$ (allowing only digits and dots) to match both, but it would also match column names like "1..2".
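
An alternative to the regex, closer to the asker's wording that header names "applicable to be of type float" should stay, is to try the conversion directly. This is only a sketch: the helper name `converts_to_float` and the example columns are hypothetical.

```python
import pandas as pd

# Hypothetical frame mixing numeric, numeric-looking and other headers.
df = pd.DataFrame({
    "a": [1, 2, 3],
    10: [3, 4, 5],
    "28": [6, 7, 8],
    "Unnamed: 28": [9, 10, 11],
    "1..2": [0, 0, 0],
})

def converts_to_float(label):
    """Return True if float(label) succeeds (hypothetical helper)."""
    try:
        float(label)
    except (TypeError, ValueError):
        return False
    return True

# Keep only the columns whose header can be converted to float.
kept = df.loc[:, [converts_to_float(col) for col in df.columns]]

print(list(kept.columns))  # [10, '28']
```

Be aware that float() is more permissive than the regex above: it also accepts strings such as "1e3", "nan", "inf" and values padded with whitespace, so this may keep headers the regex would reject.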
