
Usually, when you want to remove columns that are not of type float, you can write df.select_dtypes(include='float64'). However, I would like to remove a column when its header name is not a float.

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 10: [3, 4, 5]})
df.dtypes

gives the output:

a     int64
10    int64
dtype: object

How can I remove column a based on the fact that its header name is not a float or an int?
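To make the distinction in the question concrete: select_dtypes looks at the dtype of each column's values, while the check wanted here is on the Python type of the header label. A minimal sketch, reusing the example frame from above:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 10: [3, 4, 5]})

# select_dtypes inspects the values: both columns hold integers,
# so both are selected as numeric and the headers play no role.
by_values = df.select_dtypes(include='number')
print(list(by_values.columns))  # ['a', 10]

# The header labels themselves have different Python types.
label_types = {col: type(col).__name__ for col in df.columns}
print(label_types)  # {'a': 'str', 10: 'int'}
```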

armara
  • What about a column named `'28'` (as string)? Do you want to keep it, or remove it? – Cainã Max Couto-Silva Dec 01 '20 at 21:29
  • Hmm, good question. I want to remove it. Maybe by making sure that every header name that can be read as a float is kept as one, and removing the rest. So, for example, `28` should be kept, but `Unnamed: 28` should be removed. – armara Dec 01 '20 at 21:31
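The rule in this comment (keep labels that are numbers or strings that read as a float, such as '28'; drop everything else, such as 'Unnamed: 28') can be sketched with a small predicate based on float() parsing. The helper name looks_numeric and the extra columns are hypothetical. Note that float() also accepts strings like 'nan', 'inf' or ' 28 ', so labels like those would be kept too.

```python
import numbers

import pandas as pd


def looks_numeric(label) -> bool:
    """Hypothetical helper: True if a column label is a real number
    (int, float, NumPy scalar, ...) or a string that float() can parse."""
    if isinstance(label, bool):
        # bool is a subclass of int; treat True/False labels as non-numeric.
        return False
    if isinstance(label, numbers.Real):
        return True
    if isinstance(label, str):
        try:
            float(label)
        except ValueError:
            return False
        return True
    return False


df = pd.DataFrame({'a': [1, 2, 3], 10: [3, 4, 5],
                   '28': [0, 0, 0], 'Unnamed: 28': [7, 8, 9]})
kept = df[[col for col in df.columns if looks_numeric(col)]]
print(list(kept.columns))  # [10, '28']
```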

4 Answers


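To drop column a, try keeping only the columns whose names contain a digit, using a regex (filter keeps the matching columns and returns a new DataFrame):

df.filter(regex=r'\d', axis=1)

# Conversely, keep only the columns whose names contain a non-digit (this drops 10):

# df.filter(regex=r'\D', axis=1)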
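A quick way to check what the two patterns above do on the example frame. Note that filter keeps the matching columns rather than dropping them, and it converts non-string labels such as 10 to strings before matching; this sketch assumes a reasonably recent pandas version.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 10: [3, 4, 5]})

# filter() keeps the columns whose label matches the regex.
only_digits = df.filter(regex=r'\d', axis=1)      # 'a' has no digit, so it is left out
only_non_digits = df.filter(regex=r'\D', axis=1)  # '10' has no non-digit, so it is left out

print(list(only_digits.columns))      # [10]
print(list(only_non_digits.columns))  # ['a']
```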
wwnde
  • Wow, this worked seamlessly for a column named `a`! However, if I change the name to a more complicated string like `Unnamed: 28`, it doesn't work. Can I change the regex to make it work for all string names? – armara Dec 01 '20 at 21:18
  • Yep, try excluding names that contain lowercase letters with `df.filter(regex='^[^a-z]+$', axis=1)`, or do the opposite and exclude all-digit names with `df.filter(regex=r'[^\d+]', axis=1)`. The two give contrary outcomes. – wwnde Dec 01 '20 at 21:38
  • Big thanks! I'm actually using the solution `df.filter(regex='^[^a-z]+$', axis=1)` now, so I'll give you the green tick. I'll have to read up a bit on how to use regex as well; it seems powerful. – armara Dec 01 '20 at 21:50
  • I upvoted this answer! It's very good! Just note that `^[^a-z]+$` will keep columns with characters like $, ^, ~, backticks or /, uppercase letters, etc. (as long as there is no lowercase a-z). – Cainã Max Couto-Silva Dec 01 '20 at 22:13
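The caveat in the last comment can be checked directly. The labels below are made up for illustration, and the stricter ^\d+$ pattern is shown only for comparison:

```python
import pandas as pd

# Hypothetical labels chosen to illustrate the caveat about ^[^a-z]+$.
df = pd.DataFrame([[0] * 6], columns=['a', 10, 'Unnamed: 28', '28', 'ABC', '$/'])

# Keeps every label that contains no lowercase ASCII letter at all.
no_lowercase = df.filter(regex='^[^a-z]+$', axis=1)
print(list(no_lowercase.columns))  # [10, '28', 'ABC', '$/']

# Stricter: keep only labels made entirely of digits.
digits_only = df.filter(regex=r'^\d+$', axis=1)
print(list(digits_only.columns))  # [10, '28']
```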

A solution based on type enumeration:

Code

sr_dtype = df.dtypes
df = df.drop(columns=sr_dtype.index[
    sr_dtype.index.map(lambda el: not isinstance(el, (int, float)))  # add more if necessary
])

Note that df.dtypes is itself a Series, so regular Series operations apply to it. In particular, index.map() is used here as a wrapper for the isinstance() check.

Result

print(df)
   10
0   3
1   4
2   5
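Since df.dtypes is indexed by the column labels, the same check can also be written against df.columns directly. This is an equivalent sketch, not the answer's original code:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 10: [3, 4, 5]})

# df.dtypes is a Series whose index holds the column labels,
# so it is interchangeable with df.columns here.
assert df.dtypes.index.equals(df.columns)

# Same isinstance() check, built as a plain list of labels to drop.
to_drop = [col for col in df.columns if not isinstance(col, (int, float))]
result = df.drop(columns=to_drop)
print(result)
```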
Bill Huang
  • I changed `isinstance(el, (int, float))` to `isinstance(el, (str))` and it worked, thank you! – armara Dec 01 '20 at 21:27
  • To delete columns whose names are not int, float, etc., use `lambda el: not isinstance(el, ...)`. I have revised the answer. – Bill Huang Dec 01 '20 at 21:50

Are you sure that's the right output? Your DataFrame's columns are 'a' and 10, so why does it show a column named 'b'?

Anyway, to remove column a by its header name, regardless of its type, use the drop method:
df = df.drop(columns=['a'])
This also works with a list of several columns, not just the single-element list used here.
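A short sketch of dropping several columns at once, with a hypothetical extra column 'b' and a deliberately missing label. errors='ignore' is a standard drop parameter that skips labels that do not exist instead of raising KeyError:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 10: [3, 4, 5]})

# Several labels can be dropped in one call.
dropped = df.drop(columns=['a', 'b'])
print(list(dropped.columns))  # [10]

# By default a missing label raises KeyError; errors='ignore' skips it instead.
lenient = df.drop(columns=['a', 'not_there'], errors='ignore')
print(list(lenient.columns))  # ['b', 10]
```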

itaishz

Based on the other answers, you can also try:

1) To keep only columns whose header names are floats or ints:

df[[col for col in df.columns if type(col) in [float,int]]]

2) To just exclude columns whose header names are strings:

df.loc[:, [not isinstance(col, str) for col in df.columns]]  # boolean mask over the columns
# or
df[[col for col in df.columns if not isinstance(col, str)]]  # list of column names

3) To keep only columns whose names look like an int or a float, using a regex:

df.filter(regex=r'^\d+$|^\d+?\.{1}\d+$')

where the first alternative, ^\d+$, matches integers (only digits from start to end), and the second, ^\d+?\.{1}\d+$, matches floats (digits, exactly one point, digits). We could use the simpler ^[\d.]+$ (digits and points only) to match both, but it would also match names like "1..2".
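The difference between the strict and the loose pattern can be checked with Python's re module directly (DataFrame.filter matches labels with re.search, so the results carry over). The strict pattern is the one from item 3; the looser digits-and-points pattern is written here as ^[\d.]+$. The sample names are made up:

```python
import re

strict = re.compile(r'^\d+$|^\d+?\.{1}\d+$')  # integers or digits.digits
loose = re.compile(r'^[\d.]+$')               # any run of digits and points

names = ['10', '28', '2.5', '1..2', 'Unnamed: 28', '.5', '-1']

strict_hits = [n for n in names if strict.search(n)]
loose_hits = [n for n in names if loose.search(n)]

print(strict_hits)  # ['10', '28', '2.5']
print(loose_hits)   # ['10', '28', '2.5', '1..2', '.5']
```

Note that the strict pattern also rejects '.5' and negative numbers such as '-1', since it requires a leading digit.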

Cainã Max Couto-Silva