How to remove all columns whose column labels contain unique characters from Pandas dataframe

Question

I have a dataframe contains 3700 rows x 53 columns. I need to remove all columns which contains unique characters "nan" from this dataframe.

For example:

df = 

 X1  X2_nan      x3  x4_nan
 34       0    65.8       1
 35       1  5432.1       1
 87       1   443.1       0
 65       0   213.3       0

I want to remove all columns contains "nan" to get:

jezrael · Accepted Answer · 2021-05-11T04:19:20.777

2

Use str.contains for test substring nans and filter by DataFrame.loc with boolean indexing with ~ for invert mask for NOT contains:

df1 = df.loc[:, ~df.columns.astype(str).str.contains('nan')]
print (df1)
   X1      x3
0  34    65.8
1  35  5432.1
2  87   443.1
3  65   213.3

edited May 11 '21 at 04:19

answered May 11 '21 at 04:15

jezrael

Thank you for your prompt reply. I got this error "Can only use .str accessor with string values!" – Mohamed May 11 '21 at 04:18
@Mohamed - Add `.astype(str)`, answer was edited. – jezrael May 11 '21 at 04:21

SeaBean · Answer 2 · 2021-05-11T18:40:50.703

You can use .filter() with regex, as follows:

The regex ^(?!.*nan).*$ is to match only strings that does not contain character sequence 'nan' anywhere in the column names.

df1 = df.filter(regex=r'^(?!.*nan).*$')



print (df1)
   X1      x3
0  34    65.8
1  35  5432.1
2  87   443.1
3  65   213.3

If you are not familiar with regex and want to make sure the regex is correct, you can refer to this post for further information.

2 Answers2