1

I have a dataframe contains 3700 rows x 53 columns. I need to remove all columns which contains unique characters "nan" from this dataframe.

For example:

df = 

 X1  X2_nan      x3  x4_nan
 34       0    65.8       1
 35       1  5432.1       1
 87       1   443.1       0
 65       0   213.3       0

I want to remove all columns contains "nan" to get:

df1 = 

 X1      x3
 34    65.8
 35  5432.1
 87   443.1
 65   213.3
SeaBean
  • 22,547
  • 3
  • 13
  • 25
Mohamed
  • 51
  • 5

2 Answers2

2

Use str.contains for test substring nans and filter by DataFrame.loc with boolean indexing with ~ for invert mask for NOT contains:

df1 = df.loc[:, ~df.columns.astype(str).str.contains('nan')]
print (df1)
   X1      x3
0  34    65.8
1  35  5432.1
2  87   443.1
3  65   213.3
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

You can use .filter() with regex, as follows:

The regex ^(?!.*nan).*$ is to match only strings that does not contain character sequence 'nan' anywhere in the column names.

df1 = df.filter(regex=r'^(?!.*nan).*$')



print (df1)
   X1      x3
0  34    65.8
1  35  5432.1
2  87   443.1
3  65   213.3

If you are not familiar with regex and want to make sure the regex is correct, you can refer to this post for further information.

SeaBean
  • 22,547
  • 3
  • 13
  • 25