How can I use Python to get a list of the column names in which all values are NaN? For the dataframe below, the result should be c and d. Thanks.

df = pd.DataFrame({'a': [1,2,3],'b': [3,4,5], 'c':[np.nan, np.nan, np.nan],
                   'd':[np.nan, np.nan, np.nan]})

   a  b   c   d
0  1  3 NaN NaN
1  2  4 NaN NaN
2  3  5 NaN NaN
ah bon
  • [Here is an SO question about rows instead of columns](https://stackoverflow.com/q/38884538/8881141). You should be able to adapt it. – Mr. T May 27 '18 at 08:44
  • **@ahbon**, use `df.any()` as I have shown in the answer. You can also check http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.any.html and try to find another solution. I wrote it in a hurry and I don't think it is the best approach, since I collect the columns and then recreate `df`. Anyway, it works as you want. Thanks. – hygull May 27 '18 at 09:30

2 Answers

Use Boolean indexing with df.columns:

res = df.columns[df.isnull().all(0)]

# Index(['c', 'd'], dtype='object')
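
A self-contained version of the same idea, as a minimal sketch: `isna()` is an alias of `isnull()`, `axis=0` spells out the positional `0`, and `tolist()` turns the resulting Index into a plain list. The variable names `all_nan` and `nan_columns` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5],
                   'c': [np.nan, np.nan, np.nan],
                   'd': [np.nan, np.nan, np.nan]})

# True for each column whose values are all missing
all_nan = df.isna().all(axis=0)

# Select the matching column labels and convert the Index to a list
nan_columns = df.columns[all_nan].tolist()
print(nan_columns)  # ['c', 'd']
```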
jpp

@ahbon, you can try df.any(). See the following sequence of statements executed in Python's interactive interpreter.

Check http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.any.html

>>> import numpy as np
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'a':[1,2,3],'b':[3,4,5],'c':[np.nan, np.nan, np.nan],'d':[np.nan, np.nan, np.nan]})
>>> df
   a  b   c   d
0  1  3 NaN NaN
1  2  4 NaN NaN
2  3  5 NaN NaN
>>>
>>> # Flag columns with at least one truthy value; all-NaN columns come out False
...
>>> df_new = df.any()
>>> df_new
a     True
b     True
c    False
d    False
dtype: bool
>>>

Finally,

>>> columns = []
>>>
>>> for key, value in df_new.items():
...     if value:
...         columns.append(key)
...
>>> df = pd.DataFrame({'a':[1,2,3],'b':[3,4,5],'c':[np.nan, np.nan, np.nan],'d':[np.nan, np.nan, np.nan]}, columns=columns)
>>>
>>> df
   a  b
0  1  3
1  2  4
2  3  5
>>>
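
One caveat with `df.any()`, shown here as a minimal sketch: it reports `False` for any column without a truthy value, so a column of zeros would be flagged together with the all-NaN columns. The column `z` below is made up for illustration and is not part of the original dataframe.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3],
                   'z': [0, 0, 0],                 # hypothetical all-zero column
                   'c': [np.nan, np.nan, np.nan]})

# any() skips NaN by default, so 'c' is False; 'z' is also False because 0 is falsy
falsy_columns = df.columns[~df.any()].tolist()

# isna().all() flags only the columns where every value is missing
nan_columns = df.columns[df.isna().all()].tolist()

print(falsy_columns)  # ['z', 'c']
print(nan_columns)    # ['c']
```

If the goal is to drop such columns rather than list them, `df.dropna(axis=1, how='all')` should do that without rebuilding the dataframe.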
hygull