I have a DataFrame `df` with many missing values.

I would like to find the variables/columns that have missing values.

I tried the code below:

vars_with_na = [var for var in df.columns if df[var].isnull().sum() > 0]

But it raises the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Danish
  • Have you tried what the error message suggests? That is: `vars_with_na = [var for var in df.columns if df[var].isnull().any()]` – Arne Jun 17 '21 at 11:35

1 Answer

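This error means that some column names are duplicated, so df[var] returns a DataFrame, not a Series. In that case df[var].isnull().sum() is a Series with one count per duplicated column, the comparison > 0 yields a boolean Series, and `if` cannot reduce that to a single True or False.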

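import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 1], 'b': [1, 1], 'c': [np.nan, np.nan]})

# Rename the columns so that the name 'a' appears twice
df.columns = ['a', 'a', 's']
print(df['a'])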
     a  a
0  NaN  1
1  1.0  1

# df['a'] is a DataFrame here, so the condition is a Series and this line raises the ValueError
vars_with_na = [var for var in df.columns if df[var].isnull().sum() > 0]
print(vars_with_na)

A possible solution is to deduplicate the column names first.
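A minimal sketch of that deduplication, reusing the toy frame from above. It assumes that keeping only the first occurrence of each duplicated name is acceptable for your data, which discards the later columns. The names `deduped`, `na_mask` and `na_positions` are made up for illustration. The second option is an alternative that keeps every column and reports positions instead of names.

```python
import numpy as np
import pandas as pd

# Same toy frame as above, with the column name 'a' duplicated.
df = pd.DataFrame({'a': [np.nan, 1], 'b': [1, 1], 'c': [np.nan, np.nan]})
df.columns = ['a', 'a', 's']

# Option 1: keep only the first occurrence of each column name.
# Index.duplicated() marks the second and later occurrences as True.
deduped = df.loc[:, ~df.columns.duplicated()]

# Every deduped[var] is now a Series, so the condition is a single boolean.
vars_with_na = [var for var in deduped.columns if deduped[var].isnull().any()]
print(vars_with_na)

# Option 2: keep all columns and report integer positions,
# so columns that share a name stay distinguishable.
na_mask = df.isnull().any().to_numpy()
na_positions = np.flatnonzero(na_mask).tolist()
print(na_positions)
```

Option 1 silently drops data in the later duplicates, so if those columns matter, renaming them to unique names before the check may be the safer route.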

jezrael