
I have a DataFrame `data` that has some null values in different columns. I want to create a list from `data` showing only the columns that have no null values. I have also created `missing_row_counts`, a Series containing the number of null values in each column. Here is the code that I have:

def non_zeros(series):
    """Returns a list of the index values in series for which
    the value is greater than 0.
    """ 
    for i in missing_row_counts:
      nonzero_row = i > 0 #need to fix 
    return nonzero_row

The code above runs, but when I call it with `missing_cols = non_zeros(missing_row_counts)`, `missing_cols` returns `True`, where I am expecting a list of column names.

Tenoch
  • Check this out, they have discussed the same issue as you: https://stackoverflow.com/questions/47414848/pandas-select-all-columns-without-nan – Sachin Rajput Dec 11 '20 at 18:15

3 Answers

non_zeros = list(series[series > 0].index.values)
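Applied to the question's setup, a minimal sketch (the column names and counts in `missing_row_counts` are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical per-column null counts, mirroring the question's missing_row_counts
missing_row_counts = pd.Series({"a": 0, "b": 2, "c": 1})

def non_zeros(series):
    """Return a list of index labels whose value is greater than 0."""
    return list(series[series > 0].index.values)

missing_cols = non_zeros(missing_row_counts)
print(missing_cols)  # ['b', 'c']
```

The key difference from the question's loop is that the comparison is done on the whole Series at once, and the boolean mask is used to select index labels rather than overwriting a single `True`/`False` on each iteration.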
Laggs

You can do it with vectorization like this:

import pandas as pd
import numpy as np

data = pd.DataFrame({"a": [1, 2], "b": [3, np.nan]})
# Count nulls per column (axis=0, the default), then keep the columns with zero
non_nan_columns = data.columns[data.isnull().sum() == 0]

Please supply a working example in your next question ;)
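An equivalent vectorized form, assuming the same toy frame, uses `notna().all()` per column instead of counting nulls:

```python
import pandas as pd
import numpy as np

data = pd.DataFrame({"a": [1, 2], "b": [3, np.nan]})
# A column is "complete" when every one of its values is non-null
non_nan_columns = data.columns[data.notna().all()]
print(list(non_nan_columns))  # ['a']
```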


`nonzero_row = i > 0` will always return `True` or `False`, because of the comparison you are making: each loop iteration overwrites `nonzero_row` with a single boolean, so only the last comparison survives.

However, an easier way to do this would be `df.isna().any()`; an illustrative example is below:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2, 3], 'b': [np.nan, 2, 3], 'c': [1, 2, 3]})
col_has_na = df.isna().any()  # checks, for each column of df, whether it has any NaN
print(col_has_na)
a    False
b     True
c    False
dtype: bool
# In the output above, `a` and `c` do not have NaN, hence False, whereas it is True for `b`

# Filter out and get the index of the columns which have a False value
print(col_has_na[~col_has_na].index.tolist())
['a', 'c']
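To go one step further and view only those complete columns of the frame itself, the resulting list can be used directly as a column selector (a sketch using the same toy `df` as above):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2, 3], 'b': [np.nan, 2, 3], 'c': [1, 2, 3]})
col_has_na = df.isna().any()
# Select only the columns whose mask value is False (no NaN anywhere)
complete = df[col_has_na[~col_has_na].index.tolist()]
print(list(complete.columns))  # ['a', 'c']
```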
Pawan