I have found some rows in which all columns seem to be null. Examples are below: [image]

I want to remove these rows. But when I use the method from the linked question below, the returned DataFrame is empty, although it should contain the rows where all values are null. Linked question: Python Pandas find all rows where all values are NaN

So I want to know what is wrong with my DataFrame. Does the NA matter? What should I do to get the row numbers of the null rows?

I also tried:

df_features.loc[df_features['sexo'].isnull() & (df_features['age']=='NA'),:]

But it returns no rows from my data frame.

yanachen

1 Answer

I think you need boolean indexing with a mask created by notnull:

df_features[df_features['sexo'].notnull()]

It seems you need:

df_features[(df_features['sexo'].notnull()) & (df_features['age'] != 'NA')]

Sample:

import numpy as np
import pandas as pd

df_features = pd.DataFrame({'sexo':[np.nan,2,3],
                   'age':['10','20','NA']})

print (df_features)
  age  sexo
0  10   NaN
1  20   2.0
2  NA   3.0

a = df_features[(df_features['sexo'].notnull()) & (df_features['age'] != 'NA')]
print (a)
  age  sexo
1  20   2.0

But it seems your columns with NA values are not numeric; they contain strings.
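
One way to check this is sketched below on the sample frame (your real data may look different). It tests whether each column has a numeric dtype and counts how many 'NA' strings versus real missing values the age column holds. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

df_features = pd.DataFrame({'sexo': [np.nan, 2, 3],
                            'age': ['10', '20', 'NA']})

# A column holding strings such as 'NA' is not numeric,
# while 'sexo' with a real NaN is a float column.
age_is_numeric = pd.api.types.is_numeric_dtype(df_features['age'])
sexo_is_numeric = pd.api.types.is_numeric_dtype(df_features['sexo'])
print(age_is_numeric, sexo_is_numeric)

# The string 'NA' is an ordinary value, so isnull() does not flag it.
na_strings = (df_features['age'] == 'NA').sum()
real_nulls = df_features['age'].isnull().sum()
print(na_strings, real_nulls)
```

If the string count is non-zero while the null count is zero, isnull() alone will not find those rows.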

If you need to convert some columns to numeric, try to_numeric. The parameter errors='coerce' converts every value that cannot be parsed as a number to NaN:

df_features = pd.DataFrame({'sexo':[np.nan,2,3],
                   'age':['10','20','NA']})

print (df_features)
  age  sexo
0  10   NaN
1  20   2.0
2  NA   3.0

df_features['age'] = pd.to_numeric(df_features['age'], errors='coerce')
print (df_features)
    age  sexo
0  10.0   NaN
1  20.0   2.0
2   NaN   3.0

a = df_features[(df_features['sexo'].notnull()) & (df_features['age'].notnull())]
print (a)
    age  sexo
1  20.0   2.0
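
To get the row numbers of rows where every column is missing, a possible sketch is to replace the 'NA' strings with NaN first and then combine isnull with all(axis=1). This assumes the 'NA' strings should count as missing. The extra all-missing row and the names cleaned, all_null_idx and df_clean are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data with one extra row (index 3) where every column is missing
df = pd.DataFrame({'sexo': [np.nan, 2, 3, np.nan],
                   'age': ['10', '20', 'NA', 'NA']})

# Treat the string 'NA' as a real missing value
cleaned = df.replace('NA', np.nan)

# True for rows in which all columns are null
all_null = cleaned.isnull().all(axis=1)

# Index labels (row numbers here) of the all-null rows
all_null_idx = df.index[all_null].tolist()
print(all_null_idx)

# Drop only the rows where every column is null
df_clean = cleaned.dropna(how='all')
print(df_clean)
```

Rows that are only partly missing (index 0 and 2 here) are kept, because how='all' drops a row only when all of its values are null.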
jezrael