I have a DataFrame of shape (42538, 145) in which more than 50 columns contain NaN values for every row.
I would like to drop these columns without listing each and every column name in df.drop.
You probably want to start with df.notnull
to get the locations of all the non-NaN values.
You can then use df.any
on the result, with axis=0,
to check each column for not-all-NaN-ness.
The resulting boolean Series can be used to index your columns: Pandas Select DataFrame columns using boolean. There are a couple of different options:
df = df.iloc[:, df.notnull().any(axis=0).values]

sel = df.notnull().any(axis=0)
df = df[sel.index[sel]]
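For instance, on a small hypothetical frame (column names 'A' through 'D' are made up for illustration), the boolean mask can also be passed straight to df.loc:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame: columns 'B' and 'D' are entirely NaN.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [np.nan] * 3,
                   'C': [4, 5, 6], 'D': [np.nan] * 3})

# Boolean Series: True for each column holding at least one non-NaN value.
sel = df.notnull().any(axis=0)

# df.loc accepts the boolean Series directly as a column indexer.
df = df.loc[:, sel]
print(df.columns.tolist())  # ['A', 'C']
```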
You can use pd.DataFrame.dropna
with axis=1.
Note that the default how='any' drops every column containing even a single NaN; pass how='all' to drop only the columns that are NaN for all rows, which is what the question asks for:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [np.nan]*3,
                   'C': [4, 5, 6], 'D': [np.nan]*3})
df_new = df.dropna(axis=1, how='all')
print(df_new)
A C
0 1 4
1 2 5
2 3 6
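To see why how='all' matters, compare both settings on a hypothetical frame where one column is only partially NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'B' is entirely NaN, 'C' has just one NaN.
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [np.nan] * 3,
                   'C': [4, np.nan, 6]})

# Default how='any' drops every column with at least one NaN.
print(df.dropna(axis=1).columns.tolist())             # ['A']

# how='all' drops only columns that are NaN in every row.
print(df.dropna(axis=1, how='all').columns.tolist())  # ['A', 'C']
```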
Try this (the empty strings below are placeholders; substitute your own column names, patterns, and the pm_col names before running):
tmp_col = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]
df1.columns = tmp_col
df2 = df1[(df1[2] == 'RO En') | (df1[2] == 'RO En Adj')].copy()
df2[['bp1','bp2']] = df2[6].str.split('-', expand=True)
df2[['mn1','mn2']] = df2[11].str.split(' ', expand=True)
df2['FN'] = df2[10] + ' ' + df2[11]
df2.loc[df2[2] == 'RO ', 'RT'] = ''
df2.loc[df2[2] == 'RO ', ''] = ''          # '' = placeholder column name
df2.loc[df2[2] == 'RO ', ''] = df2['bp1']  # '' = placeholder column name
df2.loc[df2[2] == 'RO ', ''] = df2[12]     # '' = placeholder column name
df3 = df2[df2[8].str.contains('')]         # '' = placeholder pattern
df4 = df2[df2[8].str.contains('')]         # '' = placeholder pattern
print(df3, df4)
pm_col = []  # placeholder: must hold 14 names, one per column selected below
df3 = df3[[10, 11, 'BP', 16, 15, 15, 17, 15, 'RT', 14, 21, 19, 'FN', 'mn2']]
df3.columns = pm_col
df4 = df4[[10, 11, 'BP', 16, 15, 15, 17, 15, 'RT', 14, 21, 19, 'FN', 'mn2']]
df4.columns = pm_col
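The split-and-assign pattern used above (df2[['bp1','bp2']] = df2[6].str.split('-', expand=True)) can be sketched on a small hypothetical frame; the column names 'code', 'left', and 'right' are made up for illustration:

```python
import pandas as pd

# Hypothetical data: one text column with '-'-separated parts.
df = pd.DataFrame({'code': ['A1-B2', 'C3-D4', 'E5-F6']})

# str.split with expand=True returns a DataFrame, one column per part,
# which can be assigned to several new columns in a single statement.
df[['left', 'right']] = df['code'].str.split('-', expand=True)

print(df['left'].tolist())   # ['A1', 'C3', 'E5']
print(df['right'].tolist())  # ['B2', 'D4', 'F6']
```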