I am trying to delete Columns that have more than 3 or k consecutive NaNs. New to pandas. Any help is appreciated.
Data looks like
200 2000 7632
123 NaN 1232
98 NaN 12324
4231 NaN 673
87 76 1000
I am trying to delete Columns that have more than 3 or k consecutive NaNs. New to pandas. Any help is appreciated.
Data looks like
200 2000 7632
123 NaN 1232
98 NaN 12324
4231 NaN 673
87 76 1000
Maybe not the most efficient solution, but easy to implement using more-itertools
: for each column try to locate
the first tuple of 3 NaN
s, if found add this column to list of columns to drop.
import pandas as pd
import more_itertools as mit
df = pd.DataFrame({'col1': [1,2,3,4], 'col2': [None,None,5,None], 'col3': [6,None,None,None]})
to_drop = []
for c in df:
try:
next(mit.locate(df[c].isna(), lambda *x: all(x) == True, 3))
to_drop.append(c)
except:
pass
df = df.drop(to_drop, 1)
print(df)
Result:
col1 col2
0 1 NaN
1 2 NaN
2 3 5.0
3 4 NaN
You can do something like this:
df=pd.DataFrame()
df['col1']=[np.nan,1,2,np.nan,3,np.nan,np.nan]
df['col2']=[np.nan,np.nan,np.nan,np.nan,1,2,3]
df['col3']=[1,2,3,4,np.nan,np.nan,np.nan]
print(df)
col1 col2 col3
0 NaN NaN 1.0
1 1.0 NaN 2.0
2 2.0 NaN 3.0
3 NaN NaN 4.0
4 3.0 1.0 NaN
5 NaN 2.0 NaN
6 NaN 3.0 NaN
df_filtered=df.loc[:,(df.notna().cumsum().shift().apply(lambda x: x.value_counts()).fillna(0)<3).all()]
print(df_filtered)
col1
0 NaN
1 1.0
2 2.0
3 NaN
4 3.0
5 NaN
6 NaN
Note: this eliminates if it has 3 or more, to eliminate from 4, you must replace the 3 with 4
You can use this simple example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'col1':[1,2,3,4], 'col2':[None,None,None,5], 'col3':[6, None, None, 5] })
df
:
col1 col2 col3
0 1 NaN 6.0
1 2 NaN NaN
2 3 NaN NaN
3 4 5.0 5.0
EDIT
Drop consecutive NaNs:
bad_cols=[]
for col in list(df):
for i in range(df.shape[0]-2):
w = df.loc[i:i+2, col]
if np.sum(w.isna()) == 3:
bad_cols.append(col)
break
df.drop(bad_cols, axis=1, inplace=True)
df
:
col1 col3
0 1 6.0
1 2 NaN
2 3 NaN
3 4 5.0