Suppose I construct the following dataframe:

import numpy as np
import pandas as pd

a = pd.DataFrame({'a': np.arange(5)})
b = pd.DataFrame({'b': np.arange(4)})
c = pd.DataFrame({'c': np.arange(5)})
d = pd.DataFrame({'d': np.arange(7)})
df = pd.concat([a, b, c, d], ignore_index=False, axis=1)

This produces the following dataframe:

df
Out[86]: 
     a    b    c  d
0  0.0  0.0  0.0  0
1  1.0  1.0  1.0  1
2  2.0  2.0  2.0  2
3  3.0  3.0  3.0  3
4  4.0  NaN  4.0  4
5  NaN  NaN  NaN  5
6  NaN  NaN  NaN  6

How can I keep only the columns that contain exactly 5 numerical elements (removing all the others), without using dropna?

The desired output is:

df
Out[88]: 
     a    c
0  0.0  0.0
1  1.0  1.0
2  2.0  2.0
3  3.0  3.0
4  4.0  4.0
Brad Solomon
Jonathan Pacheco

2 Answers

This checks whether each value in the dataframe is a float or integer that is not NaN, and sums the result by column. It then keeps only the columns where this total equals five.

>>> df[df.columns[(df.apply(
        lambda series: [isinstance(val, (float, int)) and not np.isnan(val) 
                        for val in series]).sum() == 5)]]
    a   c
0   0   0
1   1   1
2   2   2
3   3   3
4   4   4
5 NaN NaN
6 NaN NaN
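
For readers who find the nested one-liner hard to follow, here is a minimal sketch of the same idea split into steps. It is not the code from the answer above: the helper `numeric_count` is a hypothetical name, and it tests against `numbers.Real` rather than `(float, int)`, on the assumption that integer values may arrive as NumPy scalars that would not pass an `isinstance(val, int)` check.

```python
import numbers

import numpy as np
import pandas as pd

# Rebuild the frame from the question.
sizes = [("a", 5), ("b", 4), ("c", 5), ("d", 7)]
df = pd.concat([pd.DataFrame({name: np.arange(n)}) for name, n in sizes], axis=1)


def numeric_count(series):
    """Count entries that are real numbers and not NaN (hypothetical helper).

    Note: bool is a subclass of int, so True/False would also be counted.
    """
    return sum(isinstance(val, numbers.Real) and not pd.isna(val) for val in series)


# One tally per column, indexed by column name.
counts = df.apply(numeric_count)

# Boolean mask over the columns: keep those with exactly five numeric values.
result = df.loc[:, counts == 5]
```

Here `counts` holds one number per column, so it can be inspected before filtering to confirm which columns will survive.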
Alexander

You can use the following:

filt = df.count() != 5
df = df.drop(df.columns[filt], axis=1)

This will give you:

     a    c
0  0.0  0.0
1  1.0  1.0
2  2.0  2.0
3  3.0  3.0
4  4.0  4.0
5  NaN  NaN
6  NaN  NaN

Then as for dropping rows 5 and 6, this is really what dropna is designed for (as is your entire question), but if you insist...

filt2 = df.isnull().any(axis=1)
df = df.drop(df.index[filt2])
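
The two steps can also be written with boolean masks that select what to keep instead of what to drop. This is only a sketch of an equivalent formulation, assuming the frame from the question; the names `keep_cols`, `keep_rows` and `trimmed` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
sizes = [("a", 5), ("b", 4), ("c", 5), ("d", 7)]
df = pd.concat([pd.DataFrame({name: np.arange(n)}) for name, n in sizes], axis=1)

# df.count() gives the number of non-NaN entries in each column.
keep_cols = df.count() == 5
trimmed = df.loc[:, keep_cols]

# Keep only the rows in which every remaining column has a value.
keep_rows = trimmed.notna().all(axis=1)
trimmed = trimmed.loc[keep_rows]
```

Selecting with `.loc` avoids reassigning `df` twice and leaves the original frame untouched.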

This assumes your data is numeric. If it includes object dtypes (strings), you'll want to run a type check such as in @Alexander's answer.
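
As one possible alternative to a per-value type check, the sketch below coerces each column with `pd.to_numeric(..., errors="coerce")` before counting. This is a swapped-in technique, not something either answer uses, and the frame `mixed` is hypothetical sample data invented for the illustration.

```python
import pandas as pd

# Hypothetical frame mixing numbers, strings and missing values.
mixed = pd.DataFrame({
    "a": [0, 1, 2, 3, 4, None, None],
    "s": ["x", "y", "z", "u", "v", None, None],
    "m": [0, "oops", 2, 3, 4, 5, 6],
})

# Entries that cannot be parsed as numbers become NaN,
# so count() tallies only the numeric values in each column.
numeric = mixed.apply(pd.to_numeric, errors="coerce")
numeric_counts = numeric.count()

# Keep the original columns that hold exactly five numeric values.
selected = mixed.loc[:, numeric_counts == 5]
```

One caveat: `pd.to_numeric` also parses numeric strings such as "3", so a string column of digits would be counted as numeric under this approach.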

Brad Solomon