Drop columns from a dataframe by number of row elements

Question

Suppose I construct the following dataframe:

import numpy as np
import pandas as pd

a = pd.DataFrame({'a': np.arange(5)})
b = pd.DataFrame({'b': np.arange(4)})
c = pd.DataFrame({'c': np.arange(5)})
d = pd.DataFrame({'d': np.arange(7)})
df = pd.concat([a, b, c, d], ignore_index=False, axis=1)

This produces the following dataframe:

df
Out[86]: 
     a    b    c  d
0  0.0  0.0  0.0  0
1  1.0  1.0  1.0  1
2  2.0  2.0  2.0  2
3  3.0  3.0  3.0  3
4  4.0  NaN  4.0  4
5  NaN  NaN  NaN  5
6  NaN  NaN  NaN  6

How can I remove all columns that do not have exactly 5 numerical elements, without using dropna?

The desired output is:

df
Out[88]: 
     a      c  
0  0.0    0.0  
1  1.0    1.0  
2  2.0    2.0  
3  3.0    3.0  
4  4.0    4.0

Because I'm a neophyte in python :( but your answer was exactly what I need! — Jonathan Pacheco, Sep 08 '17 at 18:07

score 2 · Answer 1 · answered Sep 08 '17 at 18:11

This checks whether each value in the dataframe is a float or integer and is not NaN, then sums the result by column. It then selects the columns where this total equals five.

>>> df[df.columns[(df.apply(
        lambda series: [isinstance(val, (float, int)) and not np.isnan(val) 
                        for val in series]).sum() == 5)]]
    a   c
0   0   0
1   1   1
2   2   2
3   3   3
4   4   4
5 NaN NaN
6 NaN NaN

Alexander

Great! a very pythonic way to do it – Jonathan Pacheco Sep 08 '17 at 18:15
@Alexander it's cold out here for a pimp, I know – Brad Solomon Sep 08 '17 at 18:49
Can I get you to vote your conscience on my post here? Thanks! https://stackoverflow.com/a/46192213/2336654 – piRSquared Sep 13 '17 at 08:21

Brad Solomon · Accepted Answer · 2017-09-08T18:13:24.233

You can use the following:

filt = df.count() != 5
df = df.drop(df.columns[filt], axis=1)

This will give you:

     a    c
0  0.0  0.0
1  1.0  1.0
2  2.0  2.0
3  3.0  3.0
4  4.0  4.0
5  NaN  NaN
6  NaN  NaN

Then as for dropping rows 5 and 6, this is really what dropna is designed for (as is your entire question), but if you insist...

filt2 = df.T.isnull().any()
df = df.drop(df.index[filt2])
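
Putting both steps together, here is a minimal self-contained sketch. It rebuilds the dataframe from the question; the names `keep` and `result` are only illustrative, and boolean indexing is used in place of `drop`, which should give the same result here.

```python
import numpy as np
import pandas as pd

# Rebuild the dataframe from the question: columns of length 5, 4, 5 and 7.
sizes = [("a", 5), ("b", 4), ("c", 5), ("d", 7)]
df = pd.concat([pd.DataFrame({name: np.arange(n)}) for name, n in sizes], axis=1)

# Step 1: keep only the columns with exactly 5 non-null values.
keep = df.count() == 5
result = df.loc[:, keep]

# Step 2: keep only the rows with no nulls, without calling dropna.
result = result[result.notnull().all(axis=1)]

print(result)
```

Selecting with a boolean mask avoids building a list of labels to drop, but either form works for this case.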

This assumes your data is numeric. If it includes object dtypes (strings), you'll want to run a type check such as in @Alexander's answer.
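
One possible way to do that type check in a vectorized form is to coerce each column with `pd.to_numeric` and count what survives. This is a sketch under assumptions: the `mixed` frame below is hypothetical and not from the question. Note that numeric-looking strings such as `"3"` would also be coerced and counted as numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame: "s" holds five strings, "a" holds five numbers.
mixed = pd.DataFrame({
    "a": [0.0, 1.0, 2.0, 3.0, 4.0, np.nan, np.nan],
    "s": ["v", "w", "x", "y", "z", None, None],
    "d": np.arange(7),
})

# Coerce every column to numeric; values that cannot be parsed become NaN.
numeric = mixed.apply(pd.to_numeric, errors="coerce")

# Count numeric, non-NaN entries per column and keep columns with exactly 5.
numeric_counts = numeric.notna().sum()
kept = mixed.loc[:, numeric_counts == 5]

print(kept.columns.tolist())
```

With plain `mixed.count()`, column `s` would also report five non-null values, which is why the coercion step matters for object columns.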
