Remove NaN/NULL columns in a Pandas dataframe?

Question

I have a DataFrame in pandas, and several of its columns contain only null values. Is there a built-in function that will let me remove those columns?

could you maybe accept the answer? This will mark the question as resolved and help other users as well. — MERose, Nov 01 '16 at 09:16

score 120 · Accepted Answer · edited Jul 24 '12 at 14:30

Yes, dropna. See http://pandas.pydata.org/pandas-docs/stable/missing_data.html and the DataFrame.dropna docstring:

Definition: DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None)
Docstring:
Return object with labels on given axis omitted where alternately any
or all of the data are missing

Parameters
----------
axis : {0, 1}
how : {'any', 'all'}
    any : if any NA values are present, drop that label
    all : if all values are NA, drop that label
thresh : int, default None
    int value : require that many non-NA values
subset : array-like
    Labels along other axis to consider, e.g. if you are dropping rows
    these would be a list of columns to include

Returns
-------
dropped : DataFrame

The specific command to run would be:

df = df.dropna(axis=1, how='all')
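As a minimal sketch of what that command does, the snippet below builds a small, made-up frame (the column names `a`, `b`, `c` and their values are purely illustrative) and applies the same call. Only the column with no non-NA values is removed; a column that is partly missing survives.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column "b" is entirely missing, "c" only partly.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [np.nan, np.nan, np.nan],
    "c": [1.0, np.nan, 3.0],
})

# axis=1 targets columns; how='all' drops only columns with no non-NA value.
cleaned = df.dropna(axis=1, how="all")
print(list(cleaned.columns))  # ['a', 'c']
```

Note that `dropna` returns a new object here, so `df` itself is unchanged unless the result is assigned back.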

edited Jul 24 '12 at 14:30

SlimJim

answered Jun 02 '12 at 04:52

Wes McKinney


1

Can you specify the 'dropna' value? For example, could you drop rows that are all zeros? – zach Oct 10 '12 at 19:15
7

You could either tell the pandas IO parsers that 0 is the NaN value in your input tables, or prepare the data like this: `df[df==0] = np.nan ; df=df.dropna(axis=1,how='all')` – K.-Michael Aye Dec 11 '12 at 01:50
1

For inplace: `df.dropna(axis=1,how='all',inplace=True)` – brokenfoot Nov 22 '18 at 00:33
I used `df=df.dropna(axis=1,how='all')` but it removed all my df columns. Other columns are not entirely empty. – Jade Cacho Jan 06 '20 at 23:17
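For the all-zeros case raised in the comments above, here is one possible sketch that avoids overwriting the original data. It is an assumption about what the asker wanted (dropping columns, not rows), and the frame and column names are made up. Zeros are treated as missing on a temporary copy only, so the surviving columns keep their original values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: "z" is all zeros, "n" is all NaN.
df = pd.DataFrame({
    "x": [0, 1, 2],
    "z": [0, 0, 0],
    "n": [np.nan, np.nan, np.nan],
})

# Treat 0 as missing on a copy, then keep columns with at least one value left.
masked = df.replace(0, np.nan)
keep = masked.notna().any(axis=0)
result = df.loc[:, keep]
print(list(result.columns))  # ['x']
```

With this approach a column containing only zeros and NaN is also dropped, which may or may not be what you want.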

score 2 · Answer 2 · answered May 12 '21 at 23:33

Another solution would be to create a boolean dataframe with True values at non-null positions and then keep the columns having at least one True value. This removes columns with all NaN values.

df = df.loc[:, df.notna().any(axis=0)]

If you want to remove columns having at least one missing (NaN) value:

df = df.loc[:, df.notna().all(axis=0)]

This approach is particularly useful for removing columns containing empty strings, zeros, or basically any given value. For example,

df = df.loc[:, (df != '').all(axis=0)]

removes columns having at least one empty string.
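To make the three masks in this answer concrete, the following sketch applies each of them to a small, invented frame (the column names `full`, `partial`, `empty`, and `blank` are hypothetical). The results are stored in separate variables so they can be compared side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data covering each case.
df = pd.DataFrame({
    "full": ["a", "b", "c"],
    "partial": [1.0, np.nan, 3.0],
    "empty": [np.nan, np.nan, np.nan],
    "blank": ["x", "", "z"],
})

# Keep columns with at least one non-null value (drops "empty").
at_least_one_value = df.loc[:, df.notna().any(axis=0)]

# Keep columns with no null values at all (drops "partial" and "empty").
no_missing = df.loc[:, df.notna().all(axis=0)]

# Keep columns that contain no empty string (drops "blank").
no_empty_strings = df.loc[:, (df != "").all(axis=0)]
```

Note that NaN is not equal to the empty string, so the last mask does not remove columns that are merely missing values.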

score 0 · Answer 3 · edited Jun 20 '20 at 09:12

Here is a simple function that you can use directly by passing a dataframe and a threshold:

df
'''
     pets   location     owner     id
0     cat  San_Diego     Champ  123.0
1     dog        NaN       Ron    NaN
2     cat        NaN     Brick    NaN
3  monkey        NaN     Champ    NaN
4  monkey        NaN  Veronica    NaN
5     dog        NaN      John    NaN
'''

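def rmissingvaluecol(dff, threshold):
    # Percentage of missing values in each column
    pct_missing = 100 * dff.isnull().sum() / len(dff.index)
    # Labels of the columns whose missing percentage reaches the threshold
    to_drop = pct_missing[pct_missing >= threshold].index
    # Names of the columns that remain (axis passed by keyword for newer pandas)
    l = list(dff.drop(columns=to_drop).columns.values)
    print("# Columns having more than %s percent missing values:" % threshold, (dff.shape[1] - len(l)))
    print("Columns:\n", list(set(dff.columns.values) - set(l)))
    return l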


rmissingvaluecol(df, 1)  # Here the threshold is 1%, which means we drop columns in which at least 1% of the values are missing

#output
'''
# Columns having more than 1 percent missing values: 2
Columns:
 ['id', 'location']
'''

Now create a new dataframe excluding these columns:

l = rmissingvaluecol(df,1)
df1 = df[l]

PS: You can change the threshold as per your requirements.

Bonus step

You can find the percentage of missing values for each column (optional)

def missing(dff):
    print(round(dff.isnull().sum() * 100 / len(dff), 2).sort_values(ascending=False))

missing(df)

#output
'''
id          83.33
location    83.33
owner        0.00
pets         0.00
dtype: float64
'''

This answer is inferior: [`df.dropna(..., thresh)`](https://pandas.pydata.org/pandas-docs/version/0.17/generated/pandas.DataFrame.dropna.html) already implements this; we just need to calculate the right value. And you don't need to create a new dataframe; you can just do `df.dropna(..., inplace=True)`. – smci, Sep 09 '19 at 23:59
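The `thresh` calculation mentioned in the comment above might look roughly like the sketch below. The frame mirrors the example data from this answer; the variable names `max_missing_pct` and `min_non_na` and the 50% cutoff are assumptions made for illustration. Because `thresh` counts required non-NA values, the allowed missing percentage has to be converted into a minimum count of present values.

```python
import numpy as np
import pandas as pd

# Same shape as the answer's example: 6 rows, "location" and "id" are 5/6 missing.
df = pd.DataFrame({
    "pets": ["cat", "dog", "cat", "monkey", "monkey", "dog"],
    "location": ["San_Diego"] + [np.nan] * 5,
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "John"],
    "id": [123.0] + [np.nan] * 5,
})

max_missing_pct = 50  # hypothetical cutoff: allow at most 50% missing per column

# Minimum number of non-NA values a column needs, rounded up with integer math.
min_non_na = (len(df) * (100 - max_missing_pct) + 99) // 100

# Keep only the columns that have at least min_non_na non-NA values.
kept = df.dropna(axis=1, thresh=min_non_na)
print(list(kept.columns))  # ['pets', 'owner']
```

This keeps columns whose missing share is at most the cutoff, which differs slightly at the boundary from the function above (that one drops columns whose missing share is equal to or greater than the threshold).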

score -2 · Answer 4 · edited Jun 29 '18 at 07:33

Function for removing all null columns from the data frame:

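import pandas as pd

def Remove_Null_Columns(df):
    dff = pd.DataFrame()
    # Iterate over the columns of the dataframe that was passed in
    for cl in df.columns:
        # Skip columns in which every value is null
        if df[cl].isnull().sum() == len(df[cl]):
            pass
        else:
            dff[cl] = df[cl]
    return dff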

This function will remove all Null columns from the df.

edited Jun 29 '18 at 07:33

answered Jun 29 '18 at 06:41

ajay singh


2

Please, if you answer something, at least follow a proper style guide like PEP 8... Also, pandas offers the dropna() function, so this is not a good answer... – Noki Sep 04 '18 at 11:38
