138

I have this DataFrame:

                      0   1   2         3   4       5   6          7
0               #0915-8 NaN NaN       NaN NaN     NaN NaN        NaN
1                   NaN NaN NaN  LIVE WGT NaN  AMOUNT NaN      TOTAL
2               GBW COD NaN NaN     2,280 NaN   $0.60 NaN  $1,368.00
3               POLLOCK NaN NaN     1,611 NaN   $0.01 NaN     $16.11
4                 WHAKE NaN NaN       441 NaN   $0.70 NaN    $308.70
5           GBE HADDOCK NaN NaN     2,788 NaN   $0.01 NaN     $27.88
6           GBW HADDOCK NaN NaN    16,667 NaN   $0.01 NaN    $166.67
7               REDFISH NaN NaN       932 NaN   $0.01 NaN      $9.32
8    GB WINTER FLOUNDER NaN NaN       145 NaN   $0.25 NaN     $36.25
9   GOM WINTER FLOUNDER NaN NaN    25,070 NaN   $0.35 NaN  $8,774.50
10        GB YELLOWTAIL NaN NaN        26 NaN   $1.75 NaN     $45.50

I want to drop all NaNs as well as any columns with more than 3 NaNs (either one, or both, should work I think). I tried this code:

fish_frame.dropna()
fish_frame.dropna(thresh=len(fish_frame) - 3, axis=1)

but it seems not to have any effect on the DataFrame - I see the same results afterward.

What is wrong with the code, and how do I fix it?

Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
theprowler
  • 3,138
  • 11
  • 28
  • 39
  • 8
    `.dropna()` doesn't change DF in place - it returns a changed DF... so you either have to assign it back like: `df = df.dropna()` or to explicitly use `inplace=True` parameter – MaxU - stand with Ukraine Jul 17 '17 at 14:40
  • Ohh my bad. Gotcha. Should I expect that command to produce an empty dataframe, given how many `NaN`s my original one has? – theprowler Jul 17 '17 at 14:51
  • 1
    i think your second command should work (since it targets columns), but the first one will remove any row with a `NaN` - since all rows have at least one NaN in them, it will remove all of them. – Corley Brigman Jul 17 '17 at 15:01
  • @MaxU: better to say [`dropna()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html) **by default does `inplace=False`** so you'd need to assign that; but if you want in-place just do `dropna(..., inplace=True)` – smci Sep 09 '19 at 23:34
  • 1
    OP When you say *"drop all NaNs"* you really mean *"drop all-NaN columns"*. That's slightly different. – smci Sep 09 '19 at 23:37
  • Near-duplicate but older 2012 question [Remove NaN/NULL columns in a Pandas dataframe?](https://stackoverflow.com/questions/10857924/remove-nan-null-columns-in-a-pandas-dataframe). Sadly we can't close that into this. Also from 2015 [Select rows from pandas data frame where specified columns are not all NaN](https://stackoverflow.com/questions/34311957/select-rows-from-pandas-data-frame-where-specified-columns-are-not-all-nan) – smci Sep 09 '19 at 23:51

4 Answers4

253

From the dropna docstring:

Drop the columns where all elements are NaN:
df.dropna(axis=1, how='all')


   A    B    D
0  NaN  2.0  0
1  3.0  4.0  1
2  NaN  NaN  5
DavideBrex
  • 2,374
  • 1
  • 10
  • 23
Corley Brigman
  • 11,633
  • 5
  • 33
  • 40
  • (upvoted). If you want to preserve the new dataframe without empty columns, use the "inplace = True" option. Eg.: df.dropna(axis=1, how='all', inplace=True) – Alex Mar 10 '23 at 16:22
  • @Alex `inplace` is likely going to be deprecated and should be avoided. https://github.com/pandas-dev/pandas/issues/16529 and https://towardsdatascience.com/why-you-should-probably-never-use-pandas-inplace-true-9f9f211849e4 – fantabolous May 05 '23 at 05:27
16

dropna() drops the null values and returns a dataFrame. Assign it back to the original dataFrame.

fish_frame = fish_frame.dropna(axis = 1, how = 'all')

Referring to your code:

fish_frame.dropna(thresh=len(fish_frame) - 3, axis=1)

This would drop columns with 7 or more NaN's (assuming len(df) = 10), if you want to drop columns with more than 3 Nan's like you've mentioned, thresh should be equal to 3.

Rakesh Adhikesavan
  • 11,966
  • 18
  • 51
  • 76
8

dropna() by default returns a dataframe (defaults to inplace=False behavior) and thus needs to be assigned to a new dataframe for it to stay in your code.

So for example,

fish_frame = fish_frame.dropna()

As to why your dropna is returning an empty dataframe, I'd recommend you look at the "how" argument in the dropna method (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html). Also bear in mind, axis=0 corresponds to columns, and axis=1 corresponds to rows.

So to remove columns with all "NAs", axis=0, how="any" should do the trick:

fish_frame = fish_frame.dropna(axis=0, how="any")

Finally, the "thresh" argument designates explicitly how many NA's are necessary for a drop to occur. So

fish_frame = fish_frame.dropna(axis=0, thresh=3, how="any") 

should work fine and dandy to remove any column with three NA's.

Also, as Corley pointed out, how="any" is the default and is thus not necessary.

smci
  • 32,567
  • 20
  • 113
  • 146
SeeDerekEngineer
  • 770
  • 2
  • 6
  • 22
8

Another solution would be to create a boolean dataframe with True values at not-null positions and then take the columns having at least one True value. Below line removes columns with all NaN values.

df = df.loc[:,df.notna().any(axis=0)]

If you want to remove columns having at least one missing (NaN) value;

df = df.loc[:,df.notna().all(axis=0)]

This approach is particularly useful in removing columns containing empty strings, zeros or basically any given value. For example;

df = df.loc[:,(df!='').all(axis=0)]

removes columns having at least one empty string.

Achintha Ihalage
  • 2,310
  • 4
  • 20
  • 33