Does the quantile() function in Pandas ignore NaN?

Question

I have a DataFrame, dfAB:

import numpy as np
import pandas as pd
import random

A = [ random.randint(0,100) for i in range(10) ]
B = [ random.randint(0,100) for i in range(10) ]

dfAB = pd.DataFrame({ 'A': A, 'B': B })
dfAB

I want to know the 75th percentile of each column, so I call the quantile function:

dfAB.quantile(0.75)

But say I now put some NaNs in dfAB and re-run the function; obviously the result is different:

dfAB.loc[5:8]=np.nan
dfAB.quantile(0.75)

When I calculated the mean of dfAB, I passed skipna=True to ignore the NaNs, as I didn't want them affecting my stats (I have quite a few in my data, on purpose, and obviously setting them to zero doesn't help):

dfAB.mean(skipna=True)

So what I'm getting at is whether, and how, the quantile function handles NaNs.

With skipna=False, mean returns NaN if the column contains NaN (skipna=True is the default). — BENY, Sep 04 '18 at 17:34
Don't ask us; we're biological units. Try it and see what happens. Load a df with half `NaN` values and play around for a few minutes. — Prune, Sep 04 '18 at 17:34
Side comment on the way you generate A and B: you can just use A = np.random.randint(100, size=10) — Trenton McKinney, Sep 04 '18 at 17:40
The docs didn't mention skipna for the quantile function, which is why I asked: DataFrame.quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear'). @sacul kindly pointed out the right function to compare against, np.nanpercentile, which I didn't know existed. Thanks all. — Junaid Mohammad, Sep 04 '18 at 17:49

score 17 · Accepted Answer · answered Sep 04 '18 at 17:38

Yes, this appears to be how DataFrame.quantile deals with NaN values. To illustrate, you can compare the results to np.nanpercentile, which explicitly "computes the qth percentile of the data along the specified axis, while ignoring nan values" (quoted from the docs):

>>> dfAB
      A     B
0   5.0  10.0
1  43.0  67.0
2  86.0   2.0
3  61.0  83.0
4   2.0  27.0
5   NaN   NaN
6   NaN   NaN
7   NaN   NaN
8   NaN   NaN
9  27.0  70.0

>>> dfAB.quantile(0.75)
A    56.50
B    69.25
Name: 0.75, dtype: float64

>>> np.nanpercentile(dfAB, 75, axis=0)
array([56.5 , 69.25])

As you can see, the two results are equivalent.
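For anyone who wants to re-run the comparison without random data, here is a minimal sketch that hard-codes the values printed above. The variable names (with_nan, without_nan, nan_aware, plain) are my own, and the np.percentile call is added only as a contrast, since it does not skip NaN:

```python
import numpy as np
import pandas as pd

# Same values as the frame printed above, with rows 5-8 set to NaN.
dfAB = pd.DataFrame({
    "A": [5, 43, 86, 61, 2, np.nan, np.nan, np.nan, np.nan, 27],
    "B": [10, 67, 2, 83, 27, np.nan, np.nan, np.nan, np.nan, 70],
})

# quantile on the frame as-is (NaN rows still present).
with_nan = dfAB.quantile(0.75)

# quantile after explicitly dropping the NaN rows.
without_nan = dfAB.dropna().quantile(0.75)

# NumPy's NaN-aware percentile, column by column.
nan_aware = np.nanpercentile(dfAB.to_numpy(), 75, axis=0)

# NumPy's plain percentile does not skip NaN, so each column comes back as NaN.
plain = np.percentile(dfAB.to_numpy(), 75, axis=0)

print(with_nan.to_numpy(), without_nan.to_numpy(), nan_aware, plain)
```

The first three results should agree, which is what you would expect if quantile simply leaves the NaN entries out of the calculation.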

For Pandas v2.0 and up the default for numeric_only is False. See [docs](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.quantile.html). I expect this will change the output of the answer here. — Frank_Coumans, Jul 24 '23 at 15:38
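A note on the numeric_only point: the frame in the answer has only numeric columns, so as far as I can tell the default should not matter for that particular output. It matters once a non-numeric column is present. The sketch below passes numeric_only=True explicitly so it does not rely on the default of any given version; the frame and its 'label' column are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'label' is a made-up non-numeric column.
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0],
    "B": [10.0, np.nan, 30.0, 40.0],
    "label": ["w", "x", "y", "z"],
})

# Restrict the calculation to numeric columns explicitly,
# instead of depending on the version-specific default.
q = df.quantile(0.75, numeric_only=True)

print(q)
```

Only A and B appear in the result, and each is computed from its three non-NaN values.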

Chong Onn Keat · Answer 2 · 2021-11-17T12:07:19.010

Yes. DataFrame.quantile() will ignore NaN values when calculating the quantile.

To show this, we can compare it with np.nanquantile, which, according to its documentation, computes the qth quantile of the data along the specified axis while ignoring nan values.

>>> import random
>>> import numpy as np
>>> import pandas as pd
>>> random.seed(7)
>>> A = [ random.randint(0,100) for i in range(10) ]
>>> B = [ random.randint(0,100) for i in range(10) ]
>>> dfAB = pd.DataFrame({'A': A, 'B': B})
>>> dfAB.loc[5:8]=np.nan

>>> dfAB
      A     B
0  41.0   7.0
1  19.0  64.0
2  50.0  27.0
3  83.0   4.0
4   6.0  11.0
5   NaN   NaN
6   NaN   NaN
7   NaN   NaN
8   NaN   NaN
9  74.0  11.0

>>> dfAB.quantile(0.75)
A    68.0
B    23.0
Name: 0.75, dtype: float64

>>> np.nanquantile(dfAB, 0.75, axis=0)
array([68., 23.])
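One detail the example above does not show, because both columns have NaN in the same rows: the skipping seems to happen per column, which is not the same as calling dropna() on the whole frame first. A small sketch with made-up values, where the NaNs sit in different rows:

```python
import numpy as np
import pandas as pd

# Made-up data: each column has its NaN in a different row.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, np.nan],
    "B": [10.0, np.nan, 30.0, 40.0, 50.0],
})

# quantile skips NaN within each column independently.
per_column = df.quantile(0.5)

# np.nanquantile along axis=0 does the same thing column by column.
nan_aware = np.nanquantile(df.to_numpy(), 0.5, axis=0)

# dropna() removes every row that has any NaN, so fewer values are left.
row_dropped = df.dropna().quantile(0.5)

print(per_column.to_numpy(), nan_aware, row_dropped.to_numpy())
```

Here per_column and nan_aware agree, while row_dropped differs because rows 1 and 4 are removed from both columns before the quantile is taken.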
