How to remove the NaN from the output?

Question

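import seaborn as sns

# Load seaborn's example dataset (fetched online and cached on first use)
diamonds = sns.load_dataset("diamonds")
diamonds.head()

# Keep only the "Ideal" and "Good" cuts, then average the price per cut
ideal_good = diamonds[(diamonds["cut"]=="Ideal") | 
                      (diamonds["cut"]=="Good")]
ideal_good.groupby("cut")["price"].mean()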

   carat      cut color clarity  depth  table  price     x     y     z
0   0.23    Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43
1   0.21  Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31
2   0.23     Good     E     VS1   56.9   65.0    327  4.05  4.07  2.31
3   0.29  Premium     I     VS2   62.4   58.0    334  4.20  4.23  2.63
4   0.31     Good     J     SI2   63.3   58.0    335  4.34  4.35  2.75

cut
Ideal        3457.541970
Premium              NaN
Very Good            NaN
Good         3928.864452
Fair                 NaN
Name: price, dtype: float64

Why am I seeing Premium, Very Good and Fair even though I filtered them out? How do I remove those categories from the output?

[How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). You should not just request a ready-made solution. SO is for helping solve specific errors after you've shown your effort solving them — Ron, Jan 17 '23 at 00:35
Not sure I am following. I created this code and I am not able to move beyond it as an R user. In R, this is very easy to do: `diamonds %>% filter(cut == "Ideal" | cut == "Good") %>% distinct(cut)` — Rizzle, Jan 17 '23 at 00:40
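For readers coming from R, a rough pandas counterpart of that pipeline is sketched below. It is a minimal sketch under stated assumptions: it uses only the five sample rows from the question's `head()` output instead of the full dataset, and it assumes `cut` is a categorical column whose categories follow the order seen in the question's output. It shows that the distinct *values* left after filtering are only the two kept cuts, while the column's categorical dtype still remembers all five categories.

```python
import pandas as pd

# Assumption: mirror the five sample rows shown in the question, with "cut"
# stored as a categorical whose categories follow the order in the output.
CUT_ORDER = ["Ideal", "Premium", "Very Good", "Good", "Fair"]
sample = pd.DataFrame({
    "cut": pd.Categorical(
        ["Ideal", "Premium", "Good", "Premium", "Good"], categories=CUT_ORDER
    ),
    "price": [326, 326, 327, 334, 335],
})

# filter(cut == "Ideal" | cut == "Good")
filtered = sample[(sample["cut"] == "Ideal") | (sample["cut"] == "Good")]

# distinct(cut): only the values that actually occur after filtering
distinct_cut = filtered[["cut"]].drop_duplicates()
print(distinct_cut["cut"].tolist())  # ['Ideal', 'Good']

# ...but the categorical dtype still carries every original category
print(list(filtered["cut"].cat.categories))
```

Filtering rows never changes a categorical's list of categories, which is why a later `groupby` can still report the filtered-out cuts.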
Could you provide a small sample of the diamonds dataframe so we can see what is in it? — Galo do Leste, Jan 17 '23 at 00:42
@GalodoLeste Done. I don't want code; I just want to know why this is happening. R does not have these elementary issues. — Rizzle, Jan 17 '23 at 00:47
I get that, but generally the problem is that the code is incorrect. I tried your code and it seems to work fine for me. Can you show the print statement you used to get the output? — Galo do Leste, Jan 17 '23 at 01:03
Hi @GalodoLeste, this is what I used to get the output: `ideal_good.groupby("cut")["price"].mean()` — Rizzle, Jan 17 '23 at 01:06
No. That is what you used to calculate some averages. To print the output containing the NaN values, you must have used a print statement like `print(ideal_good)`. The output you have there looks like you used the statement `print(diamonds.groupby("cut")["price"].mean())` — Galo do Leste, Jan 17 '23 at 01:09
@GalodoLeste OP seems to be using Jupyter, which is a REPL (IPython). Explicitly printing is not necessary in a REPL. — wjandrea, Jan 17 '23 at 01:11
I just saved it as an object, but as I suspected, the output was the same. So @wjandrea is correct. — Rizzle, Jan 17 '23 at 01:13
Ok, my apologies. However, my statement still stands. The output shown looks like it is `diamonds.groupby("cut")["price"].mean()`, not `ideal_good.groupby("cut")["price"].mean()` — Galo do Leste, Jan 17 '23 at 01:14
It looks like it's because `diamonds["cut"]` is [categorical](https://pandas.pydata.org/docs/user_guide/categorical.html). I don't know categoricals very well myself, but FWIW, you can get the output you want by doing `...mean()[["Ideal", "Good"]]`, but that seems clunky. — wjandrea, Jan 17 '23 at 01:14
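The categorical explanation can be checked on a small frame. This sketch assumes `cut` is categorical with the five categories in the order shown in the question's output, and it uses only the sample rows from `head()`. Grouping by a categorical key with `observed=False` creates one group per category, including categories with no rows, and their mean comes out as NaN. The sketch passes `observed` explicitly because its default has been changing across pandas releases.

```python
import pandas as pd

# Assumption: the question's five sample rows, "cut" as a categorical
CUT_ORDER = ["Ideal", "Premium", "Very Good", "Good", "Fair"]
sample = pd.DataFrame({
    "cut": pd.Categorical(
        ["Ideal", "Premium", "Good", "Premium", "Good"], categories=CUT_ORDER
    ),
    "price": [326, 326, 327, 334, 335],
})
ideal_good = sample[(sample["cut"] == "Ideal") | (sample["cut"] == "Good")]

# One group per *category*, so Premium, Very Good and Fair appear with NaN
means = ideal_good.groupby("cut", observed=False)["price"].mean()
print(means)

# The workaround from the comment: keep only the labels of interest
# (.loc makes the label-based selection explicit)
picked = means.loc[["Ideal", "Good"]]
print(picked)
```

Selecting labels afterwards works, but it hard-codes the category names a second time, which is why the comment calls it clunky.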
Beside the point, but you could shorten the selection: `diamonds[diamonds["cut"].isin(['Ideal', 'Good'])]` — wjandrea, Jan 17 '23 at 01:16
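A quick check that the shorter `isin` selection picks exactly the same rows as the chained `|` comparison. It uses the question's five sample rows, and the categorical `cut` column is an assumption. Note that `isin` filters rows only; the resulting column still carries all five categories, so it does not by itself remove the NaN rows from a later `groupby`.

```python
import pandas as pd

# Assumption: the question's five sample rows, "cut" as a categorical
CUT_ORDER = ["Ideal", "Premium", "Very Good", "Good", "Fair"]
sample = pd.DataFrame({
    "cut": pd.Categorical(
        ["Ideal", "Premium", "Good", "Premium", "Good"], categories=CUT_ORDER
    ),
    "price": [326, 326, 327, 334, 335],
})

mask_or = (sample["cut"] == "Ideal") | (sample["cut"] == "Good")
mask_isin = sample["cut"].isin(["Ideal", "Good"])
print(mask_or.equals(mask_isin))  # True

ideal_good = sample[mask_isin]
# Row filtering leaves the category list untouched
print(list(ideal_good["cut"].cat.categories))
```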
Alternatively, you could then call `dropna()` on the output Series after calculating the means with `mean()`. — Galo do Leste, Jan 17 '23 at 01:18
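The `dropna()` suggestion applied to the grouped result, as a minimal sketch on the question's sample rows (the categorical `cut` column is an assumption). `groupby("cut")["price"].mean()` returns a Series, so `dropna()` simply discards the entries for empty categories.

```python
import pandas as pd

# Assumption: the question's five sample rows, "cut" as a categorical
CUT_ORDER = ["Ideal", "Premium", "Very Good", "Good", "Fair"]
sample = pd.DataFrame({
    "cut": pd.Categorical(
        ["Ideal", "Premium", "Good", "Premium", "Good"], categories=CUT_ORDER
    ),
    "price": [326, 326, 327, 334, 335],
})
ideal_good = sample[sample["cut"].isin(["Ideal", "Good"])]

# Empty categories produce NaN means...
means = ideal_good.groupby("cut", observed=False)["price"].mean()
# ...which dropna() removes after the fact
cleaned = means.dropna()
print(cleaned)
```

One trade-off: `dropna()` also removes a cut that does have rows but whose mean is NaN for another reason (for example, every price missing), so it is a coarser fix than not creating the empty groups in the first place.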
Thank you for the attempts. I figured it out. I could have passed `observed=True` to groupby: `ideal_good.groupby("cut", observed=True)["price"].mean()` — Rizzle, Jan 17 '23 at 01:24
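A sketch of the `observed=True` fix on the question's sample rows (the categorical `cut` column is an assumption), plus one alternative that is not from the thread: `Series.cat.remove_unused_categories()` trims the categories on the column itself, so any later grouping only sees the cuts that remain. `assign` is used to build a new frame instead of writing into the filtered slice.

```python
import pandas as pd

# Assumption: the question's five sample rows, "cut" as a categorical
CUT_ORDER = ["Ideal", "Premium", "Very Good", "Good", "Fair"]
sample = pd.DataFrame({
    "cut": pd.Categorical(
        ["Ideal", "Premium", "Good", "Premium", "Good"], categories=CUT_ORDER
    ),
    "price": [326, 326, 327, 334, 335],
})
ideal_good = sample[sample["cut"].isin(["Ideal", "Good"])]

# observed=True: only categories that actually occur become groups
by_observed = ideal_good.groupby("cut", observed=True)["price"].mean()
print(by_observed)

# Alternative (swapped-in technique): drop unused categories from the column
trimmed = ideal_good.assign(cut=ideal_good["cut"].cat.remove_unused_categories())
by_trimmed = trimmed.groupby("cut", observed=False)["price"].mean()
print(by_trimmed)
```

`observed=True` fixes a single aggregation; trimming the categories changes the data itself, which can be handier when the filtered frame is grouped several times.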
