Hi and thank you for visiting my post.

Here is working code that produces the median values:

Wall_Median = pd.pivot_table(cleaned_pokedex, values="Wall", index="Primary Type", aggfunc={"Wall": np.median})
Final_Wall_Median = Wall_Median.nlargest(18,'Wall')
print(Final_Wall_Median)

For example, the median for Poison is 193, but the bar chart shows a value over 200.

               Wall
Primary Type
Steel         259.0
Fairy         244.0
Dragon        237.0
Rock          235.5
Ground        235.0
Ice           230.0
Flying        220.0
Fighting      216.0
Ghost         215.0
Psychic       215.0
Grass         209.5
Water         208.0
Fire          204.0
Electric      201.0
Dark          200.0
Normal        194.0
Poison        193.0
Bug           180.0

Plotting the values with a seaborn bar chart does not reproduce the numeric values printed by the code above:

fig = plt.gcf()
fig.set_size_inches(20,18)

ax = sns.barplot(x= cleaned_pokedex["Wall"],y= cleaned_pokedex["Primary Type"],data= Final_Wall_Median,palette = pkmn_type_colors)

Output: [image: seaborn bar plot of Wall by Primary Type]

The bar values don't match the medians printed above. What can I do to fix this?

Diziet Asahi
  • Please do not paste code as images. – DYZ Oct 22 '20 at 19:38
  • Hi DYZ, is this a moderation note on my post, or is it something that prevents you from answering the question above? – AlignedMind Oct 22 '20 at 20:06
  • It's in the FAQ: [Please do not upload images of code/errors when asking a question.](//meta.stackoverflow.com/q/285551). Also important for this question: [How do I create a minimal complete verifiable example?](https://meta.stackoverflow.com/questions/349789/how-do-i-create-a-minimal-complete-verifiable-example) – JohanC Oct 22 '20 at 20:41
  • Thank you so much, I will edit and hopefully receive some feedback. I used the images to show the discrepancy between the printed median values and the values shown in the plot. – AlignedMind Oct 22 '20 at 22:26

2 Answers

1

It seems that you are actually plotting the mean with a CI band instead of the median as you intend to. That is because there is a small contradiction in your code:

ax = sns.barplot(x= cleaned_pokedex["Wall"],y= cleaned_pokedex["Primary Type"],data= Final_Wall_Median,palette = pkmn_type_colors)
  1. you are telling seaborn to get the x and y values from cleaned_pokedex dataframe,
  2. however, then you tell it to use data from the Final_Wall_Median dataframe.

So it seems that seaborn uses the x and y vectors you passed directly and ignores the pre-aggregated Final_Wall_Median that you pass as data. Typically, you would EITHER pass only x and y when you want to plot any two arrays (they don't need to come from the same dataframe), OR provide data as the dataframe you want to use and give x and y as string column names (e.g. sns.barplot(x="Wall", y="Primary Type", data=cleaned_pokedex)).

However, as pointed out in the other answer, if you simply pass the "Wall" and "Primary Type" columns as the x and y values of a barplot, seaborn will by default use the mean as the estimator.

The two options you have are:


sns.barplot(x=cleaned_pokedex["Wall"], y=cleaned_pokedex["Primary Type"], estimator=np.median)

# or

sns.barplot(x=Final_Wall_Median.Wall, y=Final_Wall_Median.index)

Since you've already pre-aggregated the medians, you can use Final_Wall_Median directly (the second option). The only difference is that you cannot get CI bands unless you supply the raw data (the whole cleaned_pokedex dataframe, as in the first option).
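As a rough check that both options give the same bar lengths, here is a minimal sketch. The `toy` dataframe is made-up data standing in for cleaned_pokedex (it is not the real dataset), and the second call passes the pre-aggregated frame through data= with column names, which is a variant of the second option above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data standing in for cleaned_pokedex (not the real dataset).
toy = pd.DataFrame({
    "Primary Type": ["Poison", "Poison", "Poison", "Bug", "Bug", "Bug"],
    "Wall": [150, 193, 290, 120, 180, 200],
})

# Option 1: pass the raw data and let seaborn aggregate with the median.
fig1, ax1 = plt.subplots()
sns.barplot(x="Wall", y="Primary Type", data=toy, estimator=np.median, ax=ax1)

# Option 2: aggregate first, then plot one value per type.
medians = toy.groupby("Primary Type")["Wall"].median().reset_index()
fig2, ax2 = plt.subplots()
sns.barplot(x="Wall", y="Primary Type", data=medians, ax=ax2)

# Horizontal bars: the bar length is the patch width.
widths_raw = sorted(p.get_width() for p in ax1.patches)
widths_agg = sorted(p.get_width() for p in ax2.patches)
print(widths_raw, widths_agg)
```

In both plots the bar lengths equal the per-type medians. Only the first plot can draw error bars, because only it sees more than one value per type.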

tania
0

barplot() takes a parameter estimator= that defines how the bar height is calculated. By default, this is done using mean(), but you can pass median if that's what you want:

ax = sns.barplot(..., estimator=np.median)
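To see why the default estimator can move a bar away from the printed median, here is a small pandas-only sketch. The numbers are invented for illustration (they are not taken from the asker's dataset) and are chosen so that one large value pulls the mean above the median.

```python
import pandas as pd

# Hypothetical toy data standing in for cleaned_pokedex (not the real dataset).
toy = pd.DataFrame({
    "Primary Type": ["Poison", "Poison", "Poison", "Bug", "Bug", "Bug"],
    "Wall": [150, 193, 290, 120, 180, 200],
})

# Compare the statistic barplot() uses by default (mean) with the median.
grouped = toy.groupby("Primary Type")["Wall"]
summary = pd.DataFrame({"mean": grouped.mean(), "median": grouped.median()})
print(summary)
```

With this toy data the Poison median is 193 while the mean is 211, so a bar drawn with the default estimator would sit above 200 even though the median is below it.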

Diziet Asahi
  • Thank you @diziet, I really do appreciate your explanation. I was under the impression that the variable `Wall_Median` stored the median due to `aggfunc={"Wall": np.median})`. I did not know that I would need to explicitly tell the bar plot to use the median. – AlignedMind Oct 23 '20 at 14:19
  • You'll have to provide a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) that includes a toy dataset (refer to [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)). The fact that the plot shows error bars around each bar suggests that there are several values for each category in your dataframe, which are averaged together. – Diziet Asahi Oct 23 '20 at 14:39
  • I would just like some more clarity on this. Are you inferring that the plot I displayed shows an aggregated mean of the median values? Do the error bars suggest that there are multiple values for the median? – AlignedMind Oct 23 '20 at 21:28