I have a dataset in the format below: [image]

I'm trying to generate a histogram for the following question:

Generate a histogram for the Age feature of all medal winners in the top five sports in 2016

Below is my code for the same

df4 = df
df4 = df4.loc[df['Year'] == 2016]
filter_Age = df4.groupby(['Age', 'Sport'])['Medal'].count().sort_values(ascending=True)

filter_Age.hist()

The output is: [image]

But the expected output is: [image]

I don't know where I'm going wrong! Any help is appreciated.

the_new_guy
  • For debugging help, you need to make a [mre] including data as text, [not a picture](https://meta.stackoverflow.com/q/285551/4518341), and ask a specific question about it. You'll want to look at the intermediate values to see where they diverge from your expectations. To start, if the year selection works properly, you can ignore all the other data. Then you can check if the groupby is correct, if the count is correct, and if the sort is correct. You might even solve the problem yourself by working through it step by step. – wjandrea Feb 12 '23 at 23:38
  • More tips: [How to debug small programs](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/) by Eric Lippert, [How to ask a good question](/help/how-to-ask), [Why is "Can someone help me?" not an actual question?](https://meta.stackoverflow.com/q/284236/4518341), and [How to make good reproducible pandas examples](/q/20109391/4518341) – wjandrea Feb 12 '23 at 23:39

2 Answers

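The problem is that hist counts and bins the values of the object it is called on. Here that is the Series of medal counts, not the ages, so you get a histogram of the counts themselves. What you need is a bar plot. Try something like: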

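import matplotlib.pyplot as plt

# Count medals per (Age, Sport), then spread sports into columns for plotting
(
    df4.groupby(["Age", "Sport"], as_index=False)
    ["Medal"].count()
    .pivot(index="Age", columns="Sport", values="Medal")
    .plot.bar()
)
plt.show()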

For plotting there are a few options. Try playing with stacked=True or subplots=True.
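If a true histogram of Age is what the exercise wants, one possible approach is to filter the rows first and histogram the raw Age values instead of the grouped counts. The sketch below is only an illustration: the sample rows are made up, it assumes a missing Medal value means no medal was won, it assumes "top" sports means the sports with the most medal rows in that year, and the helper names `medalist_ages` and `plot_age_histogram` are hypothetical.

```python
import pandas as pd

# Hypothetical rows standing in for the real data; only the column names
# (Year, Sport, Age, Medal) come from the question.
df = pd.DataFrame({
    "Year":  [2016] * 8 + [2012] * 2,
    "Sport": ["Athletics", "Athletics", "Athletics", "Swimming", "Swimming",
              "Rowing", "Rowing", "Judo", "Athletics", "Swimming"],
    "Age":   [22, 25, 30, 19, 21, 27, 28, 24, 35, 18],
    # Assumption: a missing Medal value means the athlete did not win one.
    "Medal": ["Gold", "Silver", "Bronze", "Gold", "Silver",
              "Bronze", None, None, "Gold", "Gold"],
})


def medalist_ages(df, year=2016, n_sports=5):
    """Ages of medal winners in the n_sports sports with the most medals that year."""
    medalists = df[(df["Year"] == year) & df["Medal"].notna()]
    # Assumption: "top" sports are ranked by number of medal rows.
    top_sports = medalists["Sport"].value_counts().nlargest(n_sports).index
    return medalists.loc[medalists["Sport"].isin(top_sports), "Age"].dropna()


def plot_age_histogram(ages, bins=10):
    """Draw the histogram from the raw ages (needs matplotlib installed)."""
    ax = ages.plot.hist(bins=bins)
    ax.set_xlabel("Age")
    return ax


# The toy data has only a few sports, so keep the top two here.
ages = medalist_ages(df, year=2016, n_sports=2)
print(sorted(ages.tolist()))
```

On the real data you would call `medalist_ages(df)` with the default `n_sports=5`, pass the result to `plot_age_histogram`, and then call `plt.show()` from matplotlib to display the figure.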

Wakeme UpNow

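You can use pivot_table and then plot with histplot (the replacement for distplot, which is deprecated since seaborn 0.11) or barplot:

import seaborn as sns

# 'count' must be a string (or a callable); a bare count is an undefined name
filter_Age = df4.pivot_table(index=['Age', 'Sport'], aggfunc={'Medal': 'count'})
filter_Age = filter_Age.reset_index()

sns.histplot(filter_Age['Age'])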
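One thing to watch for with aggregated data: after the pivot there is one row per (Age, Sport) pair, so a histogram of the Age column counts each pair once, no matter how many medals it represents. The sketch below uses made-up rows and hypothetical variable names, and uses `numpy.histogram` to compare the bin counts. It suggests that weighting each pair by its medal count recovers the histogram of the raw ages.

```python
import numpy as np
import pandas as pd

# Hypothetical medal-winner rows, already filtered to a single year.
medalists = pd.DataFrame({
    "Sport": ["Athletics", "Athletics", "Athletics",
              "Swimming", "Swimming", "Swimming"],
    "Age":   [22, 22, 30, 22, 19, 19],
    "Medal": ["Gold", "Silver", "Bronze", "Gold", "Silver", "Bronze"],
})

# One row per (Age, Sport) pair with its medal count, like the pivot above.
counts = medalists.pivot_table(
    index=["Age", "Sport"], values="Medal", aggfunc="count"
).reset_index()

bins = [15, 20, 25, 30, 35]

# Histogram of the raw ages: one entry per medal.
raw_hist, _ = np.histogram(medalists["Age"], bins=bins)

# Histogram of the aggregated Age column: one entry per (Age, Sport) pair.
pair_hist, _ = np.histogram(counts["Age"], bins=bins)

# Weighting each pair by its medal count matches the raw histogram again.
weighted_hist, _ = np.histogram(counts["Age"], bins=bins, weights=counts["Medal"])

print(raw_hist.tolist(), pair_hist.tolist(), weighted_hist.tolist())
```

matplotlib's `plt.hist` takes a `weights` argument as well, so the same idea carries over to plotting. If the raw rows are available, plotting them directly is usually simpler.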