I'm working on predicting student performance from a number of factors. This is a link to my data: https://archive.ics.uci.edu/ml/datasets/Student+Performance#. Here is a sample of the observations from the sex and final grade (G3) columns:

sex G3
F   6
F   6
F   10
F   15
F   10
M   15
M   11
F   6
M   19
M   15
F   9
F   12
M   14

I'm looking at the distribution of my target variable (final grade):

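import matplotlib.pyplot as plt
import seaborn as sns

# fill=True replaces the deprecated shade=True (seaborn >= 0.11)
ax = sns.kdeplot(data=df2, x="G3", fill=True)
ax.set(xlabel='Final Grade', ylabel='Density', title='Distribution of Final Grade')
plt.xlim([0, 20])
plt.show()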

[Screenshot: Distribution of Final Grade]

Now I want to find out how the distribution of final grades differs by sex. How can I do this?

Joe Ferndz
sophnas
  • Please provide a minimal reproducible example following an [appropriate guide](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). Also, have you looked at the [seaborn documentation](https://seaborn.pydata.org/generated/seaborn.barplot.html)? – xicocaio Mar 30 '21 at 20:44

1 Answer

Consider the sample data, with the final grade column named grades:

import pandas as pd
df2 = pd.DataFrame({'sex': ['F','F','F','F','F','M','M','F','M','M','F','F','M'], 'grades': [6,6,10,15,10,15,11,6,19,15,9,12,14]})
sex grades
F   6
F   6
F   10
F   15
F   10
M   15
M   11
F   6
M   19
M   15
F   9
F   12
M   14

We can use seaborn's countplot function as follows:

import seaborn as sns
sns.countplot(x="grades", hue='sex', data=df2)

This produces a bar chart of how often each grade occurs, with a separate bar for each sex.

xicocaio