I have a data set that contains 440 variables, and it is divided into three columns:

  • Column 1 is called Simulations; this variable contains the names of four different simulations (indiv, ssm, bma, and real) to calculate some indicators. This variable is an object.
  • Column 2 is called Scores and contains the values assigned by each simulation to each observation of the data set; the scores go from 1 to 4. The variable is a float64.
  • Column 3 is called Ranking and contains how the observations are ranked according to their scores. The variable is an object.

I am trying to combine a histogram of the whole population with the KDE plot of the real simulation. This is my code so far:

import seaborn as sns
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
sns.__version__
ors=pd.read_excel(r'C:\Data\Book1.xlsx')  # raw string so the backslashes are not read as escape sequences
ors.shape
indiv=ors[ors.Simulation=='Individual weights']
subset1= ors[ors['Simulation'] == 'indiv']
ssm=ors[ors.Simulation=='ssm']
subset2= ors[ors['Simulation'] == 'ssm']
bma=ors[ors.Simulation=='bma']
subset3= ors[ors['Simulation'] == 'bma']
real=ors[ors.Simulation=='real']
subset4= ors[ors['Simulation'] == 'real']
sns.set_style('white')
sns.displot(x='Scores', data=ors)

This gives the histogram of the whole population. I then use the following code to check the KDE of each simulation:

sns.displot(x='Scores', data=ors, kind='kde', hue='Simulation')

This produces a plot with one KDE curve per simulation.

Now I am trying to combine the red KDE with the histogram of the whole population. I used the following command, although I am not sure this is the correct way to combine these graphs:

sns.displot(x='Scores', data=ors, hist= True, hist= False, subset4['Scores'], hist = False, kde = True,
                 kde_kws = {'linewidth': 3})

But I get this error:

File "<ipython-input-36-dc49ac2c4ff6>", line 1
    sns.displot(x='Scores', data=ors, hist= True, hist= False, subset4['Scores'], hist = False, kde = True,
                                                 ^
SyntaxError: keyword argument repeated

Many thanks, Kind regards, Iván

Iván
  • This question is not reproducible without **data**. Please see [How to provide a reproducible copy of your DataFrame using `df.head(30).to_clipboard(sep=',')`](https://stackoverflow.com/q/52413246/7758804), then **[edit] your question**, and paste the clipboard into a code block. Always provide a [mre] **with code, data, errors, current output, and expected output, as [formatted text](https://stackoverflow.com/help/formatting)**. If relevant, plot images are okay. – Trenton McKinney Sep 17 '21 at 13:40
  • displot is a figure-level plot, this will probably be easier to accomplish with histplot and kdeplot, which are axes-level plots. – Trenton McKinney Sep 17 '21 at 13:43
  • Thanks @TrentonMcKinney, actually I was using the displot command but I am not sure how to have one command to combine both graphs. – Iván Sep 17 '21 at 14:47
  • Since you are trying to plot a subset of data onto the histogram, you need to use histplot and then kdeplot for the subset. These are axes level plots. What you're trying to do is significantly more difficult with displot. – Trenton McKinney Sep 17 '21 at 14:51
  • The SyntaxError: you can't have both `hist= True, hist= False,` – Trenton McKinney Sep 17 '21 at 14:53

1 Answer

I applied the following code:

sns.displot(x='Scores', data=ors, bins=10, stat='density')
sns.distplot(subset4['Scores'], hist = False, kde = True, color='red')

The result looks good.

Iván