I want to compare two different distributions, where one has 100 data points, the other 150 data points.
In seaborn I am able to do it using lmplot in this way:
import pandas as pd
import seaborn as sns
df = pd.DataFrame(data)
sns.lmplot(x="dist1", y="dist2", data=df)
where the input pandas DataFrame consists of two columns, dist1 and dist2, each with the same number of data points.
However, this only works with distributions of the same size. Therefore I was thinking about taking percentiles of each distribution and plotting those against each other instead. Is there already an implementation of such a plot (e.g. in matplotlib, seaborn, statsmodels, plotly, ...)?
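To make the percentile idea concrete, here is a minimal sketch of what I have in mind, not an existing library feature. The helper names matched_percentiles and plot_percentile_comparison are hypothetical, and I am assuming that the 1st to 99th percentiles are a reasonable summary of each sample. The percentiles come from the standard library's statistics.quantiles, so both samples are reduced to 99 values regardless of their original size.

```python
import statistics


def matched_percentiles(sample_a, sample_b):
    """Reduce two samples of different sizes to the same 99 percentiles.

    Hypothetical helper: returns two lists of length 99 holding the
    1st..99th percentiles of each sample.
    """
    # n=100 yields 99 cut points; "inclusive" treats the sample as
    # containing its own min and max, so results stay in the observed range.
    pct_a = statistics.quantiles(sample_a, n=100, method="inclusive")
    pct_b = statistics.quantiles(sample_b, n=100, method="inclusive")
    return pct_a, pct_b


def plot_percentile_comparison(sample_a, sample_b):
    """Plot percentiles of one sample against the other with lmplot.

    Hypothetical helper; pandas and seaborn are imported here so that
    matched_percentiles can be used without them.
    """
    import pandas as pd
    import seaborn as sns

    pct_a, pct_b = matched_percentiles(sample_a, sample_b)
    # Both columns now have 99 rows, so lmplot accepts them.
    df = pd.DataFrame({"dist1": pct_a, "dist2": pct_b})
    return sns.lmplot(x="dist1", y="dist2", data=df)
```

Calling plot_percentile_comparison(a, b) with a of length 100 and b of length 150 would then produce the same kind of figure as above, with each point being a pair of matching percentiles rather than a pair of raw observations.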
Edit
About the close votes: this question does not belong on CrossValidated SE, because I am explicitly asking about code or library APIs for comparing two distributions, not about the theory of distributions or the statistical methods used to analyse them. By "distribution" I only mean a set of data points.