
I want to compare two different distributions, one with 100 data points and the other with 150.

In seaborn I can do this with lmplot as follows:

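import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical example data: two samples of equal size
data = {"dist1": np.random.normal(size=100), "dist2": np.random.normal(size=100)}
df = pd.DataFrame(data)
sns.lmplot(x="dist1", y="dist2", data=df)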

where the input pandas DataFrame has two columns, dist1 and dist2, each with the same number of data points.

However, this only works with distributions of the same size. I was therefore thinking about taking percentiles of each distribution. Is there already an implementation of such a plot (e.g. in matplotlib, seaborn, statsmodels, plotly)?
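A minimal sketch of the percentile idea, assuming only NumPy: evaluate both samples at the same grid of percentiles, so the two resulting arrays have equal length no matter how many points each sample has. The helper name `matched_percentiles` and the default grid size (the size of the smaller sample) are illustrative choices, not an existing library API.

```python
import numpy as np


def matched_percentiles(x, y, n_points=None):
    """Return quantiles of x and y evaluated at a common grid of percentiles.

    Hypothetical helper for illustration. By default the grid has as many
    points as the smaller sample.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if n_points is None:
        n_points = min(len(x), len(y))
    # Evenly spaced percentiles from 0 to 100, inclusive
    pct = np.linspace(0, 100, n_points)
    # np.percentile interpolates linearly between data points by default
    return np.percentile(x, pct), np.percentile(y, pct)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dist1 = rng.normal(size=100)  # 100 points
    dist2 = rng.normal(size=150)  # 150 points
    q1, q2 = matched_percentiles(dist1, dist2)
    print(q1.shape, q2.shape)  # both (100,)
```

Plotting `q1` against `q2` (for example with `plt.scatter`) gives a Q-Q plot: if the two samples come from the same distribution, the points lie close to the line y = x.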

Edit

Regarding the close votes: this question does not belong on Cross Validated, because I am asking about code or library APIs for comparing two distributions, not about the theory of distributions or statistical methods for analysing them. By "distribution" I simply mean a set of data points.

(question by gc5)

1 Answer


Assuming you want the two data sets on the same axis, see this. You need a reference to the Axes you want to draw on.

Sample:

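import matplotlib.pyplot as plt

a = [1.1, 2.8, 14, 21, 23]            # 5 points
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # 10 points

# Create one figure with a single Axes and keep a reference to it
fig, ax1 = plt.subplots()
# Draw both samples on the same Axes, each against its index
ax1.scatter(range(len(a)), a, label="a")
ax1.scatter(range(len(b)), b, label="b")
ax1.legend()
plt.show()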
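If the goal is the percentile-based comparison from the question rather than plotting each sample against its index, here is a sketch (assuming NumPy and Matplotlib) that draws a Q-Q plot of `a` against `b` on one Axes. The sorted values of the shorter sample are paired with quantiles of the longer sample at matching plotting positions; the helper `qq_pairs` is hypothetical, and the (i - 0.5)/n plotting positions are one common convention chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

a = [1.1, 2.8, 14, 21, 23]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


def qq_pairs(x, y):
    """Pair the sorted shorter sample with quantiles of the longer one (hypothetical helper)."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    swapped = len(x) > len(y)
    short, long_ = (y, x) if swapped else (x, y)
    # Plotting positions (i - 0.5) / n for the shorter sample
    probs = (np.arange(1, len(short) + 1) - 0.5) / len(short)
    long_q = np.quantile(long_, probs)
    # Keep the original order: quantiles of x first, quantiles of y second
    return (long_q, short) if swapped else (short, long_q)


qa, qb = qq_pairs(a, b)

fig, ax1 = plt.subplots()
ax1.scatter(qa, qb)
# Reference line y = x: points near it suggest similar distributions
lo = min(qa.min(), qb.min())
hi = max(qa.max(), qb.max())
ax1.plot([lo, hi], [lo, hi], linestyle="--", color="gray")
ax1.set_xlabel("quantiles of a")
ax1.set_ylabel("quantiles of b")
plt.show()
```

Because the longer sample is interpolated at the shorter sample's plotting positions, the two inputs can have any sizes, such as the 100 and 150 points in the question.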
(answer by parsethis)