I have the following piece of code:

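import matplotlib.pyplot as plt

# sqlContext is not defined here; it comes from the notebook's Spark interpreter
dfPy = sqlContext.table("df")

# Convert to pandas; named pdf so it does not shadow the usual `pd` alias
pdf = dfPy.toPandas()

# Histogram of col4 with 10-wide bins from 0 to 100
pdf[['col4']].plot(kind='hist', bins=list(range(0, 101, 10)), rwidth=0.8)
plt.show()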

Running it in an Apache Zeppelin notebook produces the following result:

[Image: Bell Curve]

As can be seen, I have two questions:

  1. How can I get a bell curve from this data? The distribution does not look normal (Gaussian), so I suppose I need to apply some data transformation first. Is that correct?

  2. How can I then draw the bell curve on top of the resulting histogram?
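
One possible way to approach the second question, offered as a rough sketch rather than a definitive answer: normalise the histogram to a density and overlay a normal curve built from the sample mean and standard deviation. The helper names `fit_normal`, `normal_pdf` and `plot_hist_with_bell_curve` are hypothetical and do not come from the question. Note that this only draws the normal curve that matches the sample's mean and spread; it does not make skewed data normal, so it does not settle whether a transformation is needed.

```python
import math
import statistics


def fit_normal(values):
    """Return (mean, sample standard deviation) of the values."""
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)
    return mu, sigma


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def plot_hist_with_bell_curve(values, bins):
    """Draw a density-normalised histogram and overlay the fitted normal curve.

    `bins` is assumed to be a list of bin edges, as in the question.
    """
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    mu, sigma = fit_normal(values)
    fig, ax = plt.subplots()
    # density=True rescales the bars so their total area is 1, which puts
    # the histogram on the same vertical scale as a probability density.
    ax.hist(values, bins=bins, density=True, rwidth=0.8)
    lo, hi = min(bins), max(bins)
    # 201 evenly spaced x positions across the binned range
    xs = [lo + (hi - lo) * i / 200 for i in range(201)]
    ax.plot(xs, [normal_pdf(x, mu, sigma) for x in xs])
    ax.set_title(f"mean={mu:.1f}, std={sigma:.1f}")
    return ax
```

In the notebook this would presumably be called with the non-null values of the `col4` column as a plain list and the same bin edges as in the question, followed by `plt.show()`. Without `density=True` the bars are raw counts and the curve would look flat next to them.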

joesan
  • I'm not an expert in this area so I don't know if the answers I'm about to present will help you. You can find the answers [here](https://stackoverflow.com/questions/27115531/python-visualize-a-normal-curve-on-datas-histogram). – r-beginners Nov 16 '21 at 13:47
  • I came across that post and I might give it a try. – joesan Nov 16 '21 at 14:00

0 Answers