I'm learning some data-science-related topics and, oh boy, this is a jungle of different libraries for everything.
For various reasons, I went with Lets-Plot, which has a nice Kotlin API that I'm using together with the Kotlin kernel for Jupyter notebooks.
Overall, things are going pretty well. Most tutorials and docs I see online use different plotting libraries (e.g. Seaborn, Matplotlib, Plotly), so most of the time I have to read the Lets-Plot-Kotlin reference and go by trial and error until I find the equivalent code for my graphs.
Currently, I'm trying to graph the distribution of differences between two values. Overall, this looks pretty good. I can just do something like:
(letsPlot(df)
+ geomHistogram { x = "some-column" }
).show()
It would be interesting to see the density estimate as well. geomDensity to the rescue!
(letsPlot(df)
+ geomDensity(color = "red") { x = "some-column" }
).show()
Nice! Now let's look at them both together:
(letsPlot(df)
+ geomDensity(color = "red") { x = "some-column" }
+ geomHistogram() { x = "some-column" }
).show()
As you can see, there's a small red line at the bottom (the geomDensity!). The problem here, I would say, is that both layers share the same Y scale. The histogram works with values of 0-20 (counts) and the density with values of 0-0.02, so when they are plotted together the density is just a line at the bottom.
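To make the mismatch concrete, here is a small plain-Kotlin sketch (no Lets-Plot involved) of how I understand the two Y values relate. `Bin` and `histogram` are hypothetical names I made up for illustration, and the data is invented. The assumption is the usual normalization: a bar's count divided by (number of points × bin width) gives its height on a scale where the total area is 1, which is the scale a density curve lives on.

```kotlin
// Hypothetical helper types, not part of Lets-Plot.
data class Bin(val lower: Double, val count: Int, val density: Double)

// Bins the values into equal-width bins and reports both Y values per bin:
// the raw count and the density-scaled height.
fun histogram(values: List<Double>, binCount: Int): List<Bin> {
    require(values.isNotEmpty() && binCount > 0) { "need data and at least one bin" }
    val min = values.minOf { it }
    val max = values.maxOf { it }
    val width = (max - min) / binCount
    require(width > 0.0) { "all values are identical" }

    val counts = IntArray(binCount)
    for (v in values) {
        // The maximum value would land one past the last bin, so clamp it.
        val index = ((v - min) / width).toInt().coerceAtMost(binCount - 1)
        counts[index]++
    }
    return counts.mapIndexed { i, c ->
        Bin(lower = min + i * width, count = c, density = c / (values.size * width))
    }
}

fun main() {
    // Invented sample of "differences between two values".
    val values = listOf(-30.0, -12.0, -5.0, -4.0, 0.0, 1.0, 2.0, 3.0, 8.0, 30.0)
    for (bin in histogram(values, binCount = 6)) {
        println("from ${bin.lower}: count=${bin.count}, density=${bin.density}")
    }
}
```

With a wide value range, the density column comes out orders of magnitude smaller than the count column, which would explain why the red curve gets squashed against the X axis.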
Is there any way to add several layers to the same plot so that each uses its own scale? I've read some blog posts that claim you should not do this (which seems to be pretty much accepted by the community).
My goal is to achieve something similar to what you can do with Seaborn:
plt.figure(figsize=(10,4),dpi=200)
sns.histplot(data=df,x='some_column',kde=True,bins=25)
(Yes, I know I took the Lets-Plot screenshot without the bins configured. Not relevant, I'd say ¯\_(ツ)_/¯ )
Maybe I'm just approaching the problem with the wrong mindset? As mentioned, I'm still learning, so any alternative is very welcome.
Just, please, don't answer with "Switch to Python". I'm exploring and I'd prefer to tackle one topic at a time.