
I have a dataframe of 100 rows of floats ranging from 0.000001 to 0.001986 that I wish to plot on a seaborn histplot, separated by class. I started with:

sns.histplot(data=df, x='score', hue='test_result', kde=True, color='red', 
             stat='probability', multiple='layer')
plt.show()

However, my bins were overlapping significantly. To scale the bins to scientific notation, I added the following argument to the histplot call:

binwidth=0.000000001

With this change, the code took over 2 hours to run.

My question is: is there a more computationally efficient way to do this conversion? I need to run the same code for multiple dataframes of similar size. If not, is there a better way to improve the readability of the x-axis bins than using scientific notation? Thanks!

Trenton McKinney
Alice
  • @TrentonMcKinney Thank you for your comment. I may have used `sns.histplot` incorrectly, but that does not answer my question after refactoring my code to your comment. My question was not how I can plot multiple histograms, but how I can convert my x-axis bins to use scientific notation as they are overlapping. My solution using `binwidth` was inefficient. I've since found an alternative method using `plt.ticklabel_format(axis="x", style="sci", scilimits=(0,0))` which solves my problem, converting the bins to scientific notation with no computational overhead. – Alice Aug 19 '23 at 18:06
  • I misunderstood the question. There was not a complete [mre] to see what was happening. A complete [mre] with reproducible data is required. The duplicate has been updated to the correct question(s) and will remain closed. – Trenton McKinney Aug 19 '23 at 18:10
  • @TrentonMcKinney Thank you. I've updated the question to use working code whilst still preserving the initial problem, and renamed the question more coherently. – Alice Aug 19 '23 at 18:14
  • `binwidth=0.000000001` would make almost 2 million bins, which matplotlib tries to show, but that is clearly too many to fit into a plot. Maybe you want to set `log_scale=True`? See e.g. [log scale in seaborn histogram](https://stackoverflow.com/questions/69573823/log-scale-true-in-seaborn-histplot). Would it be possible to edit your post and create a fully reproducible example of the plot you are getting? Maybe you can generate test data via a numpy random function? Maybe you could also add an image of the plot you originally obtained? – JohanC Aug 19 '23 at 23:03

1 Answer


Since this question has been reopened, I'll provide my answer below.

sns.histplot(data=df, x='score', hue='test_result', kde=True, 
             color='red', stat='probability', multiple='layer')
plt.ticklabel_format(axis='x', style='sci', scilimits=(-4,-4))
plt.show()

My understanding is that this labels the bins on the x-axis of my histplot in scientific (e) notation, rather than trying to force a conversion to scientific notation by setting a very small binwidth such as 0.000000001.
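To see why the binwidth route was so slow, it helps to count roughly how many bins it asks for. This is only a back-of-the-envelope sketch using the value range stated in the question; the variable names are mine.

```python
# Value range of the scores, as stated in the question.
lowest, highest = 0.000001, 0.001986
binwidth = 0.000000001

# Approximate number of bins needed to cover the range at this width.
n_bins = round((highest - lowest) / binwidth)
print(f"{n_bins:,} bins requested for 100 data points")
```

That is on the order of two million bars to draw for 100 values, which is consistent with JohanC's estimate in the comments.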

It's worth noting that, for a similar use case, a colleague of mine had code on an older version of seaborn/matplotlib that worked using the binwidth method. I don't know how; perhaps someone can explain in a comment.

For those with the same overlapping bins issue: with my data, this changed the x-axis scale to multiples of 5 (in units of 1e-4), which fixed the issue and avoided the problem JohanC mentioned in the comments. From the matplotlib documentation regarding scilimits:

Use (0, 0) to include all numbers. Use (m, m) where m != 0 to fix the order of magnitude to 10^m. The formatter default is rcParams["axes.formatter.limits"] (default: [-5, 6]).
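For anyone who wants to check the (m, m) behaviour without my data, here is a minimal sketch. It assumes synthetic uniform scores generated with numpy over the same range, and it uses plain matplotlib `ax.hist` in place of `sns.histplot` to keep the example small; the formatter call is the same in both cases.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the real scores (assumption: uniform over the stated range).
rng = np.random.default_rng(0)
scores = rng.uniform(0.000001, 0.001986, size=100)

fig, ax = plt.subplots()
ax.hist(scores, bins=20)

# Fix the x-axis order of magnitude to 10^-4, as in the answer above.
ax.ticklabel_format(axis="x", style="sci", scilimits=(-4, -4))

# Drawing the figure makes the formatter compute its tick locations and exponent.
fig.canvas.draw()
order = ax.xaxis.get_major_formatter().orderOfMagnitude
print("x-axis order of magnitude:", order)
```

Changing the tick formatter only touches the labels, so the number of bins (and the run time) stays the same as in the original plot.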

Alice