
I want to use a violin plot to show the distribution of a set of values between 1 and 868 (the data are listed below). I am very new to this; this is the code I have used:

import matplotlib.pyplot as plt
from matplotlib import ticker as mticker
import seaborn as sns
import numpy as np

# data is assumed to be a DataFrame whose 'count' column holds the values listed below
log_data = [[np.log10(d) for d in row] for row in [data['count']]]
print(log_data)

fig, ax = plt.subplots()
sns.violinplot(data=log_data, ax=ax)

plt.show()

Why does the y-axis show three 10^0 labels?


This is my data: [ 8, 7, 5, 1, 2, 6, 5, 1, 2, 31, 9, 40, 9, 53, 4, 8, 3, 1, 46, 2, 18, 4, 17, 26, 17, 2, 19, 14, 2, 16, 35, 42, 22, 2, 19, 13, 59, 11, 69, 33, 2, 2, 24, 86, 16, 11, 7, 5, 18, 22, 1, 2, 16, 28, 3, 2, 12, 16, 1, 8, 1, 2, 5, 4, 9, 1, 1, 5, 1, 4, 5, 2, 11, 25, 6, 45, 64, 6, 2, 63, 26, 2, 3, 8, 3, 16, 8, 2, 2, 99, 2, 51, 43, 5, 53, 10, 19, 20, 6, 9, 1, 4, 1, 19, 4, 2, 3, 2, 77, 4, 7, 3, 2, 1, 81, 15, 50, 22, 58, 21, 10, 1, 18, 8, 1, 35, 2, 32, 18, 12, 11, 7, 5, 27, 29, 1, 2, 5, 1, 2, 3, 3, 1, 45, 22, 1, 12, 2, 21, 4, 1, 19, 27, 23, 3, 1, 21, 1, 124, 13, 17, 1, 18, 33, 23, 3, 6, 2, 8, 3, 1, 228, 28, 1, 1, 122, 868, 47, 2, 1, 9, 108, 10, 1, 5, 40, 43, 5, 2, 137, 9, 11, 19, 19, 11, 21, 8, 1, 6, 2, 3, 3, 26, 42, 14, 1, 14, 15, 3, 30, 17, 5, 17, 3, 38, 11, 54, 3, 1, 1, 3, 3, 7, 3, 1, 1, 5, 9, 1, 5, 4, 7, 35, 8, 10, 6, 6, 5, 3, 28, 2, 2, 5, 13, 6, 2, 4, 3, 2, 7, 52, 31, 1, 7, 7, 216, 4, 13, 6, 14, 4, 4, 5, 102, 3, 15, 4, 12, 48, 5, 9, 3, 10, 35, 36, 2, 10, 2, 55, 15, 17, 2, 19, 14, 14, 15, 5, 4, 11, 1, 1, 18, 4, 63, 63, 22, 37, 2, 22, 8, 22, 8, 20, 104, 3, 2, 6, 11, 20, 1, 3, 78, 2, 1, 52, 33, 2, 4, 9, 1, 27, 9, 4, 4, 2, 9, 9, 2, 24, 137, 12, 2, 2, 1, 6, 11, 8, 1, 20, 23, 75, 5, 1, 14, 3, 31, 15, 4, 2, 26, 50, 9, 75, 42, 14, 4, 1, 2, 9, 34, 25, 37, 53, 122, 28, 52, 22, 1, 109, 1, 1, 11, 1, 15, 2, 9, 32, 23, 5, 6, 3, 2, 51, 9, 12, 10, 7, 5, 2, 1, 311, 41, 1, 6, 13, 2, 5, 18, 105, 13, 17, 3, 9, 48, 2, 15, 18, 16, 77, 13, 3, 2, 2, 8, 1, 3, 4, 93, 23, 169, 1, 24, 2, 1, 8, 36, 1, 1, 1, 6, 3, 1, 25, 1, 2, 59, 2, 3, 3, 1, 8, 2, 1, 6, 15, 1, 7, 29, 4, 4, 8, 22, 5, 80, 16, 3, 147, 23, 6, 16, 1, 8, 530]

When I use set_yscale instead:

ax.set_yscale('log')

sns.violinplot(data=first_issues_count, ax=ax)
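One likely reason the violin is cut off at the bottom once the axis is log-scaled: seaborn's violinplot draws a kernel density estimate whose curve, by default, extends `cut=2` bandwidths past the smallest and largest observations. The sketch below approximates that lower end with NumPy only. It assumes a Gaussian KDE with Scott's rule (seaborn's default bandwidth method), `violin_support_min` is a hypothetical helper rather than a seaborn function, and it uses only the first 20 values of the data in this question.

```python
import numpy as np

# First 20 values of the data listed in the question (a subset, for brevity)
values = np.array([8, 7, 5, 1, 2, 6, 5, 1, 2, 31, 9, 40, 9, 53, 4, 8, 3, 1, 46, 2], dtype=float)


def violin_support_min(x, cut=2.0):
    """Approximate lower end of a violin's KDE curve, assuming a Gaussian KDE
    with Scott's rule and the curve extended `cut` bandwidths past the minimum."""
    n = len(x)
    scott_factor = n ** (-1 / 5)              # Scott's rule for 1-D data
    bandwidth = scott_factor * x.std(ddof=1)  # bandwidth in data units
    return x.min() - cut * bandwidth


raw_low = violin_support_min(values)            # raw counts
log_low = violin_support_min(np.log10(values))  # log10-transformed counts
print(raw_low, log_low)  # both negative

# With cut=0 the curve stops at the smallest observation
print(violin_support_min(values, cut=0))  # 1.0
```

Because the lower tail dips below zero for both the raw and the log10-transformed counts, a log axis has nothing to draw there. Passing `cut=0` to `sns.violinplot` limits the curve to the observed range, at the cost of a blunt end.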


DRA
  • @JohanC I need them on the log scale. – DRA Apr 02 '22 at 13:32
  • @DRA, I think JohanC was getting at the point that you are not setting a log scale anywhere in your code. – ramzeek Apr 02 '22 at 13:58
  • Also, is it possible to share your dataset (or a small-ish subset that still creates the plotting error)? Trying to create my own (pulling from a normal distribution and plotting it on a log scale) comes out just fine, so it's hard to figure out what could be going on. – ramzeek Apr 02 '22 at 14:03
  • @ramzeek I have already converted the data. Also, when I use set_yscale my plot doesn't look the way I want: the x-axis cuts it off and there is no curve at the bottom. – DRA Apr 02 '22 at 14:03
  • Without a [MRE](https://stackoverflow.com/help/minimal-reproducible-example) it's hard to try to help. – ramzeek Apr 02 '22 at 14:05
  • @DRA, you can use `ax.set_ylim` to change your y-axis limits. – ramzeek Apr 02 '22 at 14:06
  • @ramzeek I added the data. – DRA Apr 02 '22 at 14:19
  • @JohanC that answer is actually what I used to convert the data. But I get those three 10^0s if I change the range. – DRA Apr 02 '22 at 14:25
  • Your data contain `np.log10(1)`, which is 0. A log axis can never reach 0; it would have to extend down to 10**-inf. That is your problem. – ramzeek Apr 02 '22 at 14:58
  • Also, I would say one of the great things about a violin plot is that it shows the PDF of the data. Plotting on a log scale makes it much harder to interpret visually, so you lose the advantage a violin plot has over a simpler depiction such as a box plot. – ramzeek Apr 02 '22 at 15:02
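ramzeek's point about `np.log10(1)` can be checked with plain NumPy. A minimal sketch, using only the first ten values of the question's data (1 is the smallest count):

```python
import numpy as np

# First ten values of the data in the question
values = np.array([8, 7, 5, 1, 2, 6, 5, 1, 2, 31], dtype=float)

log_values = np.log10(values)
print(log_values.min())  # 0.0, because log10(1) == 0

# A log axis positions y at log10(y), so y == 0 would sit at log10(0) == -inf
with np.errstate(divide="ignore"):
    position_of_zero = np.log10(log_values.min())
print(position_of_zero)  # -inf
```

In other words, either log-transform the data and keep a linear axis, or keep the raw data and set a log axis, but not both.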

1 Answer


A log-scale option for violinplot is on the roadmap for seaborn 0.12. In the meantime, you can compute the violin plot on the log10 of the data and use some formatting tricks, similar to "Violin Plot troubles in Python on log scale".

The example code below shows how the formatting tricks could be adapted for your situation. For comparison, a sns.boxenplot is added, which doesn't have problems with a real log scale.

import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter
import seaborn as sns
import numpy as np

data = np.array([8, 7, 5, 1, 2, 6, 5, 1, 2, 31, 9, 40, 9, 53, 4, 8, 3, 1, 46, 2, 18, 4, 17, 26, 17, 2, 19, 14, 2, 16, 35, 42, 22, 2, 19, 13, 59, 11, 69, 33, 2, 2, 24, 86, 16, 11, 7, 5, 18, 22, 1, 2, 16, 28, 3, 2, 12, 16, 1, 8, 1, 2, 5, 4, 9, 1, 1, 5, 1, 4, 5, 2, 11, 25, 6, 45, 64, 6, 2, 63, 26, 2, 3, 8, 3, 16, 8, 2, 2, 99, 2, 51, 43, 5, 53, 10, 19, 20, 6, 9, 1, 4, 1, 19, 4, 2, 3, 2, 77, 4, 7, 3, 2, 1, 81, 15, 50, 22, 58, 21, 10, 1, 18, 8, 1, 35, 2, 32, 18, 12, 11, 7, 5, 27, 29, 1, 2, 5, 1, 2, 3, 3, 1, 45, 22, 1, 12, 2, 21, 4, 1, 19, 27, 23, 3, 1, 21, 1, 124, 13, 17, 1, 18, 33, 23, 3, 6, 2, 8, 3, 1, 228, 28, 1, 1, 122, 868, 47, 2, 1, 9, 108, 10, 1, 5, 40, 43, 5, 2, 137, 9, 11, 19, 19, 11, 21, 8, 1, 6, 2, 3, 3, 26, 42, 14, 1, 14, 15, 3, 30, 17, 5, 17, 3, 38, 11, 54, 3, 1, 1, 3, 3, 7, 3, 1, 1, 5, 9, 1, 5, 4, 7, 35, 8, 10, 6, 6, 5, 3, 28, 2, 2, 5, 13, 6, 2, 4, 3, 2, 7, 52, 31, 1, 7, 7, 216, 4, 13, 6, 14, 4, 4, 5, 102, 3, 15, 4, 12, 48, 5, 9, 3, 10, 35, 36, 2, 10, 2, 55, 15, 17, 2, 19, 14, 14, 15, 5, 4, 11, 1, 1, 18, 4, 63, 63, 22, 37, 2, 22, 8, 22, 8, 20, 104, 3, 2, 6, 11, 20, 1, 3, 78, 2, 1, 52, 33, 2, 4, 9, 1, 27, 9, 4, 4, 2, 9, 9, 2, 24, 137, 12, 2, 2, 1, 6, 11, 8, 1, 20, 23, 75, 5, 1, 14, 3, 31, 15, 4, 2, 26, 50, 9, 75, 42, 14, 4, 1, 2, 9, 34, 25, 37, 53, 122, 28, 52, 22, 1, 109, 1, 1, 11, 1, 15, 2, 9, 32, 23, 5, 6, 3, 2, 51, 9, 12, 10, 7, 5, 2, 1, 311, 41, 1, 6, 13, 2, 5, 18, 105, 13, 17, 3, 9, 48, 2, 15, 18, 16, 77, 13, 3, 2, 2, 8, 1, 3, 4, 93, 23, 169, 1, 24, 2, 1, 8, 36, 1, 1, 1, 6, 3, 1, 25, 1, 2, 59, 2, 3, 3, 1, 8, 2, 1, 6, 15, 1, 7, 29, 4, 4, 8, 22, 5, 80, 16, 3, 147, 23, 6, 16, 1, 8, 530])

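sns.set_style('ticks')
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(16, 8))

# Left: violinplot of the log10-transformed data on a linear axis
log_data = np.log10(data)
sns.violinplot(y=log_data, ax=ax1)
# Major ticks only at integer exponents, so each power of ten is labeled once
major_ticks = np.arange(np.floor(log_data.min()), log_data.max() + 1)
ax1.yaxis.set_ticks(major_ticks, minor=False)
# Minor ticks at log10(k * 10**p) for k = 1..10, mimicking a real log axis
ax1.yaxis.set_ticks([np.log10(x) for p in major_ticks for x in np.linspace(10 ** p, 10 ** (p + 1), 10)], minor=True)
# Label a major tick at position x as 10^x
ax1.yaxis.set_major_formatter(StrMethodFormatter("$10^{{{x:.0f}}}$"))

# Right: boxenplot of the raw data on a true log axis, with matching limits
ax2.set_yscale('log')
sns.boxenplot(y=data, ax=ax2)
ymin, ymax = ax1.get_ylim()
ax2.set_ylim(10**ymin, 10**ymax)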

plt.tight_layout()
plt.show()

[Figure: violinplot on a log-formatted axis (left) and boxenplot on a true log scale (right)]
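A plausible cause of the three 10^0 labels in the question, assuming the `StrMethodFormatter` above was combined with the default tick positions: the formatter labels a tick at position `x` with `fmt.format(x=x, pos=pos)`, and `.0f` rounds each position to a whole exponent, so several non-integer ticks collapse onto the same label. The tick positions below are hypothetical; the real ones depend on the axis range.

```python
# StrMethodFormatter(fmt) labels a tick at position x with fmt.format(x=x, pos=pos)
fmt = "$10^{{{x:.0f}}}$"

# Hypothetical tick positions a default locator might pick on the log10 axis
default_ticks = [0.0, 0.25, 0.5, 0.75, 1.0]
default_labels = [fmt.format(x=t, pos=i) for i, t in enumerate(default_ticks)]
print(default_labels)
# ['$10^{0}$', '$10^{0}$', '$10^{0}$', '$10^{1}$', '$10^{1}$']

# Integer ticks, like major_ticks in the answer, give one label per power of ten
integer_ticks = [0.0, 1.0, 2.0, 3.0]
integer_labels = [fmt.format(x=t, pos=i) for i, t in enumerate(integer_ticks)]
print(integer_labels)
# ['$10^{0}$', '$10^{1}$', '$10^{2}$', '$10^{3}$']
```

This is why the answer sets the major ticks explicitly to integers before applying the formatter.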

JohanC