
I have a big dictionary containing frequencies, like this:

frequency = {3: 231, 6: 373, 8: 455}

where the dictionary keys represent the lengths of the sentences and the values the number of sentences with that length.
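For context, a dictionary of this shape can be built from raw text with `collections.Counter`. This is a minimal sketch that assumes sentence length means the number of whitespace-separated words; the `sentences` list is hypothetical illustration data, not from the question:

```python
from collections import Counter

# Hypothetical input; in practice this would be the real corpus.
sentences = [
    "the cat sat",
    "a dog ran far away today",
    "it rained",
    "we went home",
    "birds sing",
]

# Map each sentence length (in words) to the number of sentences with that length.
frequency = dict(Counter(len(s.split()) for s in sentences))
print(frequency)  # {3: 2, 6: 1, 2: 2}
```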

I created the bar plot like this:

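import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Keys are sentence lengths (x positions), values are counts (bar heights);
# log=True draws the y axis on a log scale.
ax.bar(list(frequency.keys()), list(frequency.values()), log=True, color='g', width=0.5)
ax.set_title('DISTRIBUTION OF SENTENCE LENGTH')
ax.set_xlabel('Sentence length')
ax.set_ylabel('Frequency')
plt.show()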

The result is correct:

[Image: the resulting bar chart of sentence length frequencies]

Now I would like to draw the distribution of these values, something like this:

[Image: hand-drawn sketch of the desired distribution curve over the bars]

How can I do this? I have already tried following this post (and others like it), but with poor results. Thank you!

Elidor00
  • Just to clarify: do you want to draw the distribution AND the bar chart, or do you want to replace the bar chart with the distribution? – M-Chen-3 Feb 20 '21 at 00:14
  • Both, the bar chart AND the distribution. I already managed to make the bar chart; I just need to add the distribution (the one I sketched to illustrate) on top of it. – Elidor00 Feb 20 '21 at 09:41

1 Answer


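Seaborn's histplot has a weights parameter, so each sentence length can be weighted by its count. It can also overlay a KDE (kernel density estimate) with kde=True. The default bandwidth seems a bit too wide; it can be narrowed via kde_kws, e.g. kde_kws={'bw_adjust': 0.3} (the example below uses 0.2). With discrete=True, the histogram bins are centered on the discrete values.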
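To see what weighting does, note that a histogram of the unique lengths weighted by their counts is the same as a histogram of the raw observations, with each length repeated as many times as its count. A small NumPy check of that equivalence, using the question's three-entry dictionary (the integer-centred bin edges are an illustrative choice, similar in spirit to discrete=True):

```python
import numpy as np

frequency = {3: 231, 6: 373, 8: 455}
lengths = np.array(list(frequency.keys()))
counts = np.array(list(frequency.values()))

# One bin of width 1 centred on each integer length.
edges = np.arange(lengths.min() - 0.5, lengths.max() + 1.5)

# Weighted histogram over the unique lengths...
weighted, _ = np.histogram(lengths, bins=edges, weights=counts)
# ...matches an unweighted histogram of the expanded raw data.
expanded, _ = np.histogram(np.repeat(lengths, counts), bins=edges)

print(np.array_equal(weighted, expanded))  # True
```

This is why passing the dictionary values as weights avoids materialising one entry per sentence.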

Here is an example:

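import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Synthetic test data: counts rise for short lengths, then decay towards 0
frequencies = {1: 2000}
for i in range(2, 10):
    frequencies[i] = int(frequencies[i - 1] * np.random.uniform(1.02, 1.1))
for i in range(10, 500):
    frequencies[i] = int(frequencies[i - 1] * np.random.uniform(0.97, 0.99))
    if frequencies[i] == 0:
        break

# weights= makes each length count as many times as its frequency;
# discrete=True centres one bin on each integer length;
# kde=True overlays a KDE, with a narrower bandwidth via bw_adjust
ax = sns.histplot(x=list(frequencies.keys()), weights=list(frequencies.values()), discrete=True,
                  kde=True, kde_kws={'bw_adjust': 0.2}, line_kws={'linewidth': 3})
ax.margins(x=0.01)  # small horizontal padding around the bars
plt.show()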

[Image: example plot]
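If you prefer to keep the original ax.bar call, a similar curve can be overlaid with a weighted Gaussian KDE from SciPy instead of seaborn. This is a sketch under stated assumptions: SciPy 1.2 or later (for the weights argument of scipy.stats.gaussian_kde), the default Scott bandwidth, and a bin width of 1 when rescaling the density to counts. The Agg backend and savefig call (with a hypothetical file name) are only there so the snippet runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

frequency = {3: 231, 6: 373, 8: 455}
lengths = np.array(list(frequency.keys()), dtype=float)
counts = np.array(list(frequency.values()), dtype=float)

# Weighted KDE: each length counts as many times as its frequency.
# Pass bw_method=<float> to make the curve follow the bars more closely.
kde = gaussian_kde(lengths, weights=counts)

# The density integrates to 1; multiplying by the total count (bin width 1)
# puts the curve on the same scale as the bar heights.
xs = np.linspace(lengths.min() - 2, lengths.max() + 2, 200)
ys = kde(xs) * counts.sum()

fig, ax = plt.subplots()
ax.bar(lengths, counts, color='g', width=0.5)
ax.plot(xs, ys, color='k', linewidth=2)
ax.set_title('DISTRIBUTION OF SENTENCE LENGTH')
ax.set_xlabel('Sentence length')
ax.set_ylabel('Frequency')
fig.savefig("sentence_length_kde.png")
```

If you keep log=True on the bars, the curve's tails approach zero and can stretch the log axis, so something like ax.set_ylim(bottom=1) may be needed.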

JohanC