I'm trying to create the chart in this question, using this answer. I'm open to any solution that works.

Visual borrowed from the original question: [image]

The difference from that question is that I've already calculated my bins and frequency values, so I don't use numpy or matplotlib to do so.

Here's my sample data, which I refer to as df_fd in my sample code below:

     low_bin   high_bin  frequency
0  13.142857  18.857143          3
1  18.857143  24.571429          5
2  24.571429  30.285714          8
3  30.285714  36.000000          8
4  36.000000  41.714286          7
5  41.714286  47.428571          7
6  47.428571  53.142857          1
7  53.142857  58.857143          1

Based on the cited answer, here's my code (df_fd is the DataFrame above):

fig, ax = plt.subplots()
ax.bar(df_fd.low_bin, df_fd.frequency, width= df_fd.high_bin-df_fd.low_bin)
X,Y = np.meshgrid(bins, df_fd['frequency'])
Y = Y.astype(np.float)
Y[Y>df_fd['frequency']] = np.nan
plt.scatter(X,Y)

This Y[Y>df_fd['frequency']] = np.nan statement is what fails and I don't know how to get around it. I understand what it's trying to do and the best guess I have is somehow mapping the matrix index to the DataFrame index would help, but I'm not sure how to do that.
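For reference, here is a minimal sketch of how the masking step could be made to work. It rests on a few assumptions that are not in the code above: `bins` is taken to be the bin midpoints, the Y levels are taken to be 1 up to the largest frequency (rather than the frequency values themselves), and the mask compares against a plain NumPy array instead of the pandas Series. Comparing a 2D array with a Series is a likely cause of the failure, and `np.float` was removed in NumPy 1.24, so the built-in `float` is used instead.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df_fd = pd.DataFrame({
    "low_bin": [13.142857, 18.857143, 24.571429, 30.285714,
                36.000000, 41.714286, 47.428571, 53.142857],
    "high_bin": [18.857143, 24.571429, 30.285714, 36.000000,
                 41.714286, 47.428571, 53.142857, 58.857143],
    "frequency": [3, 5, 8, 8, 7, 7, 1, 1],
})

# Assumption: one column of dots per bin, centered on the bin midpoint
centers = ((df_fd["low_bin"] + df_fd["high_bin"]) / 2).to_numpy()
freq = df_fd["frequency"].to_numpy()

# Assumption: candidate dot heights are 1, 2, ..., max frequency
levels = np.arange(1, freq.max() + 1)

# X[i, j] is the center of bin j, Y[i, j] is level i
X, Y = np.meshgrid(centers, levels)
Y = Y.astype(float)  # float dtype is needed to hold NaN

# freq has one entry per column, so it broadcasts across the rows of Y;
# any level above a bin's frequency is blanked out
Y[Y > freq] = np.nan
```

Passing `X` and `Y` to `ax.scatter(X, Y)` after the `ax.bar(...)` call should then draw one dot per counted observation, since the NaN entries are skipped.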

Thank you for helping me!

Programming_Learner_DK

1 Answer

One hacky solution using a scatter plot:

# df is the question's DataFrame (df_fd there)
(df.assign(bin=np.mean([df['low_bin'], df['high_bin']], axis=0))
   .loc[lambda d: d.index.repeat(d['frequency'])]
   .assign(Y=lambda d: d.groupby(level=0).cumcount())
   .plot.scatter(x='bin', y='Y', s=600)
)

It works by taking the midpoint of each bin (the mean of low_bin and high_bin) as the X value, repeating each row as many times as its "frequency" value, and then numbering the repeats within each original row with groupby.cumcount to get the Y value.
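To make the intermediate results easier to inspect, the same chain can be unrolled into separate steps. This is only a sketch: the names `step1`, `step2` and `step3` are made up for illustration, and the DataFrame is rebuilt from the question's sample data.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "low_bin": [13.142857, 18.857143, 24.571429, 30.285714,
                36.000000, 41.714286, 47.428571, 53.142857],
    "high_bin": [18.857143, 24.571429, 30.285714, 36.000000,
                 41.714286, 47.428571, 53.142857, 58.857143],
    "frequency": [3, 5, 8, 8, 7, 7, 1, 1],
})

# Step 1: the X position of each bin is the midpoint of its edges
step1 = df.assign(bin=(df["low_bin"] + df["high_bin"]) / 2)

# Step 2: repeat each index label `frequency` times, then select those rows,
# so a bin with frequency 3 now appears as 3 identical rows
step2 = step1.loc[step1.index.repeat(step1["frequency"])]

# Step 3: number the copies of each original row 0, 1, 2, ...
# (grouping on the index keeps the copies of one bin together)
step3 = step2.assign(Y=step2.groupby(level=0).cumcount())
```

Calling `step3.plot.scatter(x='bin', y='Y', s=600)` then gives the same plot as the one-liner. Note that `cumcount` starts at 0, so the lowest dot in each column sits at Y=0; adding 1 to `Y` would shift the dots to run from 1 up to the frequency.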

mozway