I am generating a histogram in matplotlib, but what I am seeing does not line up with the data.
This is my code:
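import matplotlib.pyplot as plt

# recent_grads is a pandas DataFrame loaded earlier (not shown here)
labels = [0, 1000, 2000, 3000, 4000, 5000]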
fig, ax = plt.subplots()
ax.hist(recent_grads['Sample_size'], bins=25, range=(0, 5000), ec='black', align='left')
ax.set_xticklabels(labels=labels, rotation=45)
plt.show()
print(recent_grads['Sample_size'].max())
vc = recent_grads['Sample_size'].value_counts()
print(dict(vc))
I get this histogram:
This is a dict of the value counts for the 'Sample_size' column:
{22: 3, 4: 3, 7: 3, 118: 2, 5: 2, 156: 2, 158: 2, 55: 2, 425: 2, 37: 2,
36: 2, 184: 2, 31: 2, 30: 2, 26: 2, 24: 2, 142: 2, 10: 2, 95: 2, 3: 2,
97: 2, 342: 2, 73: 1, 51: 1, 565: 1, 310: 1, 88: 1, 56: 1, 569: 1, 59: 1,
2394: 1, 278: 1, 71: 1, 62: 1, 63: 1, 338: 1, 86: 1, 81: 1, 79: 1, 67: 1,
49: 1, 78: 1, 204: 1, 2380: 1, 843: 1, 255: 1, 1322: 1, 48: 1, 47: 1, 2: 1,
1029: 1, 260: 1, 29: 1, 9: 1, 11: 1, 13: 1, 14: 1, 331: 1, 16: 1, 17: 1, 18: 1,
374: 1, 25: 1, 28: 1, 541: 1, 32: 1, 289: 1, 1058: 1, 60: 1, 38: 1, 295: 1,
2042: 1, 90: 1, 43: 1, 44: 1, 45: 1, 89: 1, 99: 1, 92: 1, 1728: 1, 202: 1, 199: 1,
681: 1, 1014: 1, 117: 1, 21: 1, 190: 1, 427: 1, 273: 1, 183: 1, 182: 1, 180: 1,
179: 1, 174: 1, 546: 1, 208: 1, 1370: 1, 214: 1, 219: 1, 224: 1, 225: 1, 53: 1,
518: 1, 235: 1, 151: 1, 1436: 1, 244: 1, 246: 1, 247: 1, 249: 1, 2554: 1, 1196: 1,
375: 1, 1629: 1, 110: 1, 123: 1, 631: 1, 259: 1, 4212: 1, 113: 1, 623: 1, 1387: 1,
46: 1, 362: 1, 264: 1, 103: 1, 357: 1, 39: 1, 419: 1, 2684: 1, 125: 1, 126: 1,
128: 1, 130: 1, 132: 1, 353: 1, 590: 1, 8: 1, 2189: 1, 399: 1, 147: 1, 919: 1,
152: 1, 2584: 1, 157: 1, 1186: 1, 1024: 1}
1) Why does my first bin start at 1000 when the vast majority of the values fall between 0 and 1000?
I would have expected the first bar to sit right against the y-axis.
2) The max value in the 'Sample_size' column is 4212. Why am I seeing a bar beyond the 5000 mark?
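A possible explanation for both symptoms, sketched in plain Python under stated assumptions: as far as I understand it, `set_xticklabels` does not move any ticks, it only renames the ticks that already exist, in order. If the automatic tick positions start at -1000 (the `auto_ticks` list below is an assumed, typical result, not something read from the actual figure), then the label `0` lands on the tick at -1000 and the label `1000` lands on the tick at 0, so every bar appears shifted by 1000. Separately, `align='left'` centers each bar on the left edge of its bin, which moves every bar half a bin width to the left. The helper name `left_aligned_span` is hypothetical and only there for illustration.

```python
# Assumed tick positions chosen automatically for this plot (hypothetical values;
# the real ones can be checked with ax.get_xticks()).
auto_ticks = [-1000, 0, 1000, 2000, 3000, 4000, 5000, 6000]
labels = [0, 1000, 2000, 3000, 4000, 5000]

# set_xticklabels pairs labels with existing ticks in order; extra ticks get no label.
shown = dict(zip(auto_ticks, labels))

# Bin geometry taken from the hist() call: range=(0, 5000), bins=25.
lo, hi, bins = 0, 5000, 25
width = (hi - lo) / bins  # 200.0


def left_aligned_span(value):
    """Return the x-interval where the bar holding `value` is drawn with align='left'."""
    index = min(int((value - lo) // width), bins - 1)  # the last bin includes its right edge
    left_edge = lo + index * width
    # align='left' centers the bar on the bin's left edge.
    return (left_edge - width / 2, left_edge + width / 2)


print(shown[0])                  # label displayed at true position 0
print(left_aligned_span(22))     # where the first, tallest bar is drawn
print(left_aligned_span(4212))   # where the bar holding the max value is drawn
```

Under these assumptions the first bar straddles true position 0, which carries the label 1000, and the bar containing 4212 is drawn just to the right of true position 4000, which carries the label 5000. If that is what is happening, calling `ax.set_xticks(labels)` before `ax.set_xticklabels(...)` (or dropping `set_xticklabels` and using `ax.tick_params(axis='x', rotation=45)`) should keep positions and labels in sync, and removing `align='left'` should put the first bar against the y-axis.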