So I have this data:
0 25
1 10527
2 2704
3 297
4 2032
...
255845 6
255846 3
255847 2
255848 0
255849 43960
It starts at index 0 and ends at 255849. I do something like this to plot it:
import praw
import csv
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("hot_post_data.csv")
#file1 = open("post_data.csv", "r", encoding="utf8")
df_c = pd.DataFrame(df)
upvotes = []
for i in range(255850):
    upvotes.append(i)
print(df_c['Upvotes'].size)
plt.xlabel('post index')
plt.ylabel('number of upvotes')
plt.scatter(upvotes, df_c['Upvotes'])
plt.show()
Because there are more than 200k elements, I think it literally plots every single one of them. So I thought of having one dot represent 10k posts, or some other number x like that, with each dot showing the average value. For example, say I have two dots: for one of them I'd take the posts with upvotes between 10k and 15k, add up their values, and divide that by the number of posts. By doing this I'd probably be able to reduce more than 200k dots to 20-30 dots?
How can I do this? Thanks
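A minimal sketch of the averaging idea described above, assuming that "one dot per 10k posts" means averaging each run of 10,000 consecutive rows of the Upvotes column. The helper name average_in_chunks and its chunk_size parameter are made up for illustration and are not part of pandas:

```python
import numpy as np
import pandas as pd


def average_in_chunks(values: pd.Series, chunk_size: int) -> pd.Series:
    """Return the mean of every `chunk_size` consecutive rows of `values`.

    Hypothetical helper: the result is indexed by the position of the
    first row in each chunk (0, chunk_size, 2 * chunk_size, ...).
    """
    # Work with plain 0..n-1 positions regardless of the original index.
    values = values.reset_index(drop=True)
    # Integer division of the row position gives a chunk label per row.
    chunk_id = np.arange(len(values)) // chunk_size
    means = values.groupby(chunk_id).mean()
    # Relabel each chunk by the position of its first row.
    means.index = means.index * chunk_size
    return means
```

With the data from the question this could be called as means = average_in_chunks(df['Upvotes'], 10_000) and plotted with plt.scatter(means.index, means.values), which would give 26 dots for 255850 rows. If the intent is instead to group posts by their upvote count (for example 10k to 15k), pd.cut on the Upvotes column followed by a groupby would be the closer fit.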