I am using a Python 2.7 notebook and I am trying to do the following. I created this RDD object, with data like that shown in the picture.
Now I want to plot the evolution of visitors over the years for every `referrer_category`. The last step before creating the plot is to unpack every column into a separate list, using the following line of code:
```python
x, y, z = zip(*total_real_yearly_visits_per_referrer_Category.collect())
```
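`zip(*rows)` transposes the collected rows. Each row tuple becomes a separate argument to `zip`, which then groups all first fields together, all second fields together, and so on. The minimal sketch below carries this one step further and draws one line per referrer category with plain matplotlib. It assumes each collected row is a `(referrer_category, year, visits)` tuple (an assumption, since the actual field order is not shown), and the rows themselves are made-up sample data.

```python
from collections import defaultdict

import matplotlib
matplotlib.use("Agg")  # render off-screen so this also runs without a display; omit in a notebook
import matplotlib.pyplot as plt

# Made-up rows standing in for total_real_yearly_visits_per_referrer_Category.collect();
# the (referrer_category, year, visits) field order is an assumption.
rows = [
    ("search", 2012, 120), ("social", 2012, 40),
    ("search", 2013, 150), ("social", 2013, 65),
    ("search", 2014, 170), ("social", 2014, 90),
]

# zip(*rows) turns a list of rows into one tuple per column.
categories, years, visits = zip(*rows)

# Collect the (year, visits) points belonging to each category.
series = defaultdict(list)
for category, year, count in zip(categories, years, visits):
    series[category].append((year, count))

fig, ax = plt.subplots()
for category in sorted(series):
    xs, ys = zip(*sorted(series[category]))  # sort the points by year
    ax.plot(xs, ys, label=category)          # one line per referrer category
ax.set_xlabel("year")
ax.set_ylabel("visitors")
ax.legend(loc="best")
# plt.show()  # display the figure in the notebook
```

Passing the category as `label` (rather than as a color) is what lets `ax.legend` name each line.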
I am using pyplot (matplotlib) as my plotting package.
UPDATE:
I managed to get this almost working:
```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(tab)
df.columns = {'y', 'x', 'z'}
fig, ax = plt.subplots()
labels = []
# draw one line per referrer category
for key, grp in df.groupby(['y']):
    ax = grp.plot(ax=ax, kind='line', x='x', y='z', c=key)
    labels.append(key)
lines, _ = ax.get_legend_handles_labels()
ax.legend(lines, labels, loc='best')
plt.show()
```
However, I still get no plot, only a series of errors:

```
ValueError: to_rgba: Invalid rgba arg "NA"
to_rgb: Invalid rgb arg "NA"
could not convert string to float: na
```
Does anyone know how to solve this?
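The error message points to the likely cause: in `grp.plot(..., c = key)`, the `c` keyword is passed through to matplotlib, where it is an alias for the line color, so the group key (apparently a category named "NA") is parsed as a color string and rejected. A second, independent problem is `df.columns = {'y' , 'x' , 'z'}`: that is a Python set, which has no guaranteed order, so the names may not line up with the columns. A minimal sketch of both fixes, assuming the first column holds the referrer category, the second the year and the third the visitor count (the order the original names suggest); the sample rows are made up:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Made-up stand-in for `tab`: (referrer_category, year, visits) rows (assumed order).
tab = [
    ("NA", 2012, 10), ("NA", 2013, 14), ("NA", 2014, 9),
    ("search", 2012, 120), ("search", 2013, 150), ("search", 2014, 170),
]

df = pd.DataFrame(tab)
df.columns = ['y', 'x', 'z']  # a list keeps each name aligned with its column

fig, ax = plt.subplots()
# Group by the plain column name so each key is the category itself, not a tuple.
for key, grp in df.groupby('y'):
    # label= names the line for the legend; c= would be interpreted as a color
    grp.plot(ax=ax, kind='line', x='x', y='z', label=key)
ax.legend(loc='best')
# plt.show()  # display the figure in the notebook
```

Because each line now carries its category as its label, the separate `labels` list and the `get_legend_handles_labels()` step are no longer needed. Grouping by `'y'` instead of `['y']` matters in newer pandas versions, which yield one-element tuples as keys when grouping by a one-item list.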
If more information or data is needed, please let me know.
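Another route, which avoids the manual `groupby` loop entirely, is to reshape the data so that each referrer category becomes its own column and let pandas draw one line per column, with the legend taken from the column names. A sketch assuming the same `(referrer_category, year, visits)` layout; the rows are made up and the column names `category`, `year` and `visits` are hypothetical:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Made-up rows; the column names below are hypothetical.
rows = [
    ("NA", 2012, 10), ("NA", 2013, 14), ("NA", 2014, 9),
    ("search", 2012, 120), ("search", 2013, 150), ("search", 2014, 170),
]
df = pd.DataFrame(rows, columns=['category', 'year', 'visits'])

# One row per year, one column per category, visit counts in the cells.
wide = df.pivot(index='year', columns='category', values='visits')

ax = wide.plot(kind='line')  # one line per category column, legend included
ax.set_ylabel('visitors')
# plt.show()  # display the figure in the notebook
```

A year that is missing for some category shows up as a NaN gap in that line. If the same (year, category) pair occurs more than once, `pivot` raises an error, and `pivot_table(..., aggfunc='sum')` is the usual replacement.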