
I am using a Python notebook (v2.7) and I am trying to do the following: I created this RDD object, with the data as shown in the picture.

enter image description here

Now I want to plot the evolution of visitors over the years for every referrer_category. The last step before creating the plot is to unpack every column into a separate sequence, using the following line of code: `x, y, z = zip(*total_real_yearly_visits_per_referrer_Category.collect())`

I am using pyplot (matplotlib) as the plotting package.
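One way to get from the collected rows to one line per category with plain pyplot is sketched below. It is only a sketch under assumptions: the rows are made up, the column order `(referrer_category, year, visits)` is a guess since the data is only shown as a picture, and `group_by_category` and `plot_series` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical rows standing in for the result of .collect().
# Assumed column order: (referrer_category, year, visits).
rows = [
    ("search", 2013, 120),
    ("social", 2013, 40),
    ("search", 2014, 150),
    ("NA", 2014, 10),
    ("social", 2014, 65),
]


def group_by_category(rows):
    """Return {category: (years, visits)}, each series sorted by year."""
    grouped = defaultdict(list)
    for category, year, visits in rows:
        grouped[category].append((year, visits))
    series = {}
    for category, points in grouped.items():
        # sorted() orders the (year, visits) pairs by year first
        years, visits = zip(*sorted(points))
        series[category] = (list(years), list(visits))
    return series


def plot_series(series):
    """Draw one line per category on a shared set of axes."""
    # Imported here so that the grouping step does not need matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for category in sorted(series):
        years, visits = series[category]
        # The category name is passed as the legend label, not as a colour.
        ax.plot(years, visits, marker="o", label=category)
    ax.set_xlabel("year")
    ax.set_ylabel("visits")
    ax.legend(loc="best")
    return fig


series = group_by_category(rows)
```

Calling `plot_series(series)` returns the figure, which a notebook would normally display inline; outside a notebook, `matplotlib.pyplot.show()` would be needed afterwards.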

UPDATE:

I managed to find out how this is almost done:

```python
import pandas as pd
df = pd.DataFrame(tab)
df.columns = {'y' , 'x' , 'z'}
fig , ax = plt.subplots()
labels = []
for key, grp in df.groupby(['y']) :
    ax = grp.plot(ax = ax , kind = 'line' , x = 'x' , y = 'z' , c = key)
    labels.append(key)
lines, _ = ax.get_legend_handles_labels()
ax.legend(lines, labels, loc = 'best')
plt.show()
```

However, I still get no plot, but a bunch of errors:

ValueError: to_rgba: Invalid rgba arg "NA" to_rgb: Invalid rgb arg "NA" could not convert string to float: na

Does anyone know how to solve this?

If more content or data is needed, please let me know.
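The `ValueError` mentions `to_rgba`, matplotlib's colour conversion, and the value "NA", which looks like one of the category names. That would be consistent with `c = key` handing each category name to matplotlib as a line colour. Separately, `df.columns = {...}` assigns a set, whose order is not guaranteed. A minimal sketch of the same loop under those assumptions passes the key as `label` instead; the rows in `tab` are made up, and the column names and their order are assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; assumed column order: (category, year, visits).
tab = [
    ("search", 2013, 120),
    ("social", 2013, 40),
    ("search", 2014, 150),
    ("NA", 2014, 10),
    ("social", 2014, 65),
]

# A list keeps the column names in a fixed order (a set does not).
df = pd.DataFrame(tab, columns=["category", "year", "visits"])

fig, ax = plt.subplots()
for key, grp in df.groupby("category"):
    # label=key names the line in the legend; matplotlib picks the colour.
    grp.sort_values("year").plot(
        ax=ax, kind="line", x="year", y="visits", label=key
    )
ax.set_ylabel("visits")
ax.legend(loc="best")
```

In a notebook the figure would be shown with `plt.show()` or inline display; with the `Agg` backend used here, `fig.savefig(...)` would write it to a file instead.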

Olivier Thierie
  • what is your question? – cel Nov 28 '15 at 16:01
  • How can I plot this data in one graph? For each referrer_category I want a line (all in one graph), with the years on the x axis and the visits on the y axis. – Olivier Thierie Nov 28 '15 at 16:04
  • Have you tried [this](http://matplotlib.org/users/pyplot_tutorial.html)? It looks like a good place to start; your post could be improved a lot. Also, consider using [pandas](http://pandas.pydata.org/), because I see you have missing values and probably would like to do more analysis with the data. – rll Nov 28 '15 at 16:09
  • I am familiar with the page, but I don't know how I can apply this to my data and plot, as they are using only one variable with an arithmetic operation, while I have a dataset with different categories. – Olivier Thierie Nov 28 '15 at 16:12
  • I am not familiar with pandas yet, I'll take a look. – Olivier Thierie Nov 28 '15 at 16:14
  • I think [this](http://stackoverflow.com/questions/29233283/plotting-multiple-lines-with-pandas-dataframe) solves my question. – Olivier Thierie Nov 28 '15 at 16:17
  • post your data as text that can be used -- images are useless. – Paul H Nov 28 '15 at 17:05

0 Answers