Plotting pandas dataframe with years

Question

       error   Months    Year
0  15.198688      Jan  2011.0
1  13.793969  Jan_Feb  2011.0
2  15.171848  Jan_Mar  2011.0
3   5.779007  Jan_Apr  2011.0
4   1.615044  Jan_May  2011.0
5   1.536096  Jan_Jun  2011.0
6   1.159742  Jan_Jul  2011.0
0   1.697396      Jan  2012.0
1   5.149847  Jan_Feb  2012.0
2   0.876639  Jan_Mar  2012.0
3   1.865001  Jan_Apr  2012.0
4   0.333077  Jan_May  2012.0
5   2.056728  Jan_Jun  2012.0
0   9.676028      Jan  2013.0
1   3.919200  Jan_Feb  2013.0
2   4.171534  Jan_Mar  2013.0
3   2.318090  Jan_Apr  2013.0
4   0.786901  Jan_May  2013.0
5   0.936041  Jan_Jun  2013.0
6   0.115029  Jan_Jul  2013.0

Is there a way to plot the pandas dataframe above so that the plot has 3 lines, one for each of the 3 unique years? The y-axis should show the 'error' column and the x-axis the month. The legend should list the 3 years: 2011, 2012, 2013.

For the x-axis, if the month is 'Jan_Feb', the label should just say 'Feb'; if the month is 'Jan', the label should say 'Jan'.

I tried df.plot(), but it plots everything in one plot.

You can do it with `groupby`. See this [previous SO question](http://stackoverflow.com/questions/15465645/plotting-results-of-pandas-groupby) — jkr, Jan 13 '17 at 06:13
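
A minimal sketch of what the `groupby` suggestion could look like, assuming a hand-built subset of the question's data. The `Month` column, the `fig`/`ax` names and the `Agg` backend are choices made for this illustration, not part of the original comment:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Subset of the question's data, rebuilt by hand for illustration.
df = pd.DataFrame({
    "error": [15.198688, 13.793969, 15.171848, 1.697396, 5.149847, 0.876639],
    "Months": ["Jan", "Jan_Feb", "Jan_Mar", "Jan", "Jan_Feb", "Jan_Mar"],
    "Year": [2011.0, 2011.0, 2011.0, 2012.0, 2012.0, 2012.0],
})

# Keep only the last three characters so 'Jan_Feb' becomes 'Feb'.
df["Month"] = df["Months"].str[-3:]

fig, ax = plt.subplots()
# One line per year; the label turns 2011.0 into '2011' for the legend.
for year, group in df.groupby("Year"):
    ax.plot(group["Month"], group["error"], label=str(int(year)))
ax.set_xlabel("Month")
ax.set_ylabel("error")
ax.legend(title="Year")
```

Each group is drawn on the same axes, so the legend ends up with one entry per year. Call `plt.show()` or `fig.savefig(...)` to see the result.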

jezrael · Accepted Answer · 2017-01-13T06:37:54.827

You can first clean the data: cast the years to int and convert the months to an ordered categorical so that they sort in calendar order. Then reshape with pivot and, if necessary, replace NaN with some value (e.g. 0) using fillna:

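import pandas as pd

df.Year = df.Year.astype(int)
# Keep the last three characters ('Jan_Feb' -> 'Feb') as an ordered categorical.
# pd.Categorical replaces astype('category', categories=..., ordered=...),
# which current pandas versions no longer accept.
df.Months = pd.Categorical(df.Months.str[-3:],
                           categories=['Jan','Feb','Mar','Apr','May','Jun','Jul'],
                           ordered=True)

df = df.pivot(index='Months', columns='Year', values='error').fillna(0)
print(df)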
Year         2011      2012      2013
Months                               
Jan     15.198688  1.697396  9.676028
Feb     13.793969  5.149847  3.919200
Mar     15.171848  0.876639  4.171534
Apr      5.779007  1.865001  2.318090
May      1.615044  0.333077  0.786901
Jun      1.536096  2.056728  0.936041
Jul      1.159742  0.000000  0.115029

df.plot()
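
Whether to fill the gap is a judgment call. With fillna(0) the 2012 line drops to zero at Jul even though no value was measured; if the NaN is left in place, the line simply stops at Jun. A small sketch of the difference in the pivoted data, assuming a hand-built subset of the question's rows (the name `wide` is made up for this example):

```python
import pandas as pd

# Hand-built subset of the question's data: 2012 has no 'Jul' row.
df = pd.DataFrame({
    "error": [1.536096, 1.159742, 2.056728, 0.936041, 0.115029],
    "Months": ["Jan_Jun", "Jan_Jul", "Jan_Jun", "Jan_Jun", "Jan_Jul"],
    "Year": [2011, 2011, 2012, 2013, 2013],
})
df["Months"] = pd.Categorical(df["Months"].str[-3:],
                              categories=["Jun", "Jul"], ordered=True)

wide = df.pivot(index="Months", columns="Year", values="error")
print(wide)            # the 2012 / Jul cell is NaN
print(wide.fillna(0))  # the same cell becomes 0.0
```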

Another way to get the correct order is to reindex by a list of the months in calendar order:

df.Year = df.Year.astype(int)
df.Months = df.Months.str[-3:]
# Parentheses are needed so the chained calls can span several lines.
df = (df.pivot(index='Months', columns='Year', values='error')
        .fillna(0)
        .reindex(['Jan','Feb','Mar','Apr','May','Jun','Jul']))

print(df)
Year         2011      2012      2013
Months                               
Jan     15.198688  1.697396  9.676028
Feb     13.793969  5.149847  3.919200
Mar     15.171848  0.876639  4.171534
Apr      5.779007  1.865001  2.318090
May      1.615044  0.333077  0.786901
Jun      1.536096  2.056728  0.936041
Jul      1.159742  0.000000  0.115029
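
If you would rather not hard-code the month list, one option is to derive it from the standard library. This is only a sketch: `calendar.month_abbr` is locale-dependent, so it assumes an English (or default "C") locale, and the names `wide` and `order` are made up for this example:

```python
import calendar
import pandas as pd

# Hand-built subset of the question's data.
df = pd.DataFrame({
    "error": [15.198688, 13.793969, 15.171848, 1.697396, 5.149847, 0.876639],
    "Months": ["Jan", "Jan_Feb", "Jan_Mar", "Jan", "Jan_Feb", "Jan_Mar"],
    "Year": [2011, 2011, 2011, 2012, 2012, 2012],
})
df["Months"] = df["Months"].str[-3:]

# pivot sorts the plain-string index alphabetically (Feb, Jan, Mar).
wide = df.pivot(index="Months", columns="Year", values="error")

# Calendar order, restricted to the months that actually occur in the data.
order = [m for m in calendar.month_abbr[1:] if m in wide.index]
wide = wide.reindex(order)
print(wide)
```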

score 0 · Answer 2 · answered Jan 13 '17 at 06:28

Note that in this dataset the months can be identified from the index, i.e. [0..6] -> [Jan..Jul], so the following code should produce the desired plot:

ax = df.pivot(values='error', columns='Year').plot()

However, the x-axis tick labels are now numeric. We can fix that by pinning the ticks to the index positions and then labelling them:

ax.set_xticks(range(7))
ax.set_xticklabels(['Jan','Feb','Mar','Apr','May','Jun','Jul'])
