1

I have a big dataset of timed solar panel yield logs per ten minutes throughout multiple years. I would like to get a bar chart which displays the monthly summation of these logs per year.

I don't know if this makes sense, so let me explain with an example.

Through resampling, I got the following dataframe:

Last updated
2017-01-31 00:00:00+01:00     24.1
2017-02-28 00:00:00+01:00     44.8
2017-03-31 00:00:00+02:00    140.1
2017-04-30 00:00:00+02:00    168.7
2017-05-31 00:00:00+02:00    194.4
2017-06-30 00:00:00+02:00    214.7
2017-07-31 00:00:00+02:00    204.6
2017-08-31 00:00:00+02:00    159.5
2017-09-30 00:00:00+02:00    117.3
2017-10-31 00:00:00+01:00     74.8
2017-11-30 00:00:00+01:00     30.4
2017-12-31 00:00:00+01:00      7.2
2018-01-31 00:00:00+01:00      9.4
Freq: M, Name: Yield (kWh), dtype: float64

I think the best way of achieving my desired plot is convert this somehow into a dataframe with

  • an index containing only months (is this possible with DateTimeIndex, or shall I simply use a RangeIndex from 1 to 12?)
  • a column for each year (2016 and 2017 in this example)
  • the values reorganised accordingly

So, for example:

Last updated   2017  2018
Jan            24.1   9.4
Feb            44.8     0
Mar           140.1     0
Apr           168.7     0
May           194.4     0
Jun           214.7     0
Jul           204.6     0
Aug           159.5     0
Sep           117.3     0
Oct            74.8     0
Nov            30.4     0
Dec             7.2     0

I can easily achieve this with a some naive looping, but I'm convinced that pandas must hiding some clever trick here from me. Can somebody point me to an efficient solution?

Thanks a lot!

Bart Van Loon
  • 1,430
  • 8
  • 18
  • This is a question for reshape https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe/47152692#47152692 – BENY Jan 22 '18 at 15:15

1 Answers1

3

You can use pivot:

df = pd.pivot(index=df.index.strftime('%b'), 
              columns=df.index.year, 
              values=df['Last updated']).fillna(0).reindex(cats)
print (df)
      2017  2018
Jan   24.1   9.4
Feb   44.8   0.0
Mar  140.1   0.0
Apr  168.7   0.0
May  194.4   0.0
Jun  214.7   0.0
Jul  204.6   0.0
Aug  159.5   0.0
Sep  117.3   0.0
Oct   74.8   0.0
Nov   30.4   0.0
Dec    7.2   0.0

Another more general solution if necessary aggregate values in Last updated with groupby by months generated by strftime and year, then aggregate sum (or mean if necessary), reshape by unstack and last reindex:

df = (df.groupby([df.index.strftime('%b'), df.index.year])['Last updated']
        .sum().unstack(fill_value=0)
        .reindex(cats))
print (df)
      2017  2018
Jan   24.1   9.4
Feb   44.8   0.0
Mar  140.1   0.0
Apr  168.7   0.0
May  194.4   0.0
Jun  214.7   0.0
Jul  204.6   0.0
Aug  159.5   0.0
Sep  117.3   0.0
Oct   74.8   0.0
Nov   30.4   0.0
Dec    7.2   0.0

Solution with ordered categoricals for correct ordering:

cats = ['Jan', 'Feb', 'Mar', 'Apr','May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
cat_type = pd.api.types.CategoricalDtype(categories=cats, ordered=True)
c = pd.Categorical(df.index.strftime('%b'), dtype=cat_type)

df = (df.groupby([c, df.index.year])['Last updated']
        .sum().unstack(fill_value=0))
print (df)
      2017  2018
Jan   24.1   9.4
Feb   44.8   0.0
Mar  140.1   0.0
Apr  168.7   0.0
May  194.4   0.0
Jun  214.7   0.0
Jul  204.6   0.0
Aug  159.5   0.0
Sep  117.3   0.0
Oct   74.8   0.0
Nov   30.4   0.0
Dec    7.2   0.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • waw, this is brilliant, thank you so much. I went for the `groupby` option and used `calendar.month_abbr[1:]` for the reindexing. – Bart Van Loon Jan 22 '18 at 15:38