I have daily values of precipitation and temperature for a period of several years. I would like to compute the average precipitation and temperature for each month of the year (January to December). For precipitation, I first need to sum the daily values within each month of each year, and then average those monthly totals for the same calendar month over all the years of data. For temperature, I need the average of the monthly averages (averaging all the daily values for a given calendar month over all years gives the same result). Once this is done, I need to plot both sets of data (precipitation and temperature) with abbreviated month names on the x-axis.

I cannot find a way to compute the precipitation sum for each individual month and then average it over all years. Furthermore, I am having trouble displaying the x-axis labels as abbreviated month names.

This is what I have tried so far (unsuccessfully):

import pandas as pd

import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter

example = [['01.10.1965 00:00', 13.88099957,    5.375],
    ['02.10.1965 00:00',    5.802999973,    3.154999971],
    ['03.10.1965 00:00',    9.605699539,    0.564999998],
    ['14.10.1965 00:00',    0.410299987,    1.11500001],
    ['31.10.1965 00:00',    6.184500217,    -0.935000002],
    ['01.11.1965 00:00',    0.347299993,    -5.235000134],
    ['02.11.1965 00:00',    0.158299997,    -8.244999886],
    ['03.11.1965 00:00',    1.626199961,    -3.980000019],
    ['24.10.1966 00:00',    0,              3.88499999],
    ['25.10.1966 00:00',    0.055100001,    1.279999971],
    ['30.10.1966 00:00',    0.25940001,     -5.554999828]]

names = ["date","Pobs","Tobs"]
data = pd.DataFrame(example, columns=names)
data['date'] = pd.to_datetime(data['date'], format='%d.%m.%Y %H:%M')

#I think the average of temperature is well computed but the precipitation would give the complete summation for all years!
tempT = data.groupby([data['date'].dt.month_name()], sort=False).mean().eval('Tobs')
tempP = data.groupby([data['date'].dt.month_name()], sort=False).sum().eval('Pobs') 

fig = plt.figure(); ax1 = fig.add_subplot(1,1,1); ax2 = ax1.twinx();
ax1.bar(tempP.index.tolist(), tempP.values, color='blue')
ax2.plot(tempT.index.tolist(), tempT.values, color='red')
ax1.set_ylabel('Precipitation [mm]', fontsize=10)
ax2.set_ylabel('Temperature [°C]', fontsize=10) 
#ax1.xaxis.set_major_formatter(DateFormatter("%b")) #this line does not work properly!
plt.show()
Rodolfo

1 Answer

Here's working code for your problem:

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
import matplotlib.dates as mdates

example = [['01.10.1965 00:00',13.88099957,5.375], ...]

names = ["date","Pobs","Tobs"]
data = pd.DataFrame(example, columns=names)
data['date'] = pd.to_datetime(data['date'], format='%d.%m.%Y %H:%M')

# Temperature: mean of all daily values for each calendar month
tempT = data.groupby([data['date'].dt.month_name()], sort=False).mean(numeric_only=True)['Tobs']

# Precipitation:
# numeric_only=True skips the datetime column, which recent pandas versions refuse to sum
df_sum = data.groupby([data['date'].dt.month_name(), data['date'].dt.year], sort=False).sum(numeric_only=True)  # get sum for each individual month of each year
df_sum.index.rename(['month','year'], inplace=True)  # just renaming the index
df_sum.reset_index(level=0, inplace=True)  # turn the month index level into a column
tempP = df_sum.groupby([df_sum['month']], sort=False).mean(numeric_only=True)['Pobs']  # get mean over all years

fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)
ax2 = ax1.twinx()

xticks = pd.to_datetime(tempP.index.tolist(), format='%B')  # month names -> dates, still in the same order as the values
order = xticks.argsort()  # calendar order; reorder the values with it so they stay matched to their months
xticks = xticks[order]  # must work for both axes
ax1.bar(xticks, tempP.values[order], color='blue')
ax2.plot(xticks, tempT.values[order], color='red')
plt.xticks(xticks)  # to show all ticks

ax1.xaxis.set_major_formatter(mdates.DateFormatter("%b")) # must be called after plotting both axes

ax1.set_ylabel('Precipitation [mm]', fontsize=10)
ax2.set_ylabel('Temperature [°C]', fontsize=10)

plt.show()

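Explanation: According to this StackOverflow answer, DateFormatter operates on matplotlib date values (mdates), so it cannot format plain month-name strings. For this to work, you need to build a DatetimeIndex from the month names, which the DateFormatter can then re-format as abbreviated names.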
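To make that conversion concrete, here is a minimal sketch of what happens to the month names on their own, without any plotting. The list `month_names` is a hypothetical stand-in for `tempP.index.tolist()`, and the abbreviations shown by `strftime` assume an English locale.

```python
import pandas as pd

# Hypothetical month names in order of first appearance (stand-in for tempP.index.tolist())
month_names = ["October", "November", "January"]

# Parse the full names; the missing year and day are filled with defaults
as_dates = pd.to_datetime(month_names, format="%B")

# Positions that put the dates in calendar order; the same positions
# must be applied to the plotted values so they stay matched
order = as_dates.argsort()
sorted_dates = as_dates[order]

# Abbreviated names, i.e. the text a "%b" date format produces
labels = sorted_dates.strftime("%b").tolist()
print(labels)
```

Because the resulting index holds real dates, the axis can format them with "%b"; the year attached to them is only a placeholder and never appears in the labels.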

As for the calculation, I understood your problem as follows: take the sum within each individual month of each year, then average those sums over all years. This leaves you with the average total precipitation for each calendar month.
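The two steps can be checked by hand on a tiny data set. This is only a sketch: the `toy` frame and its values are made up for illustration, and it groups by month number rather than month name to keep the arithmetic easy to follow.

```python
import pandas as pd

# Made-up data: two Octobers (1965 and 1966) and one November (1965)
toy = pd.DataFrame({
    "date": pd.to_datetime(["1965-10-01", "1965-10-02", "1965-11-01", "1966-10-24"]),
    "Pobs": [10.0, 6.0, 3.0, 4.0],
})

year = toy["date"].dt.year.rename("year")
month = toy["date"].dt.month.rename("month")

# Step 1: total precipitation for each individual (year, month) pair
monthly_totals = toy.groupby([year, month])["Pobs"].sum()

# Step 2: average those totals over the years, per calendar month
mean_monthly_total = monthly_totals.groupby(level="month").mean()

# For comparison: grouping by month alone adds up all years together
summed_over_all_years = toy.groupby(month)["Pobs"].sum()

print(mean_monthly_total)
print(summed_over_all_years)
```

For October, the two yearly totals are 16 and 4, so the two-step result is 10, whereas grouping by month alone gives 20. That difference is the problem described in the question.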

Benji
    That's exactly what I needed, thanks a lot! I didn't realize that you could add the second groupby for 'years', but that's great to know. It makes things much easier to compute from there on. – Rodolfo Jul 13 '20 at 15:57