
I have a pandas DataFrame with 5 years of daily time series data. I want to make a monthly plot from the whole dataset that shows the variation (std or something else) within the monthly data. I tried to create a similar figure but could not find a way to do it.


For example, I have pseudo daily precipitation data:

import numpy as np
import pandas as pd

date = pd.to_datetime("1999-12-01")
dates = date + pd.to_timedelta(np.arange(1900), 'D')
ppt = np.random.normal(loc=0.0, scale=1.0, size=1900).cumsum()
df = pd.DataFrame({'pre': ppt}, index=dates)

Manually I can do it like:

import matplotlib.pyplot as plt

# One column per Dec-Nov season, each cut to 365 days
one   = df.loc['1999-12-01':'2000-11-29', 'pre'].values
two   = df.loc['2000-12-01':'2001-11-30', 'pre'].values
three = df.loc['2001-12-01':'2002-11-30', 'pre'].values
four  = df.loc['2002-12-01':'2003-11-30', 'pre'].values
five  = df.loc['2003-12-01':'2004-11-29', 'pre'].values
# Separate name so the original daily df is not overwritten
wide = pd.DataFrame({'2000': one, '2001': two, '2002': three, '2003': four, '2004': five})
std = wide.std(axis=1)
lw = wide.mean(axis=1) - std
up = wide.mean(axis=1) + std

plt.fill_between(np.arange(365), up, lw, alpha=.4)

I am looking for a more Pythonic way to do this instead of doing it manually.

Any help will be highly appreciated.

bikuser

2 Answers


If I understand you correctly, you'd like to plot your daily observations against a monthly mean +/- 1 standard deviation, and that is what the code below produces. Never mind the lackluster design and color choice; we can get to that if this is something you can use. Please note that I've replaced your ppt = np.random.rand(1900) with ppt = np.random.normal(loc=0.0, scale=1.0, size=1900).cumsum() just to make the data look a bit more like your screenshot.


Here I've aggregated the daily data by month, and retrieved mean and standard deviation for each month. Then I've merged that data with the original dataframe so that you're able to plot both the source and the grouped data like this:

# imports
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.dates as mdates
import numpy as np

# Data that matches your setup, but with a random
# seed to make it reproducible
np.random.seed(42)
date = pd.to_datetime("1st of Dec, 1999")
dates = date+pd.to_timedelta(np.arange(1900), 'D')
#ppt = np.random.rand(1900)
ppt = np.random.normal(loc=0.0, scale=1.0, size=1900).cumsum()

df = pd.DataFrame({'ppt':ppt},index=dates)

# A subset
df = df.tail(200)

# Add a yearmonth column
df['YearMonth'] = df.index.map(lambda x: 100*x.year + x.month)

# Create aggregated dataframe
df2 = df.groupby('YearMonth').agg(['mean', 'std']).reset_index()
df2.columns = ['YearMonth', 'mean', 'std']

# Merge original data and aggregated data
df3 = pd.merge(df,df2,how='left',on=['YearMonth'])
df3 = df3.set_index(df.index)
df3 = df3[['ppt', 'mean', 'std']]

# Function to make your plot
def monthplot():
    # A single figure and axes
    fig, ax = plt.subplots(1)
    ax.set_facecolor('white')

    # Define upper and lower bounds for shaded variation
    lower_bound = df3['mean'] - df3['std']
    upper_bound = df3['mean'] + df3['std']

    # Source data and mean
    ax.plot(df3.index, df3['mean'], lw=0.5, color='red')
    ax.plot(df3.index, df3['ppt'], lw=0.1, color='blue')

    # Variation and shaded area
    ax.fill_between(df3.index, lower_bound, upper_bound, facecolor='grey', alpha=0.5)

    # Assign months to X axis
    locator = mdates.MonthLocator()  # every month
    # Specify the format - %b gives us Jan, Feb...
    fmt = mdates.DateFormatter('%b')
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(fmt)

    plt.show()

monthplot()

Check out this post for more on axis formatting and this post on how to add a YearMonth column.
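
As a possible shortcut for the aggregate-and-merge step above, groupby(...).transform broadcasts each month's statistic back onto the daily rows, so no YearMonth column or merge is needed. This is only a minimal sketch on the same synthetic data; the helper name add_monthly_stats is made up here for illustration and is not part of the original code:

```python
import numpy as np
import pandas as pd


def add_monthly_stats(df, col="ppt"):
    """Return a copy of df with the per-calendar-month mean and std of `col`.

    Hypothetical helper introduced for illustration only.
    """
    out = df.copy()
    # One group per (year, month): to_period("M") labels each day with its month
    grouped = out.groupby(out.index.to_period("M"))[col]
    # transform() returns values aligned with the original daily index
    out["mean"] = grouped.transform("mean")
    out["std"] = grouped.transform("std")
    return out


# Same synthetic data as above
np.random.seed(42)
dates = pd.Timestamp("1999-12-01") + pd.to_timedelta(np.arange(1900), "D")
df = pd.DataFrame({"ppt": np.random.normal(size=1900).cumsum()}, index=dates)

df3 = add_monthly_stats(df.tail(200))
```

The resulting df3 has the same ppt, mean and std columns as the merged frame, so it should be usable with monthplot() unchanged.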

vestland
  • Yes, exactly. This screenshot was just an example... sorry that it was misleading, but the good thing is that you understood correctly :). That is the answer I was looking for. Thank you for your help and the piece of code, which is really helpful. – bikuser Aug 24 '18 at 08:57
  • @bikuser, thanks for your feedback! I think you can use the previous screenshot either way. If you put it back and emphasize that it's just a design demonstration and that you don't want rolling, but periodic estimates, I can edit my answer as well. The screenshot showing in your question right now may be more correct with regard to your data, while the previous one actually did describe the desired result pretty well. Sorry for complaining =) – vestland Aug 24 '18 at 09:50

Your example has a few mistakes, but I don't think they are important. Do you want all years on the same plot (as in your example)? If so, this may help:

df['month'] = df.index.strftime("%m-%d")
df['year'] = df.index.year
# drop() no longer accepts the axis as a positional argument in recent pandas
df.set_index('month').drop(columns='year').plot()
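
To get the mean +/- 1 std band across years that the question builds by hand, one option is to pivot the daily series into a table with one row per day of the season and one column per season. The following is a sketch under assumptions: seasons are taken to start on 1 December as in the question's slices, and the names season_year, day and wide are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic data as in the question
np.random.seed(42)
dates = pd.Timestamp("1999-12-01") + pd.to_timedelta(np.arange(1900), "D")
df = pd.DataFrame({"pre": np.random.normal(size=1900).cumsum()}, index=dates)

# Label each row with the year its Dec-Nov season ends in
# (assumption: a season starts on 1 December)
season_year = df.index.year + (df.index.month == 12).astype(int)
# Position of each day within its season: 0 for 1 Dec, 1 for 2 Dec, ...
day = df.groupby(season_year).cumcount()

# Rows: day within season; columns: one per season
wide = (
    df.assign(season_year=season_year, day=day)
      .pivot(index="day", columns="season_year", values="pre")
)
# Keep the five complete seasons and the first 365 days of each
wide = wide.loc[:364, 2000:2004]

mean = wide.mean(axis=1)
std = wide.std(axis=1)
lower, upper = mean - std, mean + std
```

The band can then be drawn with plt.fill_between(wide.index, upper, lower, alpha=.4), as in the question. Note that, like the manual slices, this aligns days by position within the season, so dates after 29 February are shifted by one day in leap seasons.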
Pang