4

I have been following the answer here:

Pandas: how to plot yearly data on top of each other

Which takes a time series and plots the last data point for each day on a new plot. Each line on the plot represents a week's worth of data (so for example 5 data points per week):

enter image description here

I used the following code to do this:

#Chart by last price
daily = ts.groupby(lambda x: x.isocalendar()[1:]).agg(lambda s: s[-1])
daily.index = pd.MultiIndex.from_tuples(daily.index, names=['W', 'D'])
dofw = "Mon Tue Wed Thu Fri Sat Sun".split()
grid = daily.unstack('D').rename(columns=lambda x: dofw[x-1])
grid[-5:].T.plot()

What I would like to do is instead of aggregating by the last data point in a day I would like to aggregateby hour (so averaging the data for each hour) and chart the hourly data for each week. So the chart will look similar to the one in the linked image only it will have 24 data points per day per line and not just one data point per day per line

Is there any way that I can paste the Pandas DataFrame into this post? When I click copy paste it pastes as a list

EDIT:

Final code taking into account incomplete data on for the latest week for charting purposes:

# First we read the DataFrame and resample it to get a mean on every hour
df = pd.read_csv(r"MYFILE.csv", header=None,
                 parse_dates=[0], index_col=0).resample('H', how='mean').dropna()
# Then we add a week field so we can filter it by the week
df['week']= df.index.map(lambda x: x.isocalendar()[1])
start_range = list(set(df['week']))[-3]
end_range = list(set(df['week']))[-1]
# Create week labels
weekdays = 'Mon Tue Wed Thu Fri Sat Sun'.split()

# Create the figure
fig, ax = plt.subplots()

# For every week we want to plot
for week in range(start_range,end_range+1):
    # Select out the week
    dfw = df[df['week'] == week].copy()
    # Here we align all the weeks to span over the same time period so they
    # can be shown on the graph one over the other, and not one next to
    # the other.
    dfw['timestamp'] = dfw.index.values - (week * np.timedelta64(1, 'W'))
    dfw = dfw.set_index(['timestamp'])
    # Then we plot our data
    ax.plot(dfw.index, dfw[1], label='week %s' % week)
    # Now to set the x labels. First we resample the timestamp to have
    # a date frequency, and set it to be the xtick values
    if week == end_range:
        resampled = resampled.index + pd.DateOffset(weeks=1)
    else:        
        resampled = dfw.resample('D')
   # newresampled = resampled.index + pd.DateOffset(weeks=1)
    ax.set_xticks(resampled.index.values)
    # But change the xtick labels to be the weekdays.
    ax.set_xticklabels(weekdays)
# Plot the legend
plt.legend()
Community
  • 1
  • 1
Ross Middleton
  • 237
  • 2
  • 6
  • 13

2 Answers2

4

The solution is explained in the code.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# First we read the DataFrame and resample it to get a mean on every hour
df = pd.read_csv('trayport.csv', header=None,
                 parse_dates=[0], index_col=0).resample('H', how='mean').dropna()
# Then we add a week field so we can filter it by the week
df['week']= df.index.map(lambda x: x.isocalendar()[1])

# Create week labels
weekdays = 'Mon Tue Wed Thu Fri Sat Sun'.split()

# Create the figure
fig, ax = plt.subplots()

# For every week we want to plot
for week in range(1, 4):
    # Select out the week
    dfw = df[df['week'] == week].copy()
    # Here we align all the weeks to span over the same time period so they
    # can be shown on the graph one over the other, and not one next to
    # the other.
    dfw['timestamp'] = dfw.index.values - (week * np.timedelta64(1, 'W'))
    dfw = dfw.set_index(['timestamp'])
    # Then we plot our data
    ax.plot(dfw.index, dfw[1], label='week %s' % week)
    # Now to set the x labels. First we resample the timestamp to have
    # a date frequency, and set it to be the xtick values
    resampled = dfw.resample('D')
    ax.set_xticks(resampled.index.values)
    # But change the xtick labels to be the weekdays.
    ax.set_xticklabels(weekdays)
# Plot the legend
plt.legend()

The result looks like:

enter image description here

Viktor Kerkez
  • 45,070
  • 12
  • 104
  • 85
  • That code works perfectly thanks. Any more thoughts on the X-axis? How about if we make the chart larger, will that help? – Ross Middleton Aug 15 '13 at 15:53
  • And could we make a simple legend/key with the week numbers? (just 1 - 4 would be fine) – Ross Middleton Aug 15 '13 at 16:03
  • I figured it out. Updated the answer. It does everything you specified. – Viktor Kerkez Aug 15 '13 at 21:29
  • Looks good! One minor thing - if there is only say two days of data for the final week then the x labels only go from monday to tuesday. I have tried adding an extra line to set the x-axis based on the previous week (where there should be full data) but this doesn't work: dfw2 = df[df['week'] == week-1].copy() – Ross Middleton Aug 16 '13 at 08:38
  • Thanks for taking the time to look into this by the way - much appreciated! – Ross Middleton Aug 16 '13 at 08:38
  • I think I might have managed to figure it out. Sorry for the shoddy code but I added an if statement that took the previous week's dates and offset them to set the axis on the last iteration of the loop. :) – Ross Middleton Aug 16 '13 at 09:01
  • You can also create a complete week, for example from `xaxis_week = 12.08.2013.-18.08.2013`. Move all other weeks to that week (so they show up on the same plot one on top of each other) and then create the X axis using that `xaxis_week` then it doesn't meter if the week contain only 1 or all the 7 days, the full xaxis will be shown. – Viktor Kerkez Aug 16 '13 at 09:40
1

You can use the resample (DataFrame or Series) method:

df.resample('H')

by default it uses the how='mean' (ie this will average results by hour).

Andy Hayden
  • 359,921
  • 101
  • 625
  • 535