Pandas: how to plot yearly data on top of each other

Question

I have a series of data indexed by time values (floats), and I want to take chunks of the series and plot them on top of each other. For example, let's say I have stock prices taken about every 10 minutes for a period of 20 weeks, and I want to see the weekly pattern by plotting 20 lines of stock prices. So my x axis is one week and I have 20 lines (one per week of prices).

Updated

The index is not uniformly spaced, and its values are floating point. It is something like:

import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame

t = np.arange(0,12e-9,12e-9/1000.0)
noise = np.random.randn(1000)/1e12
cn = noise.cumsum()
t_noise = t+cn
y = np.sin(2*np.pi*36e7*t_noise) + noise
df = DataFrame(y,index=t_noise,columns=["A"])
df.plot(marker='.')
plt.axis([0,0.2e-8,0,1])

So the index is not uniformly spaced. I'm dealing with voltage vs. time data from a simulator. I would like to know how to create a window of time, T, split df into chunks that are each T long, and plot them on top of each other. So if the data were 20*T long, I would have 20 lines in the same plot.

Sorry for the confusion; I used the stock analogy thinking it might help.

Garrett · Accepted Answer · 2012-05-05T23:55:57.617

Assuming a pandas.TimeSeries object as the starting point, you can group elements by ISO week number and ISO weekday with datetime.date.isocalendar(). The following statement, which ignores ISO year, aggregates the last sample of each day.

In [95]: daily = ts.groupby(lambda x: x.isocalendar()[1:]).agg(lambda s: s[-1])

In [96]: daily
Out[96]: 
key_0
(1, 1)     63
(1, 2)     91
(1, 3)     73
...
(20, 5)    82
(20, 6)    53
(20, 7)    63
Length: 140

There may be a cleaner way to perform the next step, but the goal is to change the index from an array of tuples to a MultiIndex object.

In [97]: daily.index = pandas.MultiIndex.from_tuples(daily.index, names=['W', 'D'])

In [98]: daily
Out[98]: 
W   D
1   1    63
    2    91
    3    73
    4    88
    5    84
    6    95
    7    72
...
20  1    81
    2    53
    3    78
    4    64
    5    82
    6    53
    7    63
Length: 140
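
In more recent pandas versions the tuple index can probably be skipped altogether, because grouping by two key arrays produces a MultiIndex directly. A minimal sketch, assuming pandas 1.1 or later (for DatetimeIndex.isocalendar()) and made-up data in place of the real ts; the start date, sampling interval, and values are assumptions for illustration only:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one sample every 10 minutes for 20 ISO weeks,
# starting on Monday 2012-01-02 (the first day of ISO week 1 of 2012).
idx = pd.date_range("2012-01-02", periods=20 * 7 * 24 * 6, freq="10min")
rng = np.random.default_rng(0)
ts = pd.Series(rng.integers(50, 100, size=len(idx)), index=idx)

# isocalendar() on a DatetimeIndex returns a frame with year/week/day columns.
iso = ts.index.isocalendar()
week = iso["week"].astype(int).to_numpy()
day = iso["day"].astype(int).to_numpy()

# Grouping by two arrays yields a two-level MultiIndex; last() keeps the
# final sample of each (week, weekday) pair. ISO year is ignored here too.
daily = ts.groupby([week, day]).last()
daily.index.names = ["W", "D"]
```

Like the original statement, this ignores the ISO year, so data spanning more than one year would need the year as a third key.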

The final step is to "unstack" weekday from the MultiIndex, creating columns for each weekday, and replace the weekday numbers with an abbreviation, to improve readability.

In [102]: dofw = "Mon Tue Wed Thu Fri Sat Sun".split()

In [103]: grid = daily.unstack('D').rename(columns=lambda x: dofw[x-1])

In [104]: grid
Out[104]: 
    Mon  Tue  Wed  Thu  Fri  Sat  Sun
W                                    
1    63   91   73   88   84   95   72
2    66   77   96   72   56   80   66
...
19   56   69   89   69   96   73   80
20   81   53   78   64   82   53   63

To create a line plot for each week, transpose the dataframe, so the columns are week numbers and rows are weekdays (note this step can be avoided by unstacking week number, in place of weekday, in the previous step), and call plot.

grid.T.plot()
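
As a variation on the step above, the week level can be unstacked instead of the weekday, which avoids the transpose. The sketch below also keeps the weekday numbers on the x axis and only relabels the ticks, which may sidestep problems with string labels in some pandas/matplotlib versions. The daily series here is a random stand-in, not the data from the answer:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dofw = "Mon Tue Wed Thu Fri Sat Sun".split()

# Hypothetical stand-in for `daily`: 20 weeks x 7 weekdays of random values.
index = pd.MultiIndex.from_product([range(1, 21), range(1, 8)], names=["W", "D"])
rng = np.random.default_rng(0)
daily = pd.Series(rng.integers(50, 100, size=len(index)), index=index)

# Unstack the week level: one column per week, one row per weekday.
by_week = daily.unstack("W")

# One line per week; the x axis stays numeric and only the tick labels change.
ax = by_week.plot(legend=False)
ax.set_xticks(range(1, 8))
ax.set_xticklabels(dofw)
ax.set_xlabel("weekday")
```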

Thanks for your post. You taught me a lot. The part that didn't work for me was plotting. I get the error "ValueError: could not convert string to float: Fri". I got a basic plot to work with plt.plot(grid.T), but the axis labels were wrong. I don't understand the .groupby command. The issue I'm having is that my index is made up of floats that are unequally spaced. I will update the question to include a dataset. – dailyglen, May 06 '12 at 05:06
I think the same concepts apply to an index of floats. You just need to write your own method to group samples into a period group and a time step within the group. Hope that helps. – Garrett, May 06 '12 at 14:50
@crewburn I'll try that out. One thing I can't figure out is how to index the index by value instead of by the index number. Say I want to get all rows with the index value between x and y. – dailyglen, May 06 '12 at 18:26
If the series/dataframe is indexed as in the example data in the question, the index values should be float objects, not index numbers. You may want to look into the groupby method, to which you pass a function that takes an object from the index (a float in this case) and returns a hashable value to group rows by. And to group the float values into "buckets", check out numpy.digitize. – Garrett, May 06 '12 at 23:42
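
The suggestions in these comments could be put together roughly as follows. This is only a sketch under stated assumptions: the signal is regenerated with a seeded generator modelled on the question's example, the window length T is assumed to be one period of the sine, and the names x_lo, x_hi, window, and offset are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical signal modelled on the question: jittered, unevenly spaced times.
rng = np.random.default_rng(0)
n = 1000
t = np.linspace(0.0, 12e-9, n, endpoint=False)
noise = rng.standard_normal(n) / 1e12
t_noise = t + noise.cumsum()
y = np.sin(2 * np.pi * 36e7 * t_noise) + noise
df = pd.DataFrame({"A": y}, index=t_noise)

# Selecting rows by index value (not position) with a boolean mask.
x_lo, x_hi = 2e-9, 4e-9
subset = df[(df.index >= x_lo) & (df.index < x_hi)]

# Assumed window length: one period of the 36e7 Hz sine.
T = 1.0 / 36e7
times = df.index.to_numpy()
t0 = times.min()

# Upper edges of consecutive windows; digitize returns the window number.
n_windows = int(np.ceil((times.max() - t0) / T))
edges = t0 + T * np.arange(1, n_windows + 1)
df["window"] = np.digitize(times, edges)
# Time elapsed since the start of the sample's own window.
df["offset"] = times - (t0 + T * df["window"].to_numpy())

# One line per window, all sharing the same 0..T x axis.
fig, ax = plt.subplots()
for w, chunk in df.groupby("window"):
    ax.plot(chunk["offset"], chunk["A"], marker=".", label=f"window {w}")
ax.set_xlabel("time within window (s)")
ax.legend()
```

Grouping on a precomputed column rather than passing a function to groupby is a design choice here; it keeps the window number and the within-window offset available for plotting.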

score 0 · Answer 2 · answered May 05 '12 at 17:55

Let me try to answer this. Basically, I will pad or reindex with complete weekdays, then take a chunk every 5 days while dropping data that is missing due to holidays or trading suspensions.

>>> coke = DataReader('KO', 'yahoo', start=datetime(2012,1,1))
>>> startd = coke.index[0] - timedelta(coke.index[0].isoweekday()-1)
>>> rng = array(DateRange(str(startd), periods=90))
>>> chunk = []
>>> for i in range(18):
...     chunk.append(coke[i*5:(i+1)*5].dropna())
...

Then you can loop over chunk to plot each week's data.
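
One way that plotting loop might look is sketched below. The price series is a random stand-in built with pd.bdate_range rather than downloaded data, and plotting against the weekday number is an assumption about what "on top of each other" should mean here:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for downloaded prices: 18 weeks of business days
# starting on Monday 2012-01-02.
days = pd.bdate_range("2012-01-02", periods=90)
rng = np.random.default_rng(0)
close = pd.Series(70 + rng.standard_normal(len(days)).cumsum(), index=days)

# Five business days per chunk, as in the loop above.
chunk = [close.iloc[i * 5:(i + 1) * 5].dropna() for i in range(18)]

fig, ax = plt.subplots()
for week in chunk:
    # Plot against weekday number (0 = Monday) so every week shares one x axis.
    ax.plot(week.index.dayofweek, week.to_numpy(), marker=".")
ax.set_xticks(range(5))
ax.set_xticklabels("Mon Tue Wed Thu Fri".split())
```

Calling ax.plot repeatedly on the same axes is what overlays the weeks; each call adds one line.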

archlight

687
1
6
12

Thanks for the answer. I couldn't get the plot to work. How do you plot them all together? Also, my index is not equally spaced and is of type float. I will update my question to include a dataset. – dailyglen May 06 '12 at 05:08
