Remove Holidays and Weekends in a very long time series: how to model time series in Python?

Question

Is there some function in Python to handle this? Google Docs has a WEEKDAY operation, so perhaps there is something like that in Python. I am pretty sure someone must have solved this already, since similar problems occur with sparse data in finance and research. I am basically trying to organize a huge number of vectors of different sizes indexed by days (time series). I am not sure how I should handle the days: mark the first day with 1 and the last day with N, use Unix time, or something else? I am also not sure whether the time series should be stored in a matrix so that I could model them more easily, for example to calculate correlation matrices. Is there anything ready-made for this?
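The weekday filter itself needs only the standard library. A minimal sketch, assuming holidays are supplied as a list of `datetime.date` objects (the two dates below are illustrative, not an authoritative holiday calendar); `business_days` is a hypothetical helper name:

```python
from datetime import date, timedelta


def business_days(start, end, holidays=()):
    """Yield every date from start to end (inclusive) that is a weekday
    (Monday-Friday) and is not listed in `holidays`."""
    skip = set(holidays)
    day = start
    while day <= end:
        # date.weekday(): Monday == 0 ... Sunday == 6
        if day.weekday() < 5 and day not in skip:
            yield day
        day += timedelta(days=1)


# Hypothetical holiday list; a real one must come from a calendar source.
holidays = [date(2011, 12, 26), date(2012, 1, 2)]
days = list(business_days(date(2011, 12, 20), date(2012, 1, 5), holidays))

# Two possible integer keys for each kept day:
ordinals = [d.toordinal() for d in days]   # calendar day number, gaps stay visible
positions = list(range(1, len(days) + 1))  # 1..N, counting business days only
```

On the "1..N or Unix time" question: calendar ordinals (or Unix timestamps) keep the weekend gaps visible as jumps, while 1..N positions hide them. Storing the real date next to the value lets you derive either one later.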

Let's try to solve this problem without the "practical" extra clutter:

from itertools import compress, cycle

seq = range(100000)
# Repeating mask: keep 10 items, then skip 801, and so on.
criteria = cycle([True] * 10 + [False] * 801)
list(compress(seq, criteria))

Now I have to turn these indices into days, and then turn each value in $\mathbb R$ into a (day, value) tuple in $\mathbb R \times \mathbb R$. So a map $V : \mathbb R \to \mathbb R^{2}$ is still missing; I am investigating.
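The same `compress()` idea works if the mask is derived from the calendar instead of a fixed cycle. A minimal sketch, assuming a start date of 2011-10-03 (a Monday) and placeholder values; this `V` is only one hypothetical reading of the map above, pairing each kept day with its value:

```python
from datetime import date, timedelta
from itertools import compress

start = date(2011, 10, 3)                      # assumed start date (a Monday)
calendar_days = [start + timedelta(days=n) for n in range(14)]
values = [float(n) for n in range(14)]         # placeholder data, one per calendar day

# Mask built from the weekday of each date rather than from a fixed cycle.
is_weekday = [d.weekday() < 5 for d in calendar_days]


def V(days, vals, mask):
    """Hypothetical V: return (date, value) tuples for the days kept by mask."""
    return list(compress(zip(days, vals), mask))


series = V(calendar_days, values, is_weekday)
```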

[Update]

Let's play! The code below solves a subproblem: it creates some test data. Next we need to create arbitrary days and valuations on them so we can test this on arbitrary time series. If we can build the function $V$, we are very close to solving the problem... though it must take holidays and weekends into account, so it may not be easy.

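import itertools as i
import numpy


def createRandomData():
    # Build 5 test "series" of different lengths by masking range(5)
    # with a repeating pattern: x True values followed by 3 False values.
    samples = []

    for x in range(5):
        seq = range(5)
        criteria = i.cycle([True] * x + [False] * 3)

        samples.append(list(i.compress(seq, criteria)))

    return samples


def createNNtriangularMatrix(data):
    # Pad every row with zeros up to length N = len(data), giving an N x N matrix.
    N = len(data)
    return [aa + [0] * (N - len(aa)) for aa in data]


A = createNNtriangularMatrix(createRandomData())
print(numpy.array(A))
# Row-wise correlation matrix. The first row is all zeros (zero variance),
# so its correlations come out as nan.
print(numpy.corrcoef(A))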
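Padding short series with zeros adds artificial observations that distort the correlations. An alternative is to key each series by date and correlate only the dates all series share. A minimal pure-Python sketch with toy data; `align`, `pearson` and `corr_matrix` are hypothetical helpers, and `numpy.corrcoef(rows)` should give the same matrix:

```python
import math
from datetime import date


def align(series_list):
    """Keep only dates present in every series; return (dates, rows)."""
    common = set(series_list[0])
    for s in series_list[1:]:
        common &= set(s)
    dates = sorted(common)
    return dates, [[s[d] for d in dates] for s in series_list]


def pearson(x, y):
    """Pearson correlation; fails with ZeroDivisionError on a constant row."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def corr_matrix(rows):
    return [[pearson(r1, r2) for r2 in rows] for r1 in rows]


# Toy series keyed by date; lengths differ and dates only partly overlap.
a = {date(2011, 10, 3): 1.0, date(2011, 10, 4): 2.0,
     date(2011, 10, 5): 3.0, date(2011, 10, 6): 4.0}
b = {date(2011, 10, 4): 20.0, date(2011, 10, 5): 30.0, date(2011, 10, 6): 40.0}
c = {date(2011, 10, 4): 3.0, date(2011, 10, 5): 2.0, date(2011, 10, 6): 1.0}

dates, rows = align([a, b, c])
m = corr_matrix(rows)
```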

Does this help? `today = time.strftime('%A')  # weekday name as a string` or `today = time.strftime('%w')  # weekday as a digit string, '0' (Sunday) to '6' (Saturday)` — Steven Rumbalski, Oct 04 '11 at 20:29
`print(time.strftime('%A', time.strptime('11/29/1972', '%m/%d/%Y')))` tells me I was born on a Wednesday. — Steven Rumbalski, Oct 04 '11 at 20:35
I think you should create a matrix where each weekday would be defined by its column, identical to what a paper calendar looks like. So, when you want to select mondays, you would use slicing: `mondays = manydays[:,1]`, which is like saying "all elements from second column". In this case, `manydays` is a `numpy.ndarray`. — heltonbiker, Oct 04 '11 at 20:38
As for holidays, maybe you could unpack the javascript that calculates them for this site: http://www.timeanddate.com/calendar/?year=2010&country=1 — Steven Rumbalski, Oct 04 '11 at 20:44
@StevenRumbalski: but how can you create random days with that? Better if it could correspond somehow to the code above, now trying to solve the function $V$. — hhh, Oct 04 '11 at 21:16
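The paper-calendar layout suggested above can be built with the standard `calendar` module. A minimal sketch for October 2011 (an arbitrary example month); with Sunday-first weeks, column 1 holds Mondays, matching `manydays[:,1]`. Wrapping `weeks` in a numpy array of dtype `object` would allow that exact slicing syntax:

```python
import calendar

# Sunday-first weeks, like a paper calendar: column 0 = Sunday, 1 = Monday, ...
cal = calendar.Calendar(firstweekday=6)
weeks = cal.monthdatescalendar(2011, 10)   # list of weeks, 7 datetime.date objects each

mondays = [week[1] for week in weeks]                  # the "column 1" selection
mondays_in_month = [d for d in mondays if d.month == 10]
workweeks = [week[1:6] for week in weeks]              # drop Sunday and Saturday columns
```

`monthdatescalendar` pads the first and last weeks with days from the neighbouring months, hence the `d.month == 10` filter.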

score 1 · Answer 1 · answered Apr 26 '12 at 19:41


Try using pandas. You can create a DateOffset for business days and include your data in a DataFrame (see: http://pandas.pydata.org/pandas-docs/stable/timeseries.html) to analyze it.
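A minimal pandas sketch of this suggestion, assuming a hand-written holiday list (illustrative only; `pandas.tseries.holiday` also ships ready-made calendars such as `USFederalHolidayCalendar`). `freq="C"` means custom business days, which is what makes `bdate_range` honour the `holidays` argument:

```python
import pandas as pd

# Hypothetical holiday list for illustration.
holidays = ["2011-12-26", "2012-01-02"]

# Custom business days: Monday-Friday minus the listed holidays.
days = pd.bdate_range("2011-12-20", "2012-01-05", freq="C", holidays=holidays)

# Daily raw data (placeholder values); reindexing onto the business-day
# index drops weekends and holidays.
raw = pd.Series(range(17), index=pd.date_range("2011-12-20", "2012-01-05"))
s = raw.reindex(days)
```

The same `reindex` works on a DataFrame, which keeps one column per series on a shared date index.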

answered Apr 26 '12 at 19:41

bmu


score 1 · Answer 2 · edited May 23 '17 at 12:13


I think you should first work out which days you want to INCLUDE, and then write a (probably looping) routine that applies slicing operations to your big list.

For discontinuous slices, you can take a look at this question:

Discontinuous slice in python list

Or perhaps you could assign a null value (zero or None) to the days you do not want.
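Both options can be sketched with the standard library. This assumes one value per calendar day starting 2011-10-01, placeholder values, and a hypothetical holiday on 2011-10-10. `None` is used for the null because a zero could not be told apart from a real zero observation:

```python
from datetime import date, timedelta

start = date(2011, 10, 1)                   # assumed first day of the data
values = list(range(31))                    # placeholder: one value per calendar day
days = [start + timedelta(days=n) for n in range(len(values))]
holidays = {date(2011, 10, 10)}             # hypothetical holiday

# Option 1: indices of the days to INCLUDE, then a discontinuous "slice".
keep = [n for n, d in enumerate(days) if d.weekday() < 5 and d not in holidays]
included = [values[n] for n in keep]

# Option 2: keep the full length, but null out the unwanted days.
keep_set = set(keep)
masked = [v if n in keep_set else None for n, v in enumerate(values)]
```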

edited May 23 '17 at 12:13

Community


answered Oct 04 '11 at 20:08

heltonbiker


Please tell us about the structure of the data you have, perhaps by showing a small part of it. Is it a database? Which format? Is it text? (If so, paste some lines.) It would also help to post an example operation you want to perform and the kind of output you expect. – heltonbiker Oct 04 '11 at 20:18
...yes, but I need to convert the days to numbers first so I can use modulo; how can I do that? If 0 is Monday, then to handle weekends I could use something like `saturdays = [x for x in dayNumbers if x % 7 == 5]; sundays = [x for x in dayNumbers if x % 7 == 6]`, and if a day is in `saturdays` or `sundays`, either skip its value or add NA. The next question is how to handle special days such as holidays. – hhh Oct 04 '11 at 20:18
@hhh don't use shortened links. Full link: http://www.seligson.fi/graafit/omxh25.csv – agf Oct 04 '11 at 20:24
I guess you have two tasks: get a list of holidays from somewhere (from the Google Calendar API?), AND find the structure needed for your calculations. For example, if you want to apply correlation matrices, then use some `numpy` or `scipy` object, probably an `ndarray`. – heltonbiker Oct 04 '11 at 20:25
[here](http://www.seligson.fi/graafit/phoenix.csv), [here](https://personal.vanguard.com/us/funds/tools/pricehistorysearch?radio=1&results=get&FundType=ExchangeTradedShares&FundIntExt=INT&FundId=0954&fundName=0954&radiobutton2=1&beginDate=01%2F01%2F2009&endDate=01%2F01%2F2010&year=#res), [here](http://www.seligson.fi/graafit/global-pharma.csv), ... – hhh Oct 04 '11 at 20:26
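On the day-numbers-and-modulo idea from the comments: if days are numbered consecutively with 0 on a Monday, Saturday is `n % 7 == 5` and Sunday is `n % 7 == 6`. Python's `date.toordinal()` already provides such a day number (ordinal 1 is 0001-01-01, a Monday), and `date.weekday()` equals `(ordinal + 6) % 7`. A minimal sketch; `is_weekend` is a hypothetical helper:

```python
from datetime import date


def is_weekend(n):
    """n counts days consecutively with 0 = a Monday."""
    return n % 7 in (5, 6)


day_numbers = list(range(14))             # two weeks starting on a Monday
weekdays_only = [n for n in day_numbers if not is_weekend(n)]

# Real dates: the ordinal is a day number, so the weekday follows from it.
d = date(1972, 11, 29)
weekday = (d.toordinal() + 6) % 7         # 2 == Wednesday, same as d.weekday()
```

Holidays cannot be derived by arithmetic like this; they still have to come from an explicit list.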

score 0 · Answer 3 · answered Feb 21 '13 at 20:29

Why would you want to remove the holidays and weekends? Is it because they are outliers or zeroes? If they are zeroes, the model will handle them. You would want to leave the data in the time series and use dummy variables to model the seasonal effects (i.e. monthly dummies), the day-of-week effects, and the holiday effects. Frankly, I am dumbfounded: I have seen people who are unable to deal with time series analysis break the weekdays into one time series and the weekends into another, which completely ignores the lead and lag effects around holidays.
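A minimal sketch of building such dummy variables per day with the standard library, assuming a hypothetical holiday set. Monday is the baseline and gets no column, to avoid perfect collinearity with an intercept; monthly dummies would follow the same pattern using `d.month`:

```python
from datetime import date, timedelta

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def dummy_row(d, holidays):
    """Day-of-week dummies (Monday = baseline, no column) plus a holiday dummy."""
    row = {"dow_" + name: int(d.weekday() == k)
           for k, name in enumerate(DAY_NAMES) if k > 0}
    row["holiday"] = int(d in holidays)
    return row


holidays = {date(2011, 12, 26)}   # hypothetical holiday list
start = date(2011, 12, 19)        # a Monday
rows = [dummy_row(start + timedelta(days=n), holidays) for n in range(14)]
```

These rows can then sit next to the observed values as regressors, so weekends and holidays stay in the series instead of being cut out.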

score 0 · Answer 4 · answered Nov 10 '15 at 01:28


If trading days are what you want, you can use the pandas datareader package to download the historical S&P 500 prices (for U.S. markets) and use its index of dates as a mask for your data.

Answered on mobile, I'll add links and code later.
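The masking step can be sketched without any download. Here `trading_days` is a hand-written stand-in for the date index of a downloaded index price history (an assumption; fetching the data is not shown). With pandas, the equivalent filter would be `df[df.index.isin(trading_index)]`:

```python
from datetime import date

# Stand-in for the date index of a downloaded price history.
trading_days = {date(2011, 10, 3), date(2011, 10, 4), date(2011, 10, 5)}

my_data = {
    date(2011, 10, 2): 1.0,   # a Sunday, not a trading day
    date(2011, 10, 3): 2.0,
    date(2011, 10, 5): 3.0,
}

# Keep only observations whose date is a trading day.
masked = {d: v for d, v in my_data.items() if d in trading_days}
```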

answered Nov 10 '15 at 01:28

Erik


score 0 · Answer 5 · answered Oct 04 '11 at 20:37

I think it depends on the scope of your problem; for a personal calendar, 'day' is good enough for indexing.

Even a 200-year life is only about 73,000 days, so you can simply calculate and record them all, maybe in a dict, e.g.

day = {}
# day[0] = [event_a, event_b, ...]
# or you may want to rewrite the __getitem__ method like this: day['09-05-2012']
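A minimal sketch of that `__getitem__` idea, assuming keys are stored as day numbers (`date.toordinal()`) and that strings like `'09-05-2012'` mean month-day-year (the format is ambiguous, so this is an assumption). `DayDict` is a hypothetical class name:

```python
from datetime import date, datetime


class DayDict(dict):
    """Hypothetical dict keyed by day number (date.toordinal()) that also
    accepts 'MM-DD-YYYY' strings or date objects as keys."""

    @staticmethod
    def _key(k):
        if isinstance(k, str):
            k = datetime.strptime(k, "%m-%d-%Y").date()
        if isinstance(k, date):
            k = k.toordinal()
        return k

    def __getitem__(self, k):
        return super().__getitem__(self._key(k))

    def __setitem__(self, k, v):
        super().__setitem__(self._key(k), v)


day = DayDict()
day["09-05-2012"] = ["event_a", "event_b"]
```

Only `[]` access goes through the conversion; `get()`, `in`, and the `dict(...)` constructor still see raw day numbers unless they are overridden as well.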
