I have a data frame with multiple time-indexed entries per day. I want to take a sample of x days (e.g. 2 days) and then step forward one day at a time until the end of the range of days. How can I achieve this?

For example, if each day has more than one entry:

 datetime             value
 2015-12-02 12:02:35    1
 2015-12-02 12:02:44    2
 2015-12-03 12:39:05    4
 2015-12-03 12:39:12    7
 2015-12-04 14:27:41    2
 2015-12-04 14:27:45    8
 2015-12-07 09:52:58    3
 2015-12-07 13:52:15    5
 2015-12-07 13:52:21    9

I would like to iterate through the data, taking two-day samples at a time, e.g.

 2015-12-02 12:02:35    1
 2015-12-02 12:02:44    2
 2015-12-03 12:39:05    4
 2015-12-03 12:39:12    7

then

 2015-12-03 12:39:05    4
 2015-12-03 12:39:12    7
 2015-12-04 14:27:41    2
 2015-12-04 14:27:45    8

ending with

 2015-12-04 14:27:41    2
 2015-12-04 14:27:45    8
 2015-12-07 09:52:58    3
 2015-12-07 13:52:15    5
 2015-12-07 13:52:21    9

Any help would be appreciated!

azuric

1 Answer

You can use:

#https://stackoverflow.com/a/6822773/2901002
from itertools import islice

def window(seq, n=2):
    """Return a sliding window (of width n) over data from the iterable.

    s -> (s0, s1, ..., s[n-1]), (s1, s2, ..., sn), ...
    """
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + (elem,)
        yield result


dfs = [df[df['datetime'].dt.day.isin(x)] for x in window(df['datetime'].dt.day.unique())]
print (dfs[0])
             datetime  value
0 2015-12-02 12:02:35      1
1 2015-12-02 12:02:44      2
2 2015-12-03 12:39:05      4
3 2015-12-03 12:39:12      7

print (dfs[1])
             datetime  value
2 2015-12-03 12:39:05      4
3 2015-12-03 12:39:12      7
4 2015-12-04 14:27:41      2
5 2015-12-04 14:27:45      8
jezrael
  • Thanks, I get what you are doing here. Can I assume you could use the deque implementation as well? – azuric Aug 07 '17 at 07:49
  • Yes, it is another solution. pandas has `rolling`, but it aggregates data, so it is a bit problematic to use here. – jezrael Aug 07 '17 at 07:50
  • Jezrael, I am getting issues with a dataset which spans several years. I get all the days which are adjacent, e.g. 02, 03, for every month and year. – azuric Aug 07 '17 at 09:35
  • Is there a way to do this with date instead of day? – azuric Aug 07 '17 at 09:48
  • You only need to change `day` to `date`: `dfs = [df[df['datetime'].dt.date.isin(x)] for x in window(df['datetime'].dt.date.unique())]` – jezrael Aug 07 '17 at 10:15
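
A minimal sketch pulling the comments together; it is not part of the original answer. It rebuilds the sample frame from the question, swaps the `islice` recipe for a `collections.deque` with `maxlen` (the deque variant asked about above), and selects on full calendar dates rather than day-of-month, so that the same day number in different months or years is not merged. Using `dt.normalize()` instead of `dt.date` is an assumption of convenience here (it keeps the values as timestamps), and the frame is assumed to be sorted by `datetime`.

```python
from collections import deque

import pandas as pd


def window(seq, n=2):
    """Yield sliding windows of width n over seq, using a bounded deque."""
    buf = deque(maxlen=n)  # the oldest element drops out automatically
    for elem in seq:
        buf.append(elem)
        if len(buf) == n:
            yield tuple(buf)


# Sample data copied from the question.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2015-12-02 12:02:35", "2015-12-02 12:02:44",
        "2015-12-03 12:39:05", "2015-12-03 12:39:12",
        "2015-12-04 14:27:41", "2015-12-04 14:27:45",
        "2015-12-07 09:52:58", "2015-12-07 13:52:15",
        "2015-12-07 13:52:21",
    ]),
    "value": [1, 2, 4, 7, 2, 8, 3, 5, 9],
})

# Timestamps truncated to midnight identify a calendar date,
# not just a day of the month.
dates = df["datetime"].dt.normalize()

# One sub-frame per pair of consecutive dates that actually occur in the data.
dfs = [df[dates.isin(list(pair))] for pair in window(dates.unique())]

for sub in dfs:
    print(sub, end="\n\n")
```

Note that the window slides over the dates present in the data, so the last sample pairs 2015-12-04 with 2015-12-07, matching the output requested in the question. Pass `n=3` (or any other width) to `window` for longer samples.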