3

I am working with stock price data and would like to get resample() to return every 2nd row rather than every 2nd business day (resample('2B')). The obstacle is any holiday that lands on a weekday. See below, MLK Day is Monday, Jan 15, 2018:

import pandas as pd

data = '''\
date,price
2018-01-08,88.28
2018-01-09,88.22
2018-01-10,87.82
2018-01-11,88.08
2018-01-12,89.6
2018-01-16,88.35
2018-01-17,90.14
2018-01-18,90.1
2018-01-19,90.0
2018-01-22,91.61
2018-01-23,91.9
2018-01-24,91.82
2018-01-25,92.33
2018-01-26,94.06'''

fileobj = pd.compat.StringIO(data)
df = pd.read_csv(fileobj, parse_dates=['date'], index_col=[0])

df_resample = df.resample('2B').min()
print(df_resample)

Output:

            price
2018-01-08  88.22
2018-01-10  87.82
2018-01-12  89.60
2018-01-16  88.35
2018-01-18  90.00
2018-01-22  91.61
2018-01-24  91.82
2018-01-26  94.06

I'd like the resample to jump from 1/12 to 1/17. I know that I can use df['price'].loc[::2] to deliver df.resample('2B').last() but I need to use min(), max() and sum() as well.

Thanks.

Expected Output:

enter image description here

MJS
  • 1,573
  • 3
  • 17
  • 26
  • `df.loc[::2]` was my first thought, can you explain why that wont work for this dataset please? `min`, `max`, etc. should all work fine if you slice the frame – Haleemur Ali Jun 27 '18 at 13:41
  • My thought is to redefine the Business days somehow. Maybe something like this: https://stackoverflow.com/questions/33094297/create-trading-holiday-calendar-with-pandas – Anton vBR Jun 27 '18 at 13:42
  • Haleemur - How would you return the 2 day minimum (every 2nd row) using `.loc[::2]` ? – MJS Jun 27 '18 at 13:53
  • @MJS, thanks for updating the question with sample data, it is clearer now what you are trying to achieve. as an aside, when responding to comments you can mention the user you are addressing using `@username`. otherwise, the user is not notified of comment activity – Haleemur Ali Jun 27 '18 at 15:56

3 Answers3

2

For a stable solution I'd have a look at redifining the B-days somehow.

But if you reset index you could use the index numbers and groupby:

df = df.reset_index()
df_resample = df.groupby(df.index // 2).min()
print(df_resample)

Returns:

        date  price
0 2018-01-08  88.22
1 2018-01-10  87.82
2 2018-01-12  88.35
3 2018-01-17  90.10
4 2018-01-19  90.00
5 2018-01-23  91.82
6 2018-01-25  92.33

Or you could do something like this:

g = np.arange(len(df))// 2
df_resample = df.groupby(g).agg(['last','min','max','sum'])
df_resample.insert(0, 'Date', df.index[1::2])

print(df_resample)

Returns:

        Date  price                      
               last    min    max     sum
0 2018-01-09  88.22  88.22  88.28  176.50
1 2018-01-11  88.08  87.82  88.08  175.90
2 2018-01-16  88.35  88.35  89.60  177.95
3 2018-01-18  90.10  90.10  90.14  180.24
4 2018-01-22  91.61  90.00  91.61  181.61
5 2018-01-24  91.82  91.82  91.90  183.72
6 2018-01-26  94.06  92.33  94.06  186.39
Anton vBR
  • 18,287
  • 5
  • 40
  • 46
  • interesting Anton - in your first solution, I see that `.min()` begins on 1/8 whereas `.last()` and `.max()` begin on 1/9. In a perfect world, they would all begin and end on the same days. – MJS Jun 27 '18 at 14:31
  • Max returns the max for date too. Have a look at my alternative. – Anton vBR Jun 27 '18 at 14:33
  • thanks. 2nd solution - I am trying to figure out how to key off the 2nd row and look backwards, rather than looking fwd (on 1/8 we're identifying the min from today and the next day, 1/8-1/9). – MJS Jun 27 '18 at 14:45
  • I pasted in an Excel snapshot. Thank you so much for working with me on this. – MJS Jun 27 '18 at 14:55
  • @MJS in that case just take the 2nd element with `[1::2]`, right? See my update. – Anton vBR Jun 27 '18 at 15:15
  • ugh, yes - I was fumbling around with shift(). Thank you again, I will accept your answer. – MJS Jun 27 '18 at 15:19
0

using np.repeat with array slicing, you can create a resampled dataframe where the date (index) is repeated twice.

df_resample = df.set_index(np.repeat(df.index[::2],2)[:len(df)])
# outputs:
            price
date
2018-01-08  88.28
2018-01-08  88.22
2018-01-10  87.82
2018-01-10  88.08
2018-01-12  89.60
2018-01-12  88.35
2018-01-17  90.14
2018-01-17  90.10
2018-01-19  90.00
2018-01-19  91.61
2018-01-23  91.90
2018-01-23  91.82
2018-01-25  92.33
2018-01-25  94.06

Then grouping as usual yields the desired output:

df_resampled.groupby(level=0).agg(['last', 'min', 'max', 'sum'])

            price
             last    min    max     sum
date
2018-01-08  88.22  88.22  88.28  176.50
2018-01-10  88.08  87.82  88.08  175.90
2018-01-12  88.35  88.35  89.60  177.95
2018-01-17  90.10  90.10  90.14  180.24
2018-01-19  91.61  90.00  91.61  181.61
2018-01-23  91.82  91.82  91.90  183.72
2018-01-25  94.06  92.33  94.06  186.39
Haleemur Ali
  • 26,718
  • 5
  • 61
  • 85
0

I think this may also work, kind of reversing the steps...

df['price'].rolling(window=2).max().iloc[1::2]
MJS
  • 1,573
  • 3
  • 17
  • 26