10

I have read multiple post similar to my question, but I still can't figure it out. I have a pandas df that looks like the following (for multiple days):

Out[1]: 
                     price  quantity
time                                
2016-06-08 09:00:22  32.30    1960.0
2016-06-08 09:00:22  32.30     142.0
2016-06-08 09:00:22  32.30    3857.0
2016-06-08 09:00:22  32.30    1000.0
2016-06-08 09:00:22  32.35     991.0
2016-06-08 09:00:22  32.30     447.0
...

To calculate the vwap I could do:

df['vwap'] = (np.cumsum(df.quantity * df.price) / np.cumsum(df.quantity))

However, I would like to start over every day (groupby), but I can't figure out how to make it work with a (lambda?) function.

df['vwap_day'] = df.groupby(df.index.date)['vwap'].apply(lambda ...

Speed is of essence. Would appreciate any help:)

cJc
  • 813
  • 1
  • 11
  • 33
  • 1
    This is not the correct formula for VWAP. you need to calculate the typical price (Average if Hi, Lo, Close). Your price data should have this. – DISC-O May 07 '21 at 01:25
  • @DISC-O So please contribute and add your answer with code... – cJc May 07 '21 at 07:05
  • 1
    Not sure I understand, the comment was about the calculation formula and the need for more data (the high and low of the period, the average then goes into your calc); the code is trivial, I assume you know how to calculate the average of 3 columns. – DISC-O May 08 '21 at 12:17

2 Answers2

27

Option 0
plain vanilla approach

def vwap(df):
    q = df.quantity.values
    p = df.price.values
    return df.assign(vwap=(p * q).cumsum() / q.cumsum())

df = df.groupby(df.index.date, group_keys=False).apply(vwap)
df

                     price  quantity       vwap
time                                           
2016-06-08 09:00:22  32.30    1960.0  32.300000
2016-06-08 09:00:22  32.30     142.0  32.300000
2016-06-08 09:00:22  32.30    3857.0  32.300000
2016-06-08 09:00:22  32.30    1000.0  32.300000
2016-06-08 09:00:22  32.35     991.0  32.306233
2016-06-08 09:00:22  32.30     447.0  32.305901

Option 1
Throwing in a little eval

df = df.assign(
    vwap=df.eval(
        'wgtd = price * quantity', inplace=False
    ).groupby(df.index.date).cumsum().eval('wgtd / quantity')
)
df

                     price  quantity       vwap
time                                           
2016-06-08 09:00:22  32.30    1960.0  32.300000
2016-06-08 09:00:22  32.30     142.0  32.300000
2016-06-08 09:00:22  32.30    3857.0  32.300000
2016-06-08 09:00:22  32.30    1000.0  32.300000
2016-06-08 09:00:22  32.35     991.0  32.306233
2016-06-08 09:00:22  32.30     447.0  32.305901
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • 1
    Fancy play syndrome – Ted Petrou Jun 30 '17 at 20:59
  • @TedPetrou guilty – piRSquared Jun 30 '17 at 21:02
  • @piRSquared When I run Option 0 as above, it works fine. However I need a new column in my df, so I did df['vwap'] = df.groupby(df.index.date, group_keys=False).apply(vwap) That throws me the following error: ValueError: Wrong number of items passed 7, placement implies 1, why is that? – cJc Jul 01 '17 at 08:07
  • What I'm returning is a new dataframe with the column included. You want to reassign the whole thing to df. Not df['vwap'] – piRSquared Jul 01 '17 at 10:52
  • A-ha! Working like a charm now, tnx vm. I'll stick to option 0, as even after reading up on it it, I'm not at all clear on what eval realy does.. – cJc Jul 01 '17 at 12:51
  • Works as desired, wanted to ask whether we can further find vwap for week or month – Animesh Shah Jul 16 '21 at 11:16
  • Hello, do you know why this doesn't work when 1) using the command "resample"? and could you please help me how to use this exact command and group the data per MINUTE with the time column of the form 2016-06-08 09:00:22? So group the data by minute and then apply the vwap? I'm stuck here – Baobab Jun 10 '22 at 13:44
8

I also used this method before but it's not working quite accurately if you're trying to limit the window period. Instead I found the TA python library to work really well: https://technical-analysis-library-in-python.readthedocs.io/en/latest/index.html

from ta.volume import VolumeWeightedAveragePrice

# ...
def vwap(dataframe, label='vwap', window=3, fillna=True):
        dataframe[label] = VolumeWeightedAveragePrice(high=dataframe['high'], low=dataframe['low'], close=dataframe["close"], volume=dataframe['volume'], window=window, fillna=fillna).volume_weighted_average_price()
        return dataframe
ibacalu
  • 81
  • 1
  • 3