
I am analysing stock trade data using daily tick data. Say one column, closed_price, holds the daily closing price, and another, tick_price, holds the tick price at 2:30 pm. The idea is to customize a rolling window so that it uses closed_price for the previous days but tick_price for the current day. In a sliding window of size n, window[-n:-1] comes from closed_price and the last element, window[-1], comes from tick_price. rolling does a great job on a single column, but I can't find a way to combine data from two columns into one rolling object. How can I do this within an acceptable running time?

I have tried df.rolling(n).apply(func), but inside the function I can't get the column information. It seems rolling iterates over one Series (column) after another, not row by row. I'm reading the source code for rolling and NumPy's stride tricks but feel overwhelmed. The last resort would be for loops, but I expect them to be much slower.
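One way to reach the second column from inside apply is to pass raw=False, so that each window arrives as a Series that still carries its index labels. The sketch below uses the sample values from this question; hybrid_apply is a hypothetical helper name, and it assumes the index labels are unique. It is not expected to be faster than a plain loop, only more general.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[3535.229, 3547.2157],
     [3564.038, 3554.8975],
     [3541.727, 3549.8678],
     [3471.456, 3453.7913],
     [3480.13, 3480.0087]],
    columns=['closed_price', 'tick_price'])


def hybrid_apply(frame, n, func):
    """Hypothetical helper: roll over closed_price, swapping in the tick price last."""
    def on_window(close_window):
        # close_window is a Series because raw=False; copy before modifying it
        values = close_window.to_numpy(copy=True)
        # the last index label of the window identifies the current day
        values[-1] = frame['tick_price'].loc[close_window.index[-1]]
        return func(values)
    return frame['closed_price'].rolling(n).apply(on_window, raw=False)


result = hybrid_apply(df, 3, np.sum)
print(result)
```

The first n-1 rows come back as NaN, as with the built-in rolling, because the default min_periods equals the window size.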

The data can be pasted into Jupyter; it looks like this:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[3535.229 , 3547.2157],
       [3564.038 , 3554.8975],
       [3541.727 , 3549.8678],
       [3471.456 , 3453.7913],
       [3480.13  , 3480.0087]]),columns=['closed_price','tick_price'])

The built-in rolling(window) does a good job on single columns, but what I want to do is:

   my_rolling(3) returns, for the first full window (rows 0 to 2):
      3535.229(close)
      3564.038(close)
      3549.8678(tick)

For example, I'm implementing my rolling_sum() as below:

import numpy as np

def rolling_sum(df_w,window,output_column='rolling_3_sum'):
    df=df_w.copy()
    w=window
    df[output_column]=np.nan # float column; rows without a full window stay NaN
    index_output_column = df.columns.get_loc(output_column)
    for i in range(w-1,df.shape[0]):
        # get the closed price window; copy it so the frame itself is not overwritten
        values=df['closed_price'].iloc[i-w+1:i+1].to_numpy(copy=True)
        values[-1]=df['tick_price'].iloc[i] # replace the latest value with tick price
        df.iat[i, index_output_column] =np.sum(values) # sum the values in window
    return df[output_column]

It works for now, but this way I have to rewrite almost every function, such as rolling().sum() and rolling().std(), and it is a bit slow. What I want is to implement a rolling_func() that returns a modified rolling object. If it satisfies the needs above, I can invoke it like:

 rolling_func(n).sum() 
 rolling_func(n).std() 

without rewriting a lot.
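A minimal sketch of such an object, assuming NumPy 1.20 or later (for sliding_window_view) and at least n rows in the frame. HybridRolling and this rolling_func signature, which takes the frame as well as n, are hypothetical names for illustration and not part of pandas. All windows are materialized once as a 2-D array, the last column is overwritten with the tick price, and each reduction is then a single NumPy call along axis 1.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


class HybridRolling:
    """Hypothetical helper: n-1 closed prices followed by the current tick price."""

    def __init__(self, df, n, close_col='closed_price', tick_col='tick_price'):
        self.index = df.index
        self.n = n
        close = df[close_col].to_numpy(dtype=float)
        # Shape (len(df) - n + 1, n); copy because the strided view is read-only
        self.windows = sliding_window_view(close, n).copy()
        # Overwrite the last element of every window with that day's tick price
        self.windows[:, -1] = df[tick_col].to_numpy(dtype=float)[n - 1:]

    def _wrap(self, values):
        # Pad the first n-1 rows with NaN so the result aligns with the input index
        out = np.full(len(self.index), np.nan)
        out[self.n - 1:] = values
        return pd.Series(out, index=self.index)

    def sum(self):
        return self._wrap(self.windows.sum(axis=1))

    def mean(self):
        return self._wrap(self.windows.mean(axis=1))

    def std(self, ddof=1):
        return self._wrap(self.windows.std(axis=1, ddof=ddof))

    def apply(self, func):
        # func receives one 1-D window at a time and must return a scalar
        return self._wrap(np.apply_along_axis(func, 1, self.windows))


def rolling_func(df, n):
    return HybridRolling(df, n)


df = pd.DataFrame(
    [[3535.229, 3547.2157],
     [3564.038, 3554.8975],
     [3541.727, 3549.8678],
     [3471.456, 3453.7913],
     [3480.13, 3480.0087]],
    columns=['closed_price', 'tick_price'])

roller = rolling_func(df, 3)
print(roller.sum())
print(roller.std())
```

The trade-off is memory: the copied window matrix holds roughly n times as many values as the original column, which may matter for long histories with large windows.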

Edit: after reading a similar solution, I tried the function below:

def roll(df, w, **kwargs):
    # np.dstack([df.values[i:i+w, :] for i in range(len(df.index) - w + 1)]).T
    # roll_array=np.dstack([np.array(np.append(df.values[i:i+w-1,0],df.values[i+w-1:i+w,1])) for i in range(len(df.index) - w + 1)]).T
    roll_array=np.array([[np.append(df.values[i:i+w-1,0],df.values[i+w-1:i+w,1])]
                         for i in range(len(df.index) - w + 1)])
    panel = pd.Panel(roll_array, 
                     items=df.index[w-1:],
                     major_axis=[df.columns[0]],
                     minor_axis=pd.Index(range(w), name='roll'))
    df_window=panel.to_frame().unstack().T.groupby(level=0, **kwargs)
    return df_window

For now it works, with both built-in and custom functions. But there are some problems: 1. The returned result is missing the leading rows where the window is incomplete, which differs from pandas rolling behavior; null data has to be inserted to fill the missing index. 2. Panel appears to be deprecated. 3. The running speed is not much different from the for loop I wrote.
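On the speed point, linear reductions such as sum and mean may not need the windows to be built at all. As a sketch: the hybrid window differs from the plain closed_price window only in its last element, so the built-in rolling sum can be corrected by subtracting the current closed price and adding the current tick price. This shortcut does not carry over to std or arbitrary functions. The names hybrid_sum and hybrid_mean are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[3535.229, 3547.2157],
     [3564.038, 3554.8975],
     [3541.727, 3549.8678],
     [3471.456, 3453.7913],
     [3480.13, 3480.0087]],
    columns=['closed_price', 'tick_price'])

n = 3
close = df['closed_price']
tick = df['tick_price']

# Built-in rolling sum over closed_price, with today's close swapped for today's tick
hybrid_sum = close.rolling(n).sum() - close + tick
hybrid_mean = hybrid_sum / n

print(hybrid_sum)
```

The result keeps the full index, with NaN in the first n-1 rows, because it is built from the ordinary rolling output.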

I will try to dig into the pandas rolling source code and fix this.

He Gui
  • Possible duplicate of [How to invoke pandas.rolling.apply with parameters from multiple column?](https://stackoverflow.com/questions/38878917/how-to-invoke-pandas-rolling-apply-with-parameters-from-multiple-column) – Chris Adams Apr 16 '19 at 10:35
  • see this answer https://stackoverflow.com/a/38879051/10201580 – Chris Adams Apr 16 '19 at 10:36
  • Thanks, Chris A. I rewrote the method from the answer you posted, and the function works for now. But the result is missing the leading rows where the window is incomplete, so the index differs from pandas rolling behavior. On the other hand, the speed is not much different from for loops. I'll dig into how pandas rolling works later and try to find a better solution. – He Gui Apr 19 '19 at 09:29

0 Answers