How to do rolling and group by in numpy to accelerate the process?

Question

I have a panel data including 10 stocks’ OCHL data from 2005-2022. I can do those thing in pandas conveniently:

1.get the daily rank of adjClose for each day.

2.get the moving average of ranks in 1

3.get the rank of the MA for each day.

But how can I done those tasks in Numpy for the quicker running?

def accelerate_from_df_to_array():
# -------------------------------------DataPrep：get data from MongoDB-------------------------------------
# set the namelist of stock
stockNameList = db_daily.list_collection_names()[:10]

# searching filter
rowFltDict = {}
colFltDict = {'_id': 0, 'date': 1, 'adjOpen': 1, 'adjHigh': 1, 'adjLow': 1, 'adjClose': 1, 'symbol': 1}

# to get the dataframe from DB
stockDf = pd.concat([pd.DataFrame(db_daily[x].find(rowFltDict, colFltDict)) for x in stockNameList],
                    ignore_index=True)
# ----------------------------------------------calculating from DataFrame---------------------
stockDf['adjClose_r'] = stockDf.groupby('date')['adjClose'].rank()
stockDf['adjClose_r_ma'] = stockDf.groupby('symbol')['adjClose_r'].apply(lambda x: x.rolling(60, min_periods=1).mean())
stockDf['adjClose_r_ma_r'] = stockDf.groupby('date')["adjClose_r_ma"].rank()

# -----------------------------------------------calculating from Numpy--------------------

the data structure of stockDf likes above,just for demo so there are only 2 rows.

symbol	adjOpen	adjHigh	adjLow	adjClose	date	adj_Close_r	adj_Close_r_ma	adj_Close_r_ma_r
300190	33.33	35	30.86	31.99	2021\12\27	6	6	6
000066	31.1	31.1	29.85	29.92	2005\08\29	3	3	3

I’ve tried to create the 3D arrays for groupby.but each matrix in this array is not in the same shape.thus when I want to do some group functions, it didn’t work.and the process of native loop in Python also slow down the runtime.

stock2DArray = stockDf.values
stock2DArray = stock2DArray[stock2DArray[:, 5].argsort()]
stock3DList = np.split(stock2DArray, np.unique(stock2DArray[:, 5], return_index=True)[1])
k = np.array(stock3DList)
rank = np.empty(0)
for i in range(k.shape[0]):
    temp = k[i]
    rank1 = np.argsort(np.argsort(-temp[:, 4])) + 1
    rank = np.concatenate([rank, rank1], axis=0)
ranked = np.column_stack((stock2DArray, rank))

Can you post some sample comment ? – sayan dasgupta Nov 08 '22 at 06:55 — sayan dasgupta, Nov 08 '22 at 06:55

How to do rolling and group by in numpy to accelerate the process?

0 Answers0