I have a panel data including 10 stocks’ OCHL data from 2005-2022. I can do those thing in pandas conveniently:
1.get the daily rank of adjClose for each day.
2.get the moving average of ranks in 1
3.get the rank of the MA for each day.
But how can I done those tasks in Numpy for the quicker running?
def accelerate_from_df_to_array():
# -------------------------------------DataPrep:get data from MongoDB-------------------------------------
# set the namelist of stock
stockNameList = db_daily.list_collection_names()[:10]
# searching filter
rowFltDict = {}
colFltDict = {'_id': 0, 'date': 1, 'adjOpen': 1, 'adjHigh': 1, 'adjLow': 1, 'adjClose': 1, 'symbol': 1}
# to get the dataframe from DB
stockDf = pd.concat([pd.DataFrame(db_daily[x].find(rowFltDict, colFltDict)) for x in stockNameList],
ignore_index=True)
# ----------------------------------------------calculating from DataFrame---------------------
stockDf['adjClose_r'] = stockDf.groupby('date')['adjClose'].rank()
stockDf['adjClose_r_ma'] = stockDf.groupby('symbol')['adjClose_r'].apply(lambda x: x.rolling(60, min_periods=1).mean())
stockDf['adjClose_r_ma_r'] = stockDf.groupby('date')["adjClose_r_ma"].rank()
# -----------------------------------------------calculating from Numpy--------------------
the data structure of stockDf likes above,just for demo so there are only 2 rows.
symbol | adjOpen | adjHigh | adjLow | adjClose | date | adj_Close_r | adj_Close_r_ma | adj_Close_r_ma_r |
---|---|---|---|---|---|---|---|---|
300190 | 33.33 | 35 | 30.86 | 31.99 | 2021\12\27 | 6 | 6 | 6 |
000066 | 31.1 | 31.1 | 29.85 | 29.92 | 2005\08\29 | 3 | 3 | 3 |
I’ve tried to create the 3D arrays for groupby.but each matrix in this array is not in the same shape.thus when I want to do some group functions, it didn’t work.and the process of native loop in Python also slow down the runtime.
stock2DArray = stockDf.values
stock2DArray = stock2DArray[stock2DArray[:, 5].argsort()]
stock3DList = np.split(stock2DArray, np.unique(stock2DArray[:, 5], return_index=True)[1])
k = np.array(stock3DList)
rank = np.empty(0)
for i in range(k.shape[0]):
temp = k[i]
rank1 = np.argsort(np.argsort(-temp[:, 4])) + 1
rank = np.concatenate([rank, rank1], axis=0)
ranked = np.column_stack((stock2DArray, rank))