0

I have a panel data including 10 stocks’ OCHL data from 2005-2022. I can do those thing in pandas conveniently:

1.get the daily rank of adjClose for each day.

2.get the moving average of ranks in 1

3.get the rank of the MA for each day.

But how can I done those tasks in Numpy for the quicker running?

def accelerate_from_df_to_array():
# -------------------------------------DataPrep:get data from MongoDB-------------------------------------
# set the namelist of stock
stockNameList = db_daily.list_collection_names()[:10]

# searching filter
rowFltDict = {}
colFltDict = {'_id': 0, 'date': 1, 'adjOpen': 1, 'adjHigh': 1, 'adjLow': 1, 'adjClose': 1, 'symbol': 1}

# to get the dataframe from DB
stockDf = pd.concat([pd.DataFrame(db_daily[x].find(rowFltDict, colFltDict)) for x in stockNameList],
                    ignore_index=True)
# ----------------------------------------------calculating from DataFrame---------------------
stockDf['adjClose_r'] = stockDf.groupby('date')['adjClose'].rank()
stockDf['adjClose_r_ma'] = stockDf.groupby('symbol')['adjClose_r'].apply(lambda x: x.rolling(60, min_periods=1).mean())
stockDf['adjClose_r_ma_r'] = stockDf.groupby('date')["adjClose_r_ma"].rank()

# -----------------------------------------------calculating from Numpy--------------------

the data structure of stockDf likes above,just for demo so there are only 2 rows.

symbol adjOpen adjHigh adjLow adjClose date adj_Close_r adj_Close_r_ma adj_Close_r_ma_r
300190 33.33 35 30.86 31.99 2021\12\27 6 6 6
000066 31.1 31.1 29.85 29.92 2005\08\29 3 3 3

I’ve tried to create the 3D arrays for groupby.but each matrix in this array is not in the same shape.thus when I want to do some group functions, it didn’t work.and the process of native loop in Python also slow down the runtime.

stock2DArray = stockDf.values
stock2DArray = stock2DArray[stock2DArray[:, 5].argsort()]
stock3DList = np.split(stock2DArray, np.unique(stock2DArray[:, 5], return_index=True)[1])
k = np.array(stock3DList)
rank = np.empty(0)
for i in range(k.shape[0]):
    temp = k[i]
    rank1 = np.argsort(np.argsort(-temp[:, 4])) + 1
    rank = np.concatenate([rank, rank1], axis=0)
ranked = np.column_stack((stock2DArray, rank))
yuan123
  • 1
  • 1

0 Answers0