I have been trying for hours to find the most efficient way to structure incoming tick data, append it to a shared-memory numpy array, and later build a pandas DataFrame from it quickly.
import datetime
import numpy as np
import pandas as pd
# Source tick data comes in as a dict
tick_data = {"bid": float(1.2), "ask": float(1.3), "time": datetime.datetime.now()}
# Construct a one-row structured np array from the tick
dtype_conf = [('bid', '<f4'), ('ask', '<f4'), ('time', 'datetime64[us]')]
new_tick = np.array([(tick_data["bid"], tick_data["ask"], tick_data["time"])], dtype=dtype_conf)
# Append / vstack / ... it to the existing shared numpy array (same dtype)
shared_np_array = np.vstack((shared_np_array, new_tick))
# Fast construction of a pd.DataFrame if needed (flatten the stacked rows first)
pd.DataFrame(shared_np_array.reshape((1,-1))[0])
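For comparison, here is the preallocation variant I am considering instead of np.vstack, which copies the whole array on every tick. It is only a sketch and not benchmarked: the TickBuffer class, its capacity of 1000 rows, and the method names are my own placeholders, not part of any library.

```python
import datetime

import numpy as np
import pandas as pd

DTYPE_CONF = [('bid', '<f4'), ('ask', '<f4'), ('time', 'datetime64[us]')]


class TickBuffer:
    """Hypothetical fixed-capacity buffer: allocate once, fill row by row."""

    def __init__(self, capacity):
        self.data = np.zeros(capacity, dtype=DTYPE_CONF)
        self.count = 0  # number of rows filled so far

    def append(self, tick):
        # Write into the next free row instead of reallocating the array.
        if self.count >= len(self.data):
            raise IndexError("tick buffer is full")
        self.data[self.count] = (tick["bid"], tick["ask"], tick["time"])
        self.count += 1

    def to_frame(self):
        # Only the filled rows are handed to pandas.
        return pd.DataFrame(self.data[:self.count])

    def column(self, name):
        # A single field of the structured array as a pd.Series.
        return pd.Series(self.data[name][:self.count], name=name)


buf = TickBuffer(capacity=1000)  # capacity is an assumed upper bound
buf.append({"bid": 1.2, "ask": 1.3, "time": datetime.datetime.now()})
buf.append({"bid": 1.25, "ask": 1.35, "time": datetime.datetime.now()})

df = buf.to_frame()
bid_series = buf.column("bid")
```

With this layout the array stays one-dimensional, so the reshape before pd.DataFrame is not needed. What I do not know is whether this is the idiomatic way to do it once the buffer has to live in shared memory.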
Questions:
1) What is the right way to structure my array, and what is the fastest way to append new tick data to it?
2) What would be the most efficient approach to create either a pd.DataFrame of the complete array or a pd.Series for a single column?
3) Is there a better way to work with shared-memory time series in Python (besides multiprocessing.managers.BaseManager)?
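Regarding question 3, the only standard-library alternative I have found so far is multiprocessing.shared_memory (Python 3.8+). Below is a minimal sketch of how I understand it, assuming a fixed capacity and a single writer. CAPACITY and the variable names are my own placeholders, and the second handle is attached in the same process only to keep the example short.

```python
import datetime
from multiprocessing import shared_memory

import numpy as np

DTYPE = np.dtype([('bid', '<f4'), ('ask', '<f4'), ('time', 'datetime64[us]')])
CAPACITY = 1000  # assumed upper bound on the number of ticks

# Writer side: create the block and view it as a structured array.
shm = shared_memory.SharedMemory(create=True, size=CAPACITY * DTYPE.itemsize)
writer_view = np.ndarray((CAPACITY,), dtype=DTYPE, buffer=shm.buf)
writer_view[0] = (1.2, 1.3, datetime.datetime.now())

# Reader side: another process would attach using the block's name.
attached = shared_memory.SharedMemory(name=shm.name)
reader_view = np.ndarray((CAPACITY,), dtype=DTYPE, buffer=attached.buf)
first_bid = float(reader_view["bid"][0])

# The numpy views must be released before the handles can be closed.
del reader_view, writer_view
attached.close()
shm.close()
shm.unlink()  # free the block once no process needs it
```

In this sketch the number of filled rows is not shared and nothing is locked, so I assume a real setup would also need a shared write index and some synchronization between writer and readers.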
Many thanks!