I am using QuickFix with the Python bindings, along with pandas for data management.
I have been dealing with this issue for a while and have not found any clear questions or answers about it on SO or via Google. It relates to code efficiency and architecture in a low-latency environment.
I am recording financial data. It is extremely high frequency: during fast periods, (small) messages arrive every 15 milliseconds or so. QuickFix passes each message to a message cracker that I wrote, which does the following (sketched in code after the list):
- parses the message with `re`
- converts the datatype of each element of the message (about 8 of them in this case)
- updates the values of a pandas DataFrame with the 8 elements
- opens a `.csv` file on the local computer, appends the line of 8 elements, closes the file
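For concreteness, the hot path looks roughly like this. The tag numbers, column names, and per-market file path are placeholders, not my real values, but the shape of the work is the same:

```python
import re
import pandas as pd

# Placeholder tag->column mapping -- the real cracker handles ~8 fields.
FIELD_RE = re.compile(r"(\d+)=([^\x01]+)\x01")  # FIX tag=value pairs, SOH-delimited
TAG_TO_COL = {"270": "price", "271": "size"}

frame = pd.DataFrame(columns=list(TAG_TO_COL.values()))  # latest values per market

def on_message(raw_fix, market):
    # 1. parse the message with re
    fields = dict(FIELD_RE.findall(raw_fix))
    # 2. convert the datatype of each element
    row = [float(fields.get(tag, "nan")) for tag in TAG_TO_COL]
    # 3. update the pandas DataFrame
    frame.loc[market] = row
    # 4. open the .csv, append the line, close the file -- once per message
    with open(market + ".csv", "a") as f:
        f.write(",".join(map(str, row)) + "\n")
```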
A pretty simple process, but multiply it by several hundred markets and my computer cannot keep up. Anywhere between 2 and 100 times per day, the machine chokes, falls offline, and I lose about 20 seconds of data (roughly 13,000 samples!).
I am presently looking at PyTables to see if I can speed up my process, but I do not know enough about computer science to get to the heart of the speed issue, and I would appreciate some wisdom.
Is the problem the `.csv` file? Can I use PyTables and HDF5 to speed things up? What would be the 'right' way of doing something like this?
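In case it helps, here is roughly what I imagine the PyTables version looking like. This is only a sketch with a made-up schema and file name; the appeal, as I understand it, is that appended rows are buffered in memory and flushed in batches instead of a file being reopened per message:

```python
import tables

# Placeholder schema and file name -- I have not settled on a layout.
class Tick(tables.IsDescription):
    timestamp = tables.Float64Col()
    price     = tables.Float64Col()
    size      = tables.Float64Col()

h5 = tables.open_file("ticks.h5", mode="a")
table = (h5.root.ticks if "/ticks" in h5
         else h5.create_table("/", "ticks", Tick))

def append_tick(ts, price, size):
    row = table.row
    row["timestamp"] = ts
    row["price"] = price
    row["size"] = size
    row.append()  # buffered in memory; no per-message file open/close
    # call table.flush() periodically, not on every message
```

Would something along these lines actually fix the bottleneck, or am I looking in the wrong place?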