I have a large number of CSV data files, and each data file contains several days' worth of tick data for one ticker in the following form:
ticker DD/MM/YYYY time bid ask
XXX, 19122014, 08:00:08.325, 9929.00,9933.00
XXX, 19122014, 08:00:08.523, 9924.00,9931.00
XXX, 19122014, 08:00:08.722, 9925.00,9930.50
XXX, 19122014, 08:00:08.921, 9924.00,9928.00
XXX, 19122014, 08:00:09.125, 9924.00,9928.00
…
XXX, 30122014, 21:56:25.181, 9795.50,9796.50
XXX, 30122014, 21:56:26.398, 9795.50,9796.50
XXX, 30122014, 21:56:26.598, 9795.50,9796.50
XXX, 30122014, 21:56:26.798, 9795.50,9796.50
XXX, 30122014, 21:56:28.896, 9795.50,9796.00
XXX, 30122014, 21:56:29.096, 9795.50,9796.50
XXX, 30122014, 21:56:29.296, 9795.50,9796.00
…
I need to extract any lines whose time falls within a certain range, say 09:00:00 to 09:15:00. My current solution is simply to read each data file into a DataFrame, sort it by time, and then use searchsorted to find the 09:00:00 and 09:15:00 boundaries. This works fine when performance isn't an issue, but I have 1000 files waiting to be processed. Any suggestions on how to speed this up? Thanks in advance for any help!
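For reference, here is a minimal sketch of what I do per file (the column names, the assumption that the files have no header row, and the exact read_csv arguments are placeholders to make it self-contained; my real code differs in details):

    import pandas as pd

    def extract_range(path, start="09:00:00", end="09:15:00"):
        # Read one tick file; column names are illustrative only
        df = pd.read_csv(path, header=None,
                         names=["ticker", "date", "time", "bid", "ask"],
                         skipinitialspace=True, dtype={"time": str})
        # Sort by the time column so searchsorted can be applied
        df = df.sort_values("time").reset_index(drop=True)
        # Zero-padded HH:MM:SS.fff strings sort lexicographically,
        # so searchsorted on the strings finds the slice boundaries
        lo = df["time"].searchsorted(start, side="left")
        hi = df["time"].searchsorted(end, side="right")
        return df.iloc[lo:hi]

I then loop this over all the files, which is where the time goes.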