I have a large tick data file (one day is ~60 GB uncompressed) that I want to load into bcolz. My plan is to read the file chunk by chunk and append each chunk to a bcolz ctable; a sketch of what I have in mind is below the sample rows.
As far as I know, bcolz only supports appending columns, not rows. Tick data, however, is more row-wise than column-wise, I would say. For instance:
0 ACTX.IV 0 13.6316 2016-09-26 03:45:00.846 ARCA 66
1 ACWF.IV 0 23.9702 2016-09-26 03:45:00.846 ARCA 66
2 ACWV.IV 0 76.4004 2016-09-26 03:45:00.846 ARCA 66
3 ALTY.IV 0 15.5851 2016-09-26 03:45:00.846 ARCA 66
4 AMLP.IV 0 12.5845 2016-09-26 03:45:00.846 ARCA 66
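Here is a minimal sketch of the chunked load I have in mind. The file name `ticks.csv`, the column names, the separator, and the chunk size are all placeholders, and I've glossed over the fact that string columns may need converting to fixed-width bytes first:

```python
import bcolz
import pandas as pd

# Placeholder column layout, inferred from the sample rows above.
cols = ['symbol', 'field', 'price', 'timestamp', 'exchange', 'size']

ct = None
for chunk in pd.read_csv('ticks.csv', names=cols,
                         parse_dates=['timestamp'], chunksize=1000000):
    if ct is None:
        # First chunk: create the on-disk ctable.
        ct = bcolz.ctable.fromdataframe(
            chunk, rootdir='ticks.bcolz', mode='w',
            cparams=bcolz.cparams(clevel=1))
    else:
        # Is this the right way to append rows, or does bcolz
        # really only support adding whole columns?
        ct.append([chunk[c].values for c in cols])
ct.flush()
```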
- Does anyone have any suggestions on how to do this?
- And is there any guidance on which compression level to choose when using bcolz? I'm more concerned about later query speed than about size. (I'm asking because, as the benchmark linked below shows, a level-1-compressed bcolz ctable actually has better query speed than an uncompressed one, so my guess is that query speed is not a monotonic function of compression level.) Reference: http://nbviewer.jupyter.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb
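For what it's worth, this is roughly how I plan to compare query speed across compression levels myself (synthetic data as a stand-in for the real ticks; the sizes, query, and clevel values are arbitrary):

```python
import time
import bcolz
import numpy as np

# Synthetic stand-in for one day of ticks.
n = 10 ** 7
base = bcolz.ctable([np.random.rand(n), np.random.randint(0, 500, n)],
                    names=['price', 'size'])

for clevel in (0, 1, 3, 5, 9):
    # Re-compress the same data at a different level.
    ct = base.copy(cparams=bcolz.cparams(clevel=clevel))
    t0 = time.time()
    hits = sum(1 for _ in ct.where('price > 0.99'))
    print('clevel=%d  ratio=%.1f  query=%.3fs  hits=%d'
          % (clevel, ct.nbytes / float(ct.cbytes), time.time() - t0, hits))
```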