My R application reads input data from large txt files. It does not read the entire file in one shot: users specify gene names (3 or 4 at a time), and based on that input the app goes to the appropriate rows and reads the data.
File format: 32,000 rows (one gene per row; the first two columns hold the gene name and other metadata) and 35,000 columns of numerical data (decimal numbers).
I use read.table(filename, skip = 10000) etc. to get to the right row, then read the 35,000 columns of data. I repeat this for the 2nd and 3rd gene (up to 4 genes max) and then process the numerical results.
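A minimal, self-contained sketch of this skip-based approach (the file contents, file name, and row numbers here are made up for illustration; the real file has 32,000 rows and ~35,002 columns):

```r
# Build a tiny stand-in for the real gene file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("g1 info1 0.1 0.2",
             "g2 info2 0.3 0.4",
             "g3 info3 0.5 0.6"), tmp)

read_gene_row <- function(file, row) {
  # Skip past the preceding rows, then read exactly one row.
  # read.table still has to scan every skipped line from the top of
  # the file, which is why each per-gene read is slow.
  read.table(file, skip = row - 1, nrows = 1, stringsAsFactors = FALSE)
}

g2 <- read_gene_row(tmp, 2)
```

Each gene lookup re-scans the file from the beginning, so four lookups pay the scanning cost four times.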
The file-reading operations take about 1.5 to 2.0 minutes. I am experimenting with reading the entire file once and then extracting the data for the desired genes.
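The whole-file alternative can be sketched like this: pay the full read cost once, then subset by gene name in memory (file contents and names are illustrative):

```r
# Tiny stand-in file, as above.
tmp <- tempfile(fileext = ".txt")
writeLines(c("g1 info1 0.1 0.2",
             "g2 info2 0.3 0.4",
             "g3 info3 0.5 0.6"), tmp)

# Read everything once; use the first column (gene names) as row names
# so later lookups are simple in-memory subsetting.
all_data <- read.table(tmp, stringsAsFactors = FALSE)
rownames(all_data) <- all_data$V1

genes_of_interest <- c("g1", "g3")
subset_data <- all_data[genes_of_interest, ]
```

Whether this wins depends on how often the app is run against the same file: the one-time read is expensive, but every subsequent lookup is nearly free.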
Is there any way to accelerate this? I can rewrite the gene data in another format (as a one-time conversion) if that will speed up future reads.
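As one example of such a one-time conversion (not necessarily the best option), the numeric data could be stored as a matrix in R's binary RDS format, which later sessions can reload much faster than re-parsing text; names and dimensions here are illustrative:

```r
# Hypothetical one-time conversion: a small numeric matrix with gene
# names as row names stands in for the real 32,000 x 35,000 data.
mat <- matrix(c(0.1, 0.3, 0.5, 0.2, 0.4, 0.6), nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), NULL))

rds <- tempfile(fileext = ".rds")
saveRDS(mat, rds)        # one-time conversion to a binary file

mat2 <- readRDS(rds)     # fast reload in later runs
rows <- mat2[c("g1", "g3"), ]  # in-memory lookup by gene name
```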