I have a big dataset (a .tab file) of more than 30 GB which my current PC cannot open in R. Can I somehow load only rows n:m of the file?
The point is that the data has about 20k columns but I need only a few of them. My idea is to load a subset of rows, say the first 100k, select only the relevant columns and save the data. Then I could open the next 100k rows, save them, and so on. All those created data files together will be smaller than the original .tab file because I need only a few of the 20k columns. Thus, finally, I can open all the created datasets and save them as one file. In order to do this I need to know how to load rows n:m of a .tab file.
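
(Stitching the saved pieces back together at the end should be the easy part. A rough sketch of what I picture is below, assuming I manage to save the pieces as part_0.rds, part_1.rds, ... in the working directory.)

    # Stitch the saved pieces back into one data set (sketch; the naming
    # part_0.rds, part_1.rds, ... is just what I assume I will end up using).
    files <- list.files(pattern = "^part_[0-9]+\\.rds$")
    files <- files[order(as.integer(gsub("[^0-9]", "", files)))]  # keep chunk order
    combined <- do.call(rbind, lapply(files, readRDS))
    saveRDS(combined, "combined.rds")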
All I found so far is the nrows argument of the read.table function. But this expects only one number, so it always loads rows 1:m.
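
For what it is worth, the loop I imagine looks roughly like the sketch below. I am only guessing that the skip argument can be combined with nrows to start at row n, and I have no idea whether repeatedly skipping through a 30 GB file like this is workable; the file name, separator, and column positions are placeholders.

    chunk_size <- 100000
    keep_cols  <- c(3, 17, 42)   # placeholder positions of the few columns I need

    # read the header line once so the column names are not lost
    header <- read.table("data.tab", sep = "\t", nrows = 1,
                         stringsAsFactors = FALSE)

    i <- 0
    repeat {
      chunk <- tryCatch(
        read.table("data.tab", sep = "\t", header = FALSE,
                   skip = 1 + i * chunk_size, nrows = chunk_size,
                   stringsAsFactors = FALSE),
        error = function(e) data.frame()   # read.table errors once past the end
      )
      if (nrow(chunk) == 0) break
      part <- chunk[, keep_cols]                 # keep only the relevant columns
      names(part) <- unlist(header[keep_cols])
      saveRDS(part, file = paste0("part_", i, ".rds"))
      i <- i + 1
    }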
Alternatively, it would be even easier if there were a way to directly load only the relevant columns. Unfortunately, I have not found a way to do so.
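
Ideally I would like something along the lines of the sketch below; the colClasses part is just my guess at how dropping columns while reading might look, and I do not know whether it is correct or whether it would actually keep the memory usage down.

    # My guess at reading only a few columns directly (untested).
    # The positions c(3, 17, 42) and the total of 20000 columns are placeholders.
    col_sel <- rep("NULL", 20000)     # "NULL" should make read.table skip a column
    col_sel[c(3, 17, 42)] <- NA       # NA should let read.table guess the type
    wanted  <- read.table("data.tab", sep = "\t", header = TRUE,
                          colClasses = col_sel)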