I have about 1000 .tar.gz files (roughly 2 GB each, compressed), each containing a bunch of large .tsv (tab-separated) files, e.g. 1.tsv, 2.tsv, 3.tsv, 4.tsv, etc.
I want to work in R with a subset of the .tsv files (say, 1.tsv and 2.tsv) without extracting the .tar.gz archives, to save disk space and time.
I've searched but couldn't find a library or routine that streams a .tar.gz file through memory and extracts data from it on the fly. Other languages have efficient ways of doing this, so I would be surprised if R couldn't.
Does anyone know of a way to do this in R? Any help is greatly appreciated! Note: unzipping/untarring the archives to disk is not an option. I want to extract the relevant fields and store them in a data.frame without extracting the files.