1

I've found good tips about fast ways to import files into R, but I'm wondering if it is possible to import only a subset of a given file into a variable.

In my case, I have a file with 16 million rows saved as .rds (and also as .feather, as I was playing with the speed of both formats) and I'd like to import a subset of it (say, a few rows or a few columns) for initial analysis.

Is it possible? readRDS() does not seem to accept any subsetting arguments, while read_feather() does not seem to allow row selection (although you can specify the columns). Should I consider another data format?

Thiago

2 Answers

6

The short answer is 'no'. A nice alternative is the fst file format, which does allow the retrieval of a selection of columns and rows from a large dataset. More info here.
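As a rough illustration of that kind of partial read, here is a minimal sketch. It assumes the fst package is installed; the data frame, column names, and temporary file path are made up for the example.

```r
# Sketch only: assumes the fst package is installed; the data is invented.
library(fst)

df <- data.frame(
  id = 1:1000,
  x  = rnorm(1000),
  y  = sample(letters, 1000, replace = TRUE)
)

# Write the full table to disk once.
path <- tempfile(fileext = ".fst")
write_fst(df, path)

# Read back only two columns and rows 101 to 110.
subset_df <- read_fst(path, columns = c("id", "x"), from = 101, to = 110)
```

Here `columns` selects by name and `from`/`to` give the row range, so the full table never has to be loaded into a variable.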

Thiago
  • A bit late to the party but +1 for fst - I'm using it a lot now and the performance is pretty impressive. – Pascoe Sep 21 '20 at 13:29
0

With readr::read_csv you can use the n_max parameter to read only as many rows as you like.
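A minimal sketch of this, assuming the readr package is installed and the data is available as CSV (the file here is a throwaway example, not the asker's data):

```r
# Sketch only: the CSV file and its columns are invented for illustration.
library(readr)

path <- tempfile(fileext = ".csv")
write_csv(data.frame(id = 1:100, value = (1:100) * 2), path)

# n_max caps the number of data rows read; the header line is not counted.
# col_types = cols() lets readr guess the types without printing a message.
first_rows <- read_csv(path, n_max = 5, col_types = cols())
```

This only helps for delimited text files; it does not apply to .rds or .feather.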

With readRDS, I suppose you could read the whole file, take a sample with dplyr::sample_n, and then remove the full object from memory with rm(object).
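Roughly, that would look like the sketch below. It assumes dplyr is installed, uses an invented data frame, and still needs enough memory to hold the whole file at once.

```r
# Sketch only: the .rds file is created here just to have something to read.
library(dplyr)

path <- tempfile(fileext = ".rds")
saveRDS(data.frame(id = 1:1000, x = rnorm(1000)), path)

full  <- readRDS(path)        # the entire object is loaded at this point
small <- sample_n(full, 50)   # keep a random sample of 50 rows
rm(full)                      # drop the reference to the full object
invisible(gc())               # optionally prompt R to release the memory
```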

If you cannot read the whole file into memory, you could use either SQLite or another database, which is the preferred way, or you could try something along the lines of readr::read_delim_chunked. It allows you to read a file in chunks, do something with each chunk (like sample_n), discard the chunk from memory while keeping just the callback's result, and continue like that until the end of the file.
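A minimal sketch of the chunked approach, assuming readr and dplyr are installed and the data is in a delimited text file. The file, the chunk size, and the `sample_chunk` helper are all hypothetical choices for the example.

```r
# Sketch only: file contents, chunk size and sample size are arbitrary.
library(readr)
library(dplyr)

path <- tempfile(fileext = ".csv")
write_csv(data.frame(id = 1:1000, x = (1:1000) / 10), path)

# Hypothetical callback: keep up to 5 random rows from each chunk.
# `pos` is the starting row of the chunk and is unused here.
sample_chunk <- function(chunk, pos) {
  sample_n(chunk, size = min(5, nrow(chunk)))
}

# DataFrameCallback row-binds whatever the callback returns for each chunk,
# so only the sampled rows are kept once a chunk has been processed.
sampled <- read_delim_chunked(
  file = path,
  callback = DataFrameCallback$new(sample_chunk),
  delim = ",",
  chunk_size = 100,
  col_types = cols()
)
```

Only one chunk plus the accumulated samples is held in memory at a time, which is the point of the approach.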

deann
  • I believe `readr` cannot import .rds or .feather files, so that does not work for my case. With the `readRDS()` option, I'd need to import the whole file first, subset it and remove the big one, which is what I was trying to avoid in the first place. All in all, it seems that the answer is no: it's not possible to import a subset of a .rds or a .feather file. – Thiago Oct 24 '18 at 18:34
  • That is one of the reasons you need databases. – deann Oct 24 '18 at 18:36
  • I'll check your `sqlite` suggestion. Thank you. – Thiago Oct 24 '18 at 18:38