As a follow-up to my comment:
You could write the Rds files out sequentially to tsv or csv file(s), e.g. with data.table::fwrite (that works for simple data structures, at least). Either build one big file by loading each Rds file, appending it to the output with fwrite(..., append=TRUE), and then removing it from memory, or save the files one by one and concatenate them on the command line. If the resulting text file is still too large to fit into memory, you could then load it in chunks, use vroom, etc., to get the data back into R.
Below is an example showing the idea:
library(data.table)
# generate 100 Rds files as examples
invisible(lapply(1:100, \(x) {
  df <- data.frame(matrix(rnorm(1e5), ncol = 5, nrow = 2e4,
                          dimnames = list(NULL, paste0("col", 1:5))))
  saveRDS(df, sprintf("file%03d.Rds", x))
}))
# files to concatenate
files <- list.files(pattern = "^file.*\\.Rds$")
# assuming all files have the same column names, retrieve them from the first file
cn <- colnames(readRDS(file=files[1]))
fwrite(data.table(t(cn)), file="outfile.csv", col.names = FALSE)
# sequentially load each Rds file and append it to the newly created output file
invisible(lapply(files, \(x) {
  fwrite(readRDS(x), file = "outfile.csv", col.names = FALSE, append = TRUE)
}))
# open with vroom
library(vroom)
vroom("outfile.csv")
#> Rows: 2000000 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (5): col1, col2, col3, col4, col5
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 2,000,000 × 5
#>      col1     col2   col3   col4   col5
#>     <dbl>    <dbl>  <dbl>  <dbl>  <dbl>
#>  1 -1.08  -0.366    1.26  -0.791  1.37
#>  2  0.365  0.382   -0.742 -0.648 -0.800
#>  3  1.09  -0.618    0.480  1.64   0.155
#>  4  2.54   0.170   -0.654  0.537  0.140
#>  5 -0.331  0.262    0.156  0.360 -0.250
#>  6 -0.349 -0.00872  0.322  0.698  0.653
#>  7  0.353 -0.0634   1.28  -0.402 -1.54
#>  8  1.35   1.15    -1.05   0.410 -0.183
#>  9 -0.499 -3.07     1.14  -0.878  1.11
#> 10  0.479  1.30     0.718  1.17  -1.02
#> # … with 1,999,990 more rows
#> # ℹ Use `print(n = ...)` to see more rows
Created on 2022-07-23 by the reprex package (v2.0.1)
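If even vroom is not an option, the "load it in chunks" route mentioned above could look roughly like the sketch below. It uses data.table::fread with its skip and nrows arguments; the helper name read_chunks, the chunk size, and the small temporary demo file are my own assumptions for illustration, not part of the original example. It also assumes one record per line (no quoted fields containing line breaks).

```r
library(data.table)

# Hypothetical helper: apply `fun` to successive chunks of a CSV file
# so that only `chunk_rows` rows are held in memory at a time.
read_chunks <- function(path, chunk_rows, fun) {
  cn <- names(fread(path, nrows = 0))  # read the header only
  results <- list()
  skip <- 1                            # skip the header line
  repeat {
    # fread errors if `skip` runs past the end of the file; treat that as "done"
    chunk <- tryCatch(
      fread(path, skip = skip, nrows = chunk_rows,
            header = FALSE, col.names = cn),
      error = function(e) NULL
    )
    if (is.null(chunk) || nrow(chunk) == 0) break
    results[[length(results) + 1]] <- fun(chunk)
    if (nrow(chunk) < chunk_rows) break  # last, partial chunk
    skip <- skip + nrow(chunk)
  }
  results
}

# small demo file (25 rows) so the sketch runs on its own
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(col1 = 1:25, col2 = rnorm(25)), tmp)

# count the rows of each chunk of 10 rows
chunk_sizes <- read_chunks(tmp, chunk_rows = 10, fun = nrow)
```

For the file created above you would call it with something like read_chunks("outfile.csv", chunk_rows = 5e5, fun = ...), where fun reduces each chunk (filter, aggregate, summarise) so that only the reduced results accumulate in memory.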