
I am trying to merge three data frames using library(tidyverse) in RStudio on a Mac. However, I get Error: vector memory exhausted (limit reached?).

I also tried the merge function, but it still does not work. I followed the suggestions in other threads, such as "R on MacOS Error: vector memory exhausted (limit reached?)" and "vector memory exhausted", but they did not help. I am using a Mac running macOS Mojave with 16 GB of RAM. Is this happening because I am merging a large number of rows? Below is the code I am running to merge the data frames. Please assist me with this.

MSP_Counts_Ensembl_Normalized <- read.csv(file = "./MSP_Counts_Ensembl_Normalized.csv", stringsAsFactors = FALSE, check.names = FALSE)
dim(MSP_Counts_Ensembl_Normalized)
[1] 60639   115
class(MSP_Counts_Ensembl_Normalized)
[1] "data.frame"

Normalized_counts_trim_CSV <- read.csv(file = "./Normalized_counts_trim_CSV.csv", stringsAsFactors = FALSE, check.names = FALSE)

dim(Normalized_counts_trim_CSV)
[1] 32388    46
class(Normalized_counts_trim_CSV)
[1] "data.frame"


Normalized_counts_notrims_CSV <- read.csv(file = "./Normalized_counts_notrims_CSV.csv", stringsAsFactors = FALSE, check.names = FALSE)
dim(Normalized_counts_notrims_CSV)
[1] 52419    50
class(Normalized_counts_notrims_CSV)
[1] "data.frame"

library(tidyverse)
Combined_full_join_v1 <- list(MSP_Counts_Ensembl_Normalized,
                              Normalized_counts_trim_CSV,
                              Normalized_counts_notrims_CSV) %>%
  reduce(full_join, by = "hgnc_symbol")


sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] forcats_0.5.0   stringr_1.4.0   dplyr_1.0.2     purrr_0.3.4     readr_1.3.1     tidyr_1.1.1    
[7] tibble_3.0.3    ggplot2_3.3.2   tidyverse_1.3.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5       cellranger_1.1.0 pillar_1.4.6     compiler_4.0.0   dbplyr_1.4.4    
 [6] tools_4.0.0      jsonlite_1.7.0   lubridate_1.7.9  lifecycle_0.2.0  gtable_0.3.0    
[11] pkgconfig_2.0.3  rlang_0.4.7      reprex_0.3.0     cli_2.0.2        DBI_1.1.0       
[16] rstudioapi_0.11  haven_2.3.1      xfun_0.16        withr_2.2.0      xml2_1.3.2      
[21] httr_1.4.2       fs_1.5.0         generics_0.0.2   vctrs_0.3.2      hms_0.5.3       
[26] grid_4.0.0       tidyselect_1.1.0 glue_1.4.1       R6_2.4.1         fansi_0.4.1     
[31] readxl_1.3.1     modelr_0.1.8     blob_1.2.1       magrittr_1.5     backports_1.1.9 
[36] scales_1.1.1     ellipsis_0.3.1   rvest_0.3.6      assertthat_0.2.1 colorspace_1.4-1
[41] tinytex_0.25     stringi_1.4.6    munsell_0.5.0    broom_0.7.0      crayon_1.3.4    

Thank you,

Toufiq

  • That does not look like that much data, to be honest. Have you tried: ` MSP_Counts_Ensembl_Normalized %>% full_join(Normalized_counts_trim_CSV, by = "hgnc_symbol") %>% full_join( Normalized_counts_notrims_CSV, by = "hgnc_symbol") `? – L Smeets Oct 19 '20 at 10:18
  • Are you perhaps doing an unintended Cartesian join? – Roland Oct 19 '20 at 10:21
  • Yes, are you sure hgnc_symbol is a unique identifier for each row? If not, the full_joins might indeed explode into a massive dataset. – L Smeets Oct 19 '20 at 10:28
  • @L Smeets, yes, I ran this code too, but it did not work. I also tried the merge function, first merging two of the files and then merging the third with the result. That did not work either; it shows the same error: Error: vector memory exhausted (limit reached?) – Mohammed Toufiq Oct 19 '20 at 11:00
  • @L Smeets, hgnc_symbol is not unique; there are a few duplicates. In addition, most of the hgnc_symbol values are the same across all three data frames. – Mohammed Toufiq Oct 19 '20 at 11:01
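
The Cartesian-join concern raised in the comments can be checked before running the join. In a full join, a key that occurs n1, n2 and n3 times in the three data frames produces n1 × n2 × n3 output rows, so a single heavily repeated key (for example an empty or NA hgnc_symbol, which is only a guess about this data) could be enough to exhaust memory. Below is a minimal base R sketch of that check on made-up toy data. The helper names estimate_full_join_rows and clean_keys and the toy_* data frames are hypothetical, and whether dropping empty keys and keeping only the first duplicate is acceptable depends on the analysis.

```r
# Toy stand-ins for the three data frames (made-up values, not the real data).
toy_a <- data.frame(hgnc_symbol = c("TP53", "", "", "", "BRCA1"), a_val = 1:5,
                    stringsAsFactors = FALSE)
toy_b <- data.frame(hgnc_symbol = c("TP53", "", "", "BRCA1"), b_val = 1:4,
                    stringsAsFactors = FALSE)
toy_c <- data.frame(hgnc_symbol = c("TP53", "", "", "EGFR"), c_val = 1:4,
                    stringsAsFactors = FALSE)

# Hypothetical helper: rows each key would contribute to a chained full join.
# A key missing from one data frame still contributes (factor 1), hence max(n, 1).
estimate_full_join_rows <- function(dfs, key = "hgnc_symbol") {
  counts <- lapply(dfs, function(df) {
    k <- as.character(df[[key]])
    k[is.na(k)] <- "<NA>"      # label NA keys so they can be counted by name
    k[k == ""] <- "<empty>"    # label empty keys ("" cannot be used as an index name)
    table(k)
  })
  all_keys <- unique(unlist(lapply(counts, names)))
  per_key <- sapply(all_keys, function(k) {
    prod(sapply(counts, function(tab) {
      n <- if (k %in% names(tab)) tab[[k]] else 0
      max(n, 1)
    }))
  })
  sort(per_key, decreasing = TRUE)
}

per_key <- estimate_full_join_rows(list(toy_a, toy_b, toy_c))
head(per_key)   # keys that would contribute the most rows
sum(per_key)    # estimated row count of the full join

# Hypothetical cleanup: drop NA/empty keys and keep the first row per key.
clean_keys <- function(df, key = "hgnc_symbol") {
  k <- df[[key]]
  keep <- !is.na(k) & k != "" & !duplicated(k)
  df[keep, , drop = FALSE]
}

cleaned <- lapply(list(toy_a, toy_b, toy_c), clean_keys)
joined <- Reduce(function(x, y) merge(x, y, by = "hgnc_symbol", all = TRUE),
                 cleaned)
nrow(joined)
```

On the real data, passing list(MSP_Counts_Ensembl_Normalized, Normalized_counts_trim_CSV, Normalized_counts_notrims_CSV) to the estimator should show whether a few keys account for most of the estimated rows. If the estimate is in the hundreds of millions or more, the memory error is the expected result of the duplicates rather than of the input size.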
