I am running into memory problems and need help fixing them. I cannot publish the exact code or results here because of my company's confidentiality rules, so I have used dummy names below.
There are two data frames.
Data frame A looks like:
id x_1 x_2 x_3 x_4
1 data data data data
2 data data data data
3 data data data data
Data frame B looks like:
id_1 x_1 x_2 x_3 x_4
1 data data data data
2 data data data data
3 data data data data
I want every combination of the first columns of A and B (a Cartesian product), like this:
id id_1
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
So I used expand.grid:
myLoadedData1 <- expand.grid(A$id,B$id)
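A small reproducible version of the case that works, using made-up values in place of the confidential data. In the dummy frames, B's key column is named `id_1`, so the sketch uses `B$id_1`. One detail worth knowing: `expand.grid()` varies its first argument fastest. The desired table above has `id_1` varying fastest, so this sketch passes `B$id_1` first and then reorders the columns.

```r
# Dummy stand-ins for A and B: only the id columns matter for the grid;
# the "data" columns are placeholders for the confidential values.
A <- data.frame(id   = 1:3, x_1 = "data", x_2 = "data", x_3 = "data", x_4 = "data",
                stringsAsFactors = FALSE)
B <- data.frame(id_1 = 1:3, x_1 = "data", x_2 = "data", x_3 = "data", x_4 = "data",
                stringsAsFactors = FALSE)

# expand.grid() varies its FIRST argument fastest, so passing B$id_1 first
# gives the row order of the desired table (id slow, id_1 fast).
grid <- expand.grid(id_1 = B$id_1, id = A$id)[, c("id", "id_1")]
print(grid)  # 9 rows, same order as the desired table
```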
expand.grid worked fine when A and B had 8,000 records each.
Because of growth we cannot avoid, both data frames now have 50,000 records each, and the same call fails:
myLoadedData1 <- expand.grid(A$id,B$id)
Error: cannot allocate vector of size 7.1 Gb
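A rough estimate of why the jump hurts so much: the result has `nrow(A) * nrow(B)` rows, so going from 8,000 to 50,000 records (6.25x) grows the grid from 64 million to 2.5 billion rows (about 39x). The sketch below assumes the ids are integers (4 bytes per value; pass 8 for doubles) and counts only the final two-column result. Temporary copies made while building it push peak memory higher, and the 7.1 Gb in the error is the size of a single allocation that failed, so it need not match this estimate exactly.

```r
# Back-of-the-envelope size of a two-column Cartesian product, in GiB.
# Assumption: ids are integers (4 bytes each); use bytes_per_value = 8 for doubles.
# Counts only the final result, not temporary copies made while building it.
grid_size_gib <- function(n_a, n_b, bytes_per_value = 4) {
  n_rows <- as.numeric(n_a) * as.numeric(n_b)  # as.numeric avoids integer overflow
  n_rows * 2 * bytes_per_value / 1024^3        # two columns, bytes -> GiB
}

grid_size_gib(8000, 8000)    # ~0.48 GiB
grid_size_gib(50000, 50000)  # ~18.6 GiB
```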
The project is stuck at this point and I need ideas to get past it. My session info is below:
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] plyr_1.8.4 dplyr_0.7.7 odbc_1.1.6 data.table_1.11.8
loaded via a namespace (and not attached):
[1] Rcpp_0.12.17 assertthat_0.2.0 R6_2.2.2 DBI_1.0.0 magrittr_1.5 pillar_1.2.3 rlang_0.2.1 blob_1.1.1
[9] bindrcpp_0.2.2 tools_3.5.1 bit64_0.9-7 glue_1.2.0 purrr_0.2.5 bit_1.1-14 hms_0.4.2 yaml_2.1.19
[17] compiler_3.5.1 pkgconfig_2.0.1 tidyselect_0.2.4 bindr_0.1.1 tibble_1.4.2