
I am trying to concatenate a large character vector (2.8 GB) with the following code:

x <- paste(v, collapse = "\n")

Error message:

Error in paste(v, collapse = "\n") : result would exceed 2^31-1 bytes
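To check in advance whether a collapse can fit, one option is to count the bytes that the single result string would need and compare that with 2^31 - 1. This is a minimal sketch that assumes the separator goes only between elements, as `paste(collapse = )` does. `collapsed_bytes()` and `fits_in_one_string()` are hypothetical helper names, not functions from base R or any package.

```r
# Bytes that paste(v, collapse = sep) would need for its single result string.
# The total is accumulated as a double so it is not capped by R's 32-bit integers.
collapsed_bytes <- function(v, sep = "\n") {
  if (length(v) == 0L) return(0)
  sum(as.numeric(nchar(v, type = "bytes"))) +
    (length(v) - 1) * nchar(sep, type = "bytes")
}

# One string is limited to 2^31 - 1 bytes (see ?"Memory-limits"),
# independently of long-vector support, so compare against that bound.
fits_in_one_string <- function(v, sep = "\n") {
  collapsed_bytes(v, sep) <= .Machine$integer.max
}
```

With the figures reported in the comments (2,296,061,986 characters plus 11,989,757 newlines, assuming one byte per character in this Windows-1252 locale), the total comes to 2,308,051,743 bytes, which is above 2,147,483,647.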

As I understand it, this is caused by a limit R imposes on individual objects. However, I have also read that R has supported long vectors since R 3.0.0, but I cannot figure out how to make use of that here. I have tried setting the environment variable R_MAX_VSIZE=32000000000 (32 GB), without success.
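Long-vector support raises the limit on the number of elements in a vector, not on the size of a single string, so `R_MAX_VSIZE` does not affect this error. One possible workaround, not taken from the linked posts, is to collapse greedily into several pieces that each stay under the limit. This is a sketch under that assumption; `chunk_by_bytes()` is a hypothetical helper name.

```r
# Split v into consecutive groups and paste each group with `sep`,
# keeping every resulting string at or below `limit` bytes.
chunk_by_bytes <- function(v, sep = "\n", limit = .Machine$integer.max) {
  # Bytes per element, counting one separator after each element
  # (slightly conservative: the last element of a chunk has no separator).
  sizes <- nchar(v, type = "bytes") + nchar(sep, type = "bytes")
  ends <- cumsum(as.numeric(sizes))  # running byte totals as doubles
  chunks <- list()
  start <- 1L
  offset <- 0  # bytes already placed in earlier chunks
  while (start <= length(v)) {
    # Last index whose running total still fits within this chunk.
    stop_at <- findInterval(offset + limit, ends)
    if (stop_at < start) stop("element ", start, " alone exceeds the limit")
    chunks[[length(chunks) + 1L]] <- paste(v[start:stop_at], collapse = sep)
    offset <- ends[stop_at]
    start <- stop_at + 1L
  }
  chunks
}
```

For example, `chunk_by_bytes(c("aaa", "bb", "c", "dddd"), limit = 6)` returns `list("aaa", "bb\nc", "dddd")`. Joining the chunks again would rebuild exactly the string R cannot hold, so each chunk has to be consumed on its own (for instance written out or parsed separately).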

I am running Microsoft R Open 3.5.1 with 64 GB RAM. My full sessionInfo():

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Swedish_Sweden.1252  LC_CTYPE=Swedish_Sweden.1252    LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C                    LC_TIME=Swedish_Sweden.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.3.0        stringr_1.3.1        dplyr_0.7.6          purrr_0.2.5          readr_1.2.0          tidyr_0.8.1          tibble_1.4.2         ggplot2_3.0.0        tidyverse_1.2.1     
[10] data.table_1.11.9    RevoUtils_11.0.1     RevoUtilsMath_11.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18     cellranger_1.1.0 pillar_1.3.0     compiler_3.5.1   plyr_1.8.4       bindr_0.1.1      tools_3.5.1      lubridate_1.7.4  jsonlite_1.5     nlme_3.1-137     gtable_0.2.0    
[12] lattice_0.20-35  pkgconfig_2.0.1  rlang_0.2.1      cli_1.0.0        rstudioapi_0.7   yaml_2.2.0       haven_1.1.2      bindrcpp_0.2.2   withr_2.1.2      xml2_1.2.0       httr_1.3.1      
[23] knitr_1.20       hms_0.4.2.9001   grid_3.5.1       tidyselect_0.2.4 glue_1.3.0       R6_2.2.2         readxl_1.1.0     modelr_0.1.2     magrittr_1.5     backports_1.1.2  scales_0.5.0    
[34] rvest_0.3.2      assertthat_0.2.0 colorspace_1.3-2 stringi_1.2.4    lazyeval_0.2.1   munsell_0.5.0    broom_0.5.0      crayon_1.3.4 

Related posts: here, here, here, and here.

Samuel
  • R may have long vectors, but when you paste like that, you are trying to create one string (a character vector of length 1). The `?"Memory-limits"` help page states that "the number of bytes in a character string is limited to 2^31 - 1". – MrFlick Nov 02 '18 at 14:40
  • I see, so the limit is on the number of bytes in a single string rather than on the length of the vector? – Samuel Nov 02 '18 at 14:48
  • Yes. That's what this specific error is about. – MrFlick Nov 02 '18 at 14:48
  • `length(v)` gives 11989758, approximately 12 million, so as I understand it, that is not the issue. – Samuel Nov 02 '18 at 14:50
  • Well, `length(v)` doesn't add up all the bytes. `sum(nchar(v))` would tell you how many characters there are, and then you need to add all the newlines you are inserting as well. – MrFlick Nov 02 '18 at 14:51
  • `sum(nchar(v))` gives 2296061986, which I think is approximately 2.3 billion? – Samuel Nov 02 '18 at 14:54
  • Well, 2,296,061,986 > 2^31-1, so the error message is accurate. What are you even trying to do with this `paste()` command anyway? – MrFlick Nov 02 '18 at 14:55
  • Maybe do the transformation outside R, see [here (sed, awk, etc. solutions)](https://unix.stackexchange.com/questions/169995), or, to stay inside R, something like: `data.table::fread("tr -s ' ' '\n' myfile.txt")` – zx8754 Nov 02 '18 at 14:56
  • I have run a number of gsub() calls and then need to paste() the result together in preparation for fread(). So it's a cleaning process, following an answer suggested [here](https://stackoverflow.com/a/52957587/5664891). – Samuel Nov 02 '18 at 14:58
  • Ah, so this is really a data.table problem. Unfortunately, like most other reading functions, `fread` doesn't seem to accept a vector of lines. If you want to use `fread`, it looks like you will have to write that out to disk first, then read it again. You can remove the file right after. But I agree with @zx8754 that if you are just doing some gsubs, you might be better off doing them outside R. – MrFlick Nov 02 '18 at 15:11
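The write-to-disk-then-read route suggested in the last comment can be sketched as follows. `read_lines_via_disk()` is a hypothetical helper; it assumes `v` already holds the cleaned, delimited lines with the header first, and it falls back to `utils::read.csv()` only so the sketch also runs where data.table is not installed.

```r
# Write the lines to a temporary file, read them back as a table, then delete the file.
read_lines_via_disk <- function(v) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)  # remove the file even if reading fails
  writeLines(v, tmp)                # one element of v per line in the file
  if (requireNamespace("data.table", quietly = TRUE)) {
    data.table::fread(tmp)
  } else {
    utils::read.csv(tmp)            # base-R fallback for this sketch only
  }
}
```

Since `writeLines()` writes the elements one per line, as far as I know it does not need to build one string holding the whole text, so the 2^31 - 1 byte limit on a single string should not come into play. The `gsub()` cleaning can stay as it is, applied to `v` before the call.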
