
I am importing a large dataset and occasionally, either during the import or during later data manipulation, R crashes with a segfault about unmapped memory. Sometimes I get the error and sometimes I don't. Here is an example of the error I receive when I try to read in the data:

 *** caught segfault ***
address 0x5feb575d0a08, cause 'memory not mapped'

Traceback:
 1: scan(file = file, what = what, sep = sep, quote = quote, dec = dec,     nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE,     fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip,     multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes,     flush = flush, encoding = encoding, skipNul = skipNul)
 2: read.table(file = file, header = header, sep = sep, quote = quote,     dec = dec, fill = fill, comment.char = comment.char, ...)
 3: read.csv("Historical_DOB_Permit_Issuance.csv")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

Even when the import succeeds, I get a similar error when I try to merge the dataset with another one. Obviously it's a memory issue, but I'm not sure how to deal with it. Here is my session info:

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Can anyone please help?

  • Can you post a minimal code sample that reproduces the error? In particular, what third-party R packages are you using, and can you include more of the stack trace? Sometimes this sort of problem is caused by compiler incompatibilities between your R install and third-party packages; e.g., see https://stackoverflow.com/questions/49190251/caught-segfault-memory-not-mapped-error-in-r. But there isn't yet enough information to tell for certain whether that is the problem here. – paisanco Oct 03 '20 at 16:24
  • Here is one I occasionally get during the data import. The csv is saved in a local directory that I am calling "filepath" here. The full traceback is the one in the original post: `setwd("filepath"); dob_permits <- read.csv("Historical_DOB_Permit_Issuance.csv")` – John M Oct 03 '20 at 17:21
  • I've never seen a stack trace like that when trying to read a CSV. (1) How big is the file? Bytes and number of lines. (2) If it's one row (or some rows) that is/are causing the specific problem, it would be useful to narrow down what is doing it. Consider iterating through reading it with `read.csv(..., skip=, nrows=)`, going 100 or 1000 or 10K rows at a time until you can reproduce the crash, then start again at the same `skip=` value and a much smaller `nrows=` (a sketch of this loop follows these comments). – r2evans Oct 03 '20 at 21:06
  • The thing is, sometimes it will read in fine. But then it will crash again when I try a data merge with another dataset. It's a large dataset (1,253,455,152 bytes and several hundred thousand rows). – John M Oct 03 '20 at 22:06
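
For reference, here is a minimal sketch of the chunked-read diagnostic r2evans describes above. The file name comes from the question; the 10,000-row chunk size, the variable names, and the loop structure are my own assumptions, not tested against this dataset:

    # Read the CSV in fixed-size chunks to narrow down which rows trigger
    # the segfault. Assumes the file is in the working directory.
    file <- "Historical_DOB_Permit_Issuance.csv"
    chunk <- 10000
    header <- names(read.csv(file, nrows = 1))  # read the column names once
    skip <- 1                                   # skip the header row afterwards
    repeat {
      message("reading rows ", skip, " to ", skip + chunk - 1)
      part <- tryCatch(
        read.csv(file, skip = skip, nrows = chunk,
                 header = FALSE, col.names = header),
        error = function(e) NULL)  # an ordinary end-of-file error lands here
      if (is.null(part) || nrow(part) < chunk) break
      skip <- skip + chunk
    }
    # If R segfaults, the last message printed identifies the offending block;
    # restart from that skip= value with a much smaller nrows= to find the row.

If every chunk reads cleanly in isolation but the full read still crashes intermittently, that points away from a malformed row and toward memory pressure or a compiler/package incompatibility, as the earlier comments suggest.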
