I have a dataset with 40,000,000 observations and 23 variables. It is stored in Stata format (.dta) and is 4.4 GB on disk. Stata opens the file in about 30 seconds, while R fails to load it and reports this error:
    Error: cannot allocate vector of size 201.8 Mb
In R I used the haven::read_dta function without any extra arguments.
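This is essentially the call I made; the file name below is a placeholder for my actual path:

    library(haven)

    # Read the whole .dta file into an in-memory tibble;
    # "mydata.dta" stands in for the real 4.4 GB file
    dat <- read_dta("mydata.dta")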
The Windows Task Manager reports about 30% RAM usage when the file is open in Stata, but 96% when R attempts to load the same file. Why is there such a large discrepancy in memory usage between the two programs?
I am using a machine running Windows 10 64-bit with 16 GB of RAM and an 8th-generation Intel i7.