
I have a dataset with 40,000,000 observations and 23 variables. It is stored in Stata format (.dta) and is 4.4 GB in size. Stata opens the file in roughly 30 seconds, while R is not able to open it and reports the error:

Error: cannot allocate vector of size 201.8 Mb

In R I have used the haven::read_dta function without any extra arguments. The Windows Task Manager reports 30% RAM usage when the file is open in Stata, and 96% when R attempts to open it.
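For completeness, the call was essentially the one below (the file path and variable names are placeholders). read_dta also has col_select and n_max arguments that can restrict what gets loaded, which I have not used here:

```r
library(haven)

# Current approach: read the whole file at once (this is what fails
# with the allocation error). "mydata.dta" is a placeholder path.
dat <- read_dta("mydata.dta")

# read_dta() also accepts col_select and n_max to limit what is read,
# which should lower peak memory use; variable names are hypothetical.
dat_small <- read_dta("mydata.dta",
                      col_select = c(id, year, income),  # only a few of the 23 variables
                      n_max = 1e6)                       # first 1,000,000 rows only
```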

Why is there such a big discrepancy in performance between the two programs?

I am using a machine with Windows 10 64-bit, 16 GB of RAM, and an 8th-gen Intel i7.

Caserio
  • Try using `library(readstata13)`, another [SO post](https://stackoverflow.com/questions/38820594/r-how-to-quickly-read-large-dta-files-without-ram-limitations) references this as being a faster way to read large files. – Mako212 Nov 28 '18 at 17:58
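A minimal sketch of the approach suggested in the comment above, assuming a recent version of readstata13; the file path and variable names are placeholders:

```r
library(readstata13)

# Read the .dta file with readstata13 instead of haven.
# select.cols restricts which variables are loaded into memory;
# convert.factors = FALSE skips label-to-factor conversion.
dat <- read.dta13("mydata.dta",
                  convert.factors = FALSE,
                  select.cols = c("id", "year", "income"))  # hypothetical variable names
```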

0 Answers