I have a dataset with 40,000,000 observations and 23 variables. It is stored in Stata format (.dta) and is 4.4 GB on disk. Stata opens the file in about 30 seconds, while R fails to load it and reports this error:
    Error: cannot allocate vector of size 201.8 Mb
In R I used the haven::read_dta function without any extra arguments.
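This is essentially the call I made; the file name below is a placeholder for my actual path:

    library(haven)

    # Read the whole .dta file into an in-memory tibble;
    # "mydata.dta" stands in for the real 4.4 GB file
    dat <- read_dta("mydata.dta")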
The Windows Task Manager reports about 30% RAM usage when the file is open in Stata, but 96% when R attempts to load the same file. Why is there such a large discrepancy in memory usage between the two programs?
I am using a machine running Windows 10 64-bit with 16 GB of RAM and an 8th-generation Intel i7.