I work with Stata `.dta` files and want to make loading data as convenient as possible. As far as I can tell, I need a combination of `haven` and `readstata13`.
`haven` looks perfect. It provides the best "sub-labels". But it does not provide a column-selection function, and I cannot use `read_dta` for large files (~1 GB, on a machine with 64 GB RAM and an Intel Xeon E5).

Question: Is there a way to select/load only a subset of the data?
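For comparison, here is a minimal sketch that assumes a `haven` release whose `read_dta()` has the `col_select` argument (recent releases do; 2.2.0 onward, as far as I know). It writes `mtcars` to a temporary `.dta` file and reads back only three columns. The temporary path and the variable names `path` and `sub` are just for illustration.

```r
library(haven)

# Write a small example file to a temporary location (illustrative path)
path <- tempfile(fileext = ".dta")
write_dta(mtcars, path)

# Read only the listed columns; col_select uses tidyselect syntax,
# so bare names, character vectors or helpers like starts_with() work
sub <- read_dta(path, col_select = c(mpg, cyl, vs))

str(sub)
```

If this argument is available, unselected columns are skipped while parsing instead of being dropped after the whole file has been read.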
`read.dta13` is my best workaround. It has `select.cols`. But then I have to extract the labels with `attr`, save them and merge them afterwards (for about 10 files).

Question: How can I manually add the second labels that the `haven` package creates? (What are they called?)
Here is the MWE:
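```r
library(foreign)
# Create an example .dta file from the built-in mtcars data
write.dta(mtcars, "mtcars.dta")

library(haven)
# haven: reads the whole file (no column selection here)
mtcars <- read_dta("mtcars.dta")

library(readstata13)
# readstata13: read only selected columns, keep raw codes instead of factors
mtcars2 <- read.dta13("mtcars.dta", convert.factors = FALSE,
                      select.cols = c("mpg", "cyl", "vs"))
# Variable labels come back as one attribute on the whole data frame
var.labels <- attr(mtcars2, "var.labels")
data.key.mtcars2 <- data.frame(var.name = names(mtcars2), var.labels)
```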
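A minimal base-R sketch of one way to attach the labels per column. It assumes, as in the MWE, that `read.dta13` returns the variable labels as a character vector in the data-frame attribute `"var.labels"` (one entry per column), and that `haven` keeps each variable label as a `"label"` attribute on the column itself. The helper name `add_column_labels()` and the file names in the commented usage are hypothetical. The length check is there because I am not sure every `readstata13` version subsets `var.labels` when `select.cols` is used.

```r
# Hypothetical helper: copy readstata13's data-frame-level "var.labels"
# attribute onto each column as a "label" attribute (haven's convention)
add_column_labels <- function(df) {
  var_labels <- attr(df, "var.labels")
  # Stop early if the labels do not line up with the columns
  if (length(var_labels) != ncol(df)) {
    stop("var.labels has ", length(var_labels),
         " entries but df has ", ncol(df), " columns")
  }
  for (i in seq_along(df)) {
    # Skip empty labels so unlabelled columns stay unlabelled
    if (!is.na(var_labels[i]) && nzchar(var_labels[i])) {
      attr(df[[i]], "label") <- var_labels[i]
    }
  }
  df
}

# Usage over several files (hypothetical file names):
# files <- c("file1.dta", "file2.dta")
# dat <- lapply(files, function(f)
#   add_column_labels(readstata13::read.dta13(f, select.cols = c("mpg", "cyl"))))
```

Because the labels then live on the columns, they survive subsetting and merging without a separate key table.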