How to load .dta files (preserving labels) most comfortably in R?

Question

I work with .dta files and try to make loading data as comfortable as possible. In my view, I need a combination of haven and readstata13.

haven looks perfect. It provides the best "sub-labels", but it has no way to select columns. I cannot use read_dta for large files (~1 GB, on a machine with 64 GB RAM and an Intel Xeon E5). Question: Is there a way to select/load a subset of the data?
read.dta13 is my best workaround. It has select.cols, but I then have to extract the attributes with attr, save them, and merge them (for about 10 files).

Question: How can I manually add the second labels that the haven package creates? (What are they called?)

Here is the MWE:

library(foreign)
write.dta(mtcars, "mtcars.dta")

library(haven)
mtcars <- read_dta("mtcars.dta")

library(readstata13)
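# Read only three columns, keeping the raw values instead of factors
mtcars2 <- read.dta13("mtcars.dta", convert.factors = FALSE, select.cols = c("mpg", "cyl", "vs"))
# Variable labels are stored as an attribute of the data frame
var.labels <- attr(mtcars2, "var.labels")
# Lookup table of variable names and their labels
data.key.mtcars2 <- data.frame(var.name = names(mtcars2), var.labels)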

score 2 · Accepted Answer · answered Aug 29 '19 at 08:02

haven's development version supports selecting columns with the col_select argument:

library(haven) # devtools::install_github("tidyverse/haven")
mtcars <- read_dta("mtcars.dta", col_select = c(mpg, cyl, vs))
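
As a self-contained check of the column selection, here is a minimal sketch. It assumes a haven version that already includes col_select (to my knowledge this is 2.2.0 or later on CRAN, but please verify against your installed version). The temporary file, the name cars_small, and the label text are made up for illustration.

```r
library(haven)

# Write a small .dta file to a temporary location (hypothetical test data).
path <- tempfile(fileext = ".dta")
cars <- mtcars
attr(cars$mpg, "label") <- "Miles per gallon"  # illustrative variable label
write_dta(cars, path)

# Read back only three of the eleven columns.
cars_small <- read_dta(path, col_select = c(mpg, cyl, vs))

names(cars_small)
attr(cars_small$mpg, "label")
```

The selection follows tidyselect rules, so helpers such as starts_with() should also work inside col_select.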

Alternatively, the column labels in RStudio's viewer are taken from the "label" attribute of each data frame column. You can use a simple loop to assign them from the labels read by readstata13:

for (i in seq_along(mtcars2)) {
  attr(mtcars2[[i]], "label") <- var.labels[i]
}

View(mtcars2)
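
The same idea can be tried without any .dta file. The following is only a sketch in base R: df stands in for the result of read.dta13, and the label texts are invented for illustration. It attaches the labels in one pass and then collects them back into a lookup table, which may help when repeating this for several files.

```r
# Stand-in for a data frame returned by read.dta13(); the labels are
# hypothetical and not read from a real file.
df <- data.frame(mpg = c(21.0, 22.8), cyl = c(6, 4), vs = c(0, 1))
var_labels <- c("Miles per gallon", "Number of cylinders", "Engine shape")

# Attach each label to its column as a "label" attribute.
df[] <- Map(function(col, lab) {
  attr(col, "label") <- lab
  col
}, df, var_labels)

# Collect the labels back into a lookup table, one row per variable.
label_key <- data.frame(
  var.name  = names(df),
  var.label = vapply(df, attr, character(1), "label"),
  row.names = NULL
)
print(label_key)
```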

answered Aug 29 '19 at 08:02

Mikko Marttila

Thanks so much for the development. The `col_select` feature makes `haven` perfect, no more reason to use other packages. But the install of dev version didn't work for me, it says "Error: Failed to install 'haven' from GitHub: (converted from warning) installation of package ‘XX/haven_2.1.1.9000.tar.gz’ had non-zero exit status" – Marco Aug 29 '19 at 08:25
Ok, I had to delete a `00LOCK-haven` folder in my lib manually. Then it worked. Looking forward to the new release of haven. – Marco Aug 29 '19 at 09:23