
We are working in Stata with data created in R and exported using the haven package. We stumbled upon an issue with variables that have a dot in their name. To replicate the problem, here is some minimal R code:

library("haven")
var.1 <- c(1,2,3)
var_2 <- c(1,2,3)
test_df <- data.frame(var.1, var_2)
str(test_df)
write_dta(test_df, "D:/test_df.dta")

Now, in Stata, when I do:

use "D:\test_df.dta"
d

There are two problems. First, the dataset appears empty. Second, the variable name still contains a dot, which should be illegal in Stata. As a result, any command that refers to the variable directly by name, such as

drop var.1

returns an error:

factor variables and time-series operators not allowed
r(101);

What is causing such behaviour? Any solutions to this problem?

zx8754
radek

1 Answer


This will drop var.1 in Stata:

drop var?1

Here (as in Excel), ? is a wildcard that matches any single character (the equivalent of . in a regular expression).

Unfortunately, this will also drop any other variable that matches the pattern, such as var_1, if it exists.
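To see which names a single-character wildcard would catch, here is a small sketch in base R. It uses glob2rx() to translate the glob pattern into a regular expression. The candidates vector is made up for illustration, and the sketch assumes that Stata's ? behaves like a standard glob ? (exactly one character).

```r
# Hypothetical variable names, invented for this illustration.
candidates <- c("var.1", "var_1", "var_2", "var11", "var1")

# glob2rx() turns the glob "var?1" into an anchored regular expression
# in which ? becomes "." (any single character).
pattern <- glob2rx("var?1")

# TRUE for each name that a one-character wildcard would match.
matches <- grepl(pattern, candidates)

candidates[matches]
```

Every name with exactly one character between "var" and "1" is selected, so it is worth checking the variable list before running the drop.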

I am not sure what causes the missing values when writing a .dta file with haven, but I am able to replicate this result with Stata 14.1 and haven 0.2.0. However, reading the file back into R with the read_dta function from haven,

temp2 <- read_dta("test_df.dta")

returns the data.frame. As an alternative to haven, I have used the readstata13 package in the past without issues.

library(readstata13)
save.dta13(test_df, "testdf.dta")

While this code has the same variable name issue, it produced a .dta file that contained the correct values when read into Stata 14.1. save.dta13 has a convert.underscore argument that is intended to remove characters that are not valid in Stata variable names. I verified that it works properly for this example in readstata13 version 0.8.5, but it had a bug in some earlier versions, including version 0.8.2.
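Another option, not tested in the answer above and offered only as a sketch, is to rename the offending columns in R before exporting, so the file never contains a dotted name regardless of which package writes it. The clean_names helper variable is introduced here for illustration.

```r
# Recreate the example data frame from the question.
var.1 <- c(1, 2, 3)
var_2 <- c(1, 2, 3)
test_df <- data.frame(var.1, var_2)

# Replace every literal dot in the column names with an underscore.
# fixed = TRUE treats "." as a plain character, not a regex wildcard.
clean_names <- gsub(".", "_", names(test_df), fixed = TRUE)

# Stop if the renaming would create duplicate names
# (for example, if both var.1 and var_1 were present).
stopifnot(!anyDuplicated(clean_names))
names(test_df) <- clean_names

names(test_df)
```

After this step the export call from the question (write_dta or save.dta13) can be run on test_df as before.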

josliber
lmo
    Thanks lmo. The `drop` trick is indeed a good one, at least for this particular example. I can also confirm that `readstata13` creates a dataset readable in Stata. – radek Sep 22 '16 at 07:51