Removing a custom (second) class from a dataset/variable

Question

I have been using a class called haven_labelled (from the haven package; Hmisc uses a similar class called just labelled). Its purpose is to carry over the column labels from a Stata .dta dataset. When trying to use plm on a data frame, I got the error:

Error in as.data.frame.default(x[[i]], optional = TRUE) : 
  cannot coerce class ‘c("pseries", "haven_labelled")’ to a data.frame

Classes are as follows:

> class(actualdataset)
[1] "pdata.frame" "data.frame"
> class(actualdataset$examplevar)
[1] "pseries"        "haven_labelled"

As a result, I would like to remove the haven_labelled class from this dataset. Regretfully, I have been unable to recreate the error. I think it has to do with the variable in my actualdataset having a double class that includes haven_labelled. Please see the following example dataset.

library(data.table)
library(plm)
library(Hmisc)
set.seed(1)
DT <- data.table(panelID = sample(50,50),                                  # Creates a panel ID
                 Country = c(rep("A",30),rep("B",50), rep("C",20)),
                 some_NA = sample(0:5, 6),
                 some_NA_factor = sample(0:5, 6),
                 Group = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20)),
                 Time = rep(seq(as.Date("2010-01-03"), length=20, by="1 month") - 1,5),
                 norm = round(runif(100)/10,2),
                 Income = sample(100,100),
                 Happiness = sample(10,10),
                 Sex = round(rnorm(10,0.75,0.3),2),
                 Age = round(rnorm(10,0.75,0.3),2),
                 Educ = round(rnorm(10,0.75,0.3),2))
DT[, uniqueID := .I]  # Creates a unique ID
DT[DT == 0] <- NA     # https://stackoverflow.com/questions/11036989/replace-all-0-values-to-na
DT$some_NA_factor <- factor(DT$some_NA_factor)
labels <- data.table::fread("Varcode Variables
                         panelID a
                         Country b
                         Group c
                         Time d
                         norm e
                         Income f
                         Happiness g
                         Sex h
                         Age i
                         Educ j
                         uniqueID k                         
                         ", header = TRUE)
for (i in seq_len(ncol(DT))) {
    label(DT[[i]]) <- labels$Variables[match(names(DT)[i], labels$Varcode)]
}
DTp <- plm::pdata.frame(DT, index= c("panelID", "Time"))
result <- plm(Happiness ~ Income, data=DTp, model="within")

> class(DTp)
[1] "pdata.frame" "data.frame"
> class(DTp$Income)
[1] "pseries"  "labelled" "integer"

Any suggestions?

EDIT: I was thinking about something as follows:

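for (i in seq_len(ncol(DT))) {
    # inherits() checks the whole class vector, not just its first entry
    if (inherits(DT[[i]], "haven_labelled")) {
        # drop only the haven_labelled entry and keep any other classes
        class(DT[[i]]) <- setdiff(class(DT[[i]]), "haven_labelled")
    }
}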
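
One detail matters when testing for the class: in the actual dataset, haven_labelled is the second entry of the class vector, so a comparison against `class(x)[1L]` alone would not match it, whereas `inherits()` looks at every entry. A minimal base-R illustration, using a made-up vector `x` rather than the real data:

```r
# Made-up vector mimicking the class structure reported for the real data
x <- structure(1:3, class = c("pseries", "haven_labelled"))

first_only   <- class(x)[1L] == "haven_labelled"  # looks at "pseries" only
whole_vector <- inherits(x, "haven_labelled")     # looks at every entry

first_only    # FALSE
whole_vector  # TRUE
```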

EDIT 2: The answer prevented any errors when applying plm. Regretfully, all coefficients and standard errors are now somehow zero, and the p-values and t-values are NA. I am not sure what causes this.

What happens if you run: `class(DTp$Income) <- "pseries"`? — Ben Nutzer, Aug 18 '19 at 09:12
Check `attributes(DTp$Income)`, `attributes(DTp$Income)$class` and `attr(DTp$Income,"class")`, in this case `attr(DTp$Income,"class") <- c("pseries","integer")` may help. — A. Suliman, Aug 18 '19 at 09:18
@A.Suliman Thank you for your comment. I was looking for a somewhat more general solution that I could use on the entire (actual) dataset. I edited the original post to explain better what I was hoping for. Would you mind taking a look? — Tom, Aug 19 '19 at 08:53
@Ben Nutzer Thank you for your comment. I tried your approach with the whole dataset, but that turned it into a pseries list. — Tom, Aug 19 '19 at 09:13

score 1 · Accepted Answer · answered Aug 19 '19 at 09:46

This solution is based on the provided dataset DTp; change labelled and labelled_ch to match your original dataset.

for (i in seq_len(ncol(DTp))) {
  cls <- class(DTp[, i])
  if ("labelled" %in% cls) {
    # Rename the "labelled" entry in place; the other classes are left untouched
    ind <- which(cls == "labelled")
    attr(DTp[, i], "class")[ind] <- "labelled_ch"
  }
}
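
The loop above renames the class entry rather than removing it. If the goal is to drop the entry altogether, a sketch along the following lines may work. It uses base R only and a toy data.frame standing in for the real data; `strip_class` is a hypothetical helper name, and whether the `df[] <- lapply(...)` step behaves the same way on a pdata.frame is an assumption that has not been checked here.

```r
# Hypothetical helper: remove one class name from an object's class vector
strip_class <- function(x, cls) {
  if (inherits(x, cls)) {
    class(x) <- setdiff(class(x), cls)
  }
  x
}

# Toy data frame standing in for the real dataset (an assumption)
df <- data.frame(Income = 1:3, Happiness = c(2, 5, 7))
class(df$Income) <- c("labelled", "integer")

# Apply the helper to every column while keeping the data frame structure
df[] <- lapply(df, strip_class, cls = "labelled")

class(df$Income)     # "integer"
class(df$Happiness)  # "numeric"
```

Columns that never carried the class pass through unchanged, so the same call can be run over the whole dataset.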

A. Suliman

Thank you so much! I got past the `plm` stage without error now. – Tom Aug 19 '19 at 10:18
See the second edit. I am not sure it is related to the original question, which is why I removed it. The `lm` still works. – Tom Aug 19 '19 at 10:58
Ah, good to know! With the example dataset I am however not really sure that it is that surprising, as I did not put any thought into the data. I have however now been able to get a result with `plm` and the actual dataset. I think I have to be more careful with what I put into the regression. Thanks a lot for all your help! – Tom Aug 19 '19 at 11:11
