I have a data.frame made up of several hundred factors. These factors have two weird levels: "" and "true".

I simply need to rename these levels to "0" and "1".

The problem is that I cannot get it to work. I tried several solutions from Stack Overflow, e.g. here, but none of them worked for me.

The data is here

If I change the factor levels on an individual variable, it works (of course):

export <- read.csv("factordata.csv")
str(export)
#'data.frame':   16 obs. of  124 variables:
# $ caddy1_prod_detail_2456            : Factor w/ 2 levels "","true": 2 1 1 1 1 1 1 1 1 1 ...

str(export$caddy1_prod_detail_2456)
#Factor w/ 2 levels "","true": 2 1 1 1 1 1 1 1 1 1 ...
levels(export$caddy1_prod_detail_2456)<-c(0,1)
str(export$caddy1_prod_detail_2456)
#Factor w/ 2 levels "0","1": 2 1 1 1 1 1 1 1 1 1 ...

But if I try to use a for loop, as here

for (var in export ) {
  levels(var)<-c("0","1")
}

nothing happens to the data at all. If I instead try the solution proposed in the other question, as here,

export[] <- lapply(export, function(x){
 levels(x) <- c(0,1)
 })

it turns my factors into numeric columns and destroys the existing data, replacing each variable with alternating 0s and 1s:

str(export)
#'data.frame':   16 obs. of  124 variables:
# $ caddy1_prod_detail_2456            : num  0 1 0 1 0 1 0 1 0 1 ...
# $ caddy1_prod_detail_2456_ingredients: num  0 1 0 1 0 1 0 1 0 1 ...

What am I doing wrong? It cannot be that difficult!

PaoloCrosetto
  • This appears to work: `df <- data.frame(matrix(sample(c("","true"), 25, TRUE), ncol = 5)); df[] <- lapply(df, function(x) {levels(x) <- c(0,1); x})` – talat Nov 12 '15 at 13:11
  • Expanding what @docendodiscimus commented, maybe `df[]<-lapply(df, function(x) {y<-as.logical(x);y[is.na(y)]<-FALSE;factor(as.numeric(y))})` is safer if for instance the levels of some columns are reversed (e.g. `"true"` and `""`). – nicola Nov 12 '15 at 13:22
  • Dear @docendo, that solved the problem, thanks. My code lacked the 'x' at the end of the function. Why does one need it? Do functions in R always return the last argument given in absence of return() codes, and so one needs to tell the function to return 'x'? – PaoloCrosetto Nov 13 '15 at 14:12
  • Exactly, functions always return what is evaluated last unless there's an explicit return statement. – talat Nov 13 '15 at 14:39
  • Thanks, that's crucial to know. – PaoloCrosetto Nov 13 '15 at 20:43
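
For anyone landing here later, the point made in the comments can be sketched on a toy data frame. This is only an illustration under assumptions: the columns `a` and `b` and the helper names `broken` and `fixed` are hypothetical, and `stringsAsFactors = TRUE` is set explicitly because recent R versions no longer create factors by default. The sketch also shows one way to make the for loop work, which the comments do not cover: `for (var in export)` only changes a copy of each column, so the loop has to assign back into the data frame by name.

```r
# Toy stand-in for the real data (hypothetical columns a and b).
df <- data.frame(
  a = c("true", "", "", "true"),
  b = c("", "", "true", ""),
  stringsAsFactors = TRUE
)

# The value of an assignment call is the assigned value, so this
# function returns c(0, 1) instead of the modified factor.
broken <- function(x) {
  levels(x) <- c(0, 1)
}
print(broken(df$a))

# Ending the function with x returns the relabelled factor.
fixed <- function(x) {
  levels(x) <- c("0", "1")
  x
}
by_lapply <- df
by_lapply[] <- lapply(by_lapply, fixed)

# A for loop also works when it writes back into the data frame by name.
by_loop <- df
for (nm in names(by_loop)) {
  levels(by_loop[[nm]]) <- c("0", "1")
}

str(by_lapply)
```

Both versions rename levels by position, so they assume that "" is the first level and "true" the second in every column. If that might not hold, the `as.logical()` approach from nicola's comment is probably the safer choice.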
