
I need to replace the levels of several factors in one data frame so that they are all unified. For example, these are the levels of one of those factors:

> levels(workco[,5])
 [1] " "                              "1"                              "2"                             
 [4] "kóko"                          "kesätyö"                      "Kesätyö kokoaika"            
 [7] "koko"                           "kokop"                          "kokop."                        
[10] "Kokopäivä"                    "kokopäiväinen"                "Kokopäiväinen"               
[13] "kokopäiväinen / osa-aikainen" "kokopäivänen"                 "kokp"                          
[16] "kokp."                          "Kokp."                          "osa-aik"                       
[19] "Osa-aik / Kokopäiv."           "osa-aik."                       "Osa-aik."                      
[22] "osa-aikainen"                   "Osa-aikainen"                   "osa-aikainen/kokopäiväinen"  
[25] "Osa/kokoaikainen"               "Osap."                  

Say I have 12 columns that are all factors, and their levels use different names for the same meaning. As you can see from the example, many of them share the same letters: koko, kok, kokop... I want to unify them into three levels: kokop, osa and kes. In addition, the levels "1" and "2" should be recoded as kokop and osa, respectively.

So far nothing I have tried works, and I'm afraid it's because I am thinking about it in a more complicated way than necessary. I have tried loops using the adist() function, and grep() separately, but I keep getting errors. For example:

code <- c("kok","osa","ma","kes",1,2," ")
list.names <- c("1", "2", "3", "4", "5", "6","7","8","9","10","11","12")
mylist <- vector("list", length(list.names))
names(mylist) <- list.names
D <- mylist
index <- mylist

for (i in ncol(workco2)){                            
  D[[i]] <- adist(workco2[,i],code,ignore.case=TRUE)
  index[[i]] <- lapply(D[[i]],which.min)
  workco2[,i] <- data.frame(code[index[[i]]])
}

This gives the following error message:

Error in code[index[[i]]] : invalid subscript type 'list'
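The error comes from two details in the loop above. adist() returns a matrix with one row per value and one column per entry of `code`; lapply() over a matrix walks through its individual cells, so `index[[i]]` becomes a list of 1s, and indexing `code` with a list triggers "invalid subscript type 'list'". Also, `for (i in ncol(workco2))` runs only once, for the last column. Below is a minimal sketch of a corrected loop, assuming a small hypothetical stand-in for `workco2` and a hypothetical lookup vector `target` that also maps "1" and "2" to kokop and osa:

```r
# Hypothetical stand-in for workco2 (the real data frame has 12 factor columns)
workco2 <- data.frame(
  a = factor(c("koko", "osa-aik", "1")),
  b = factor(c("Kokp.", "Osa", "2"))
)

code   <- c("kok",   "osa", "kes", "1",     "2")    # strings to compare against
target <- c("kokop", "osa", "kes", "kokop", "osa")  # unified level for each entry of code

for (i in seq_len(ncol(workco2))) {                 # every column, not only the last one
  # adist() returns a matrix: one row per value, one column per entry of code
  d <- adist(as.character(workco2[, i]), code, ignore.case = TRUE)
  # apply() over rows gives one index per value; lapply() would visit single cells
  idx <- apply(d, 1, which.min)
  workco2[, i] <- factor(target[idx], levels = c("kokop", "osa", "kes"))
}
```

Nearest match by edit distance is fragile for long answers such as "kokopäiväinen / osa-aikainen", which is probably why pattern matching with grep() is a better fit here.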

Could you give me a hint on how you would solve this? It is probably much simpler than I think =/ Thanks in advance!

divibisan
Gina Zetkin
  • [Minimal reproducible example](http://stackoverflow.com/a/5963610/1412059) and expected output please. What should be done with mixed levels like `"kokopäiväinen / osa-aikainen"`? – Roland Feb 04 '15 at 13:14
  • Sorry Roland, I have just added the error message. Mixed levels should be coded as "osa", or as "kes" if that appears; when osa and kes appear together, "kes" should be chosen. – Gina Zetkin Feb 04 '15 at 13:20
  • @Gina Zetkin. Did our answers help you? – Ruthger Righart Feb 05 '15 at 08:32
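Following the rules given in the comments (mixed answers become "osa", and "kes" wins over "osa"), here is a sketch that recodes every factor column at once with grepl(). The function name `unify_levels` and the demo data frame are hypothetical, and the patterns "ko", "osa" and "kes" are only a guess at what separates the three groups in the real data:

```r
# Hypothetical helper: recode one column of free-text answers into kokop / osa / kes.
# Later assignments override earlier ones, giving the priority kes > osa > kokop.
unify_levels <- function(x) {
  x   <- trimws(as.character(x))
  out <- rep(NA_character_, length(x))                  # blanks and unmatched values stay NA
  out[grepl("ko", x, ignore.case = TRUE) | x %in% "1"]  <- "kokop"
  out[grepl("osa", x, ignore.case = TRUE) | x %in% "2"] <- "osa"
  out[grepl("kes", x, ignore.case = TRUE)]              <- "kes"
  factor(out, levels = c("kokop", "osa", "kes"))
}

# Hypothetical demo data built from level strings in the question
workco <- data.frame(
  v1 = factor(c(" ", "1", "kokp.", "Osa/kokoaikainen")),
  v2 = factor(c("2", "koko", "Osap.", "Kesätyö kokoaika"))
)

# Recode every factor column in one go (the real data has 12 of them)
is_fac <- vapply(workco, is.factor, logical(1))
workco[is_fac] <- lapply(workco[is_fac], unify_levels)
```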

2 Answers


I usually merge factor levels as demonstrated in the example below: I select the levels that match my criterion (levels(x) %in% c(...)) and overwrite them with a new level name.

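set.seed(357)
# factor() is needed since R 4.0.0, where data.frame() no longer turns strings into factors by default
xy <- data.frame(name = factor(sample(letters[1:4], size = 20, replace = TRUE)), value = runif(20))
# the output below comes from an older R; sample() changed its default in R 3.6.0, so the draw may differ
xy$name
 [1] a a b a c b d c d d c c b a c a b d c b
Levels: a b c d
levels(xy$name)[levels(xy$name) %in% c("a", "b")] <- "a-b"
levels(xy$name)[levels(xy$name) %in% c("c", "d")] <- "c-d"
xy$name
 [1] a-b a-b a-b a-b c-d a-b c-d c-d c-d c-d c-d c-d a-b a-b c-d a-b a-b c-d c-d a-b
Levels: a-b c-d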
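The same idea can merge several groups in one call: `levels<-` also accepts a named list, where each name is a new level and each element lists the old levels it replaces. A sketch with a hypothetical factor `x` built from some of the question's level names:

```r
# Hypothetical factor using a few level names from the question
x <- factor(c("kokop", "Kokp.", "osa-aik", "Osap.", "1", "2"))

# Each list name becomes a new level; its elements are the old levels merged into it
levels(x) <- list(
  kokop = c("kokop", "Kokp.", "1"),
  osa   = c("osa-aik", "Osap.", "2")
)
```

As far as I can tell, old levels left out of the list end up as NA, so it is safest to list every old level.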
Roman Luštrik

My guess is that you need a combination of grep() and replace(). This can speed up changing levels that share similar syllables ("ko", "kok").

Data example

code <- as.factor(c("kok","osa","ma","kes", "koko", "osa-aikainen", "osa/kes"))

Add level

levels(code) <- c(levels(code), "kokop")
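The "Add level" step matters because assigning a value that is not yet a level of a factor produces NA (with an "invalid factor level" warning). A small sketch with a hypothetical factor `f` showing both cases, plus droplevels() to remove levels that are no longer used:

```r
f <- factor(c("kok", "koko", "osa"))  # hypothetical example factor

# "kokop" is not a level yet, so the replaced values become NA
bad <- suppressWarnings(replace(f, grep("kok", f), "kokop"))

# Add the level first, then replace
levels(f) <- c(levels(f), "kokop")
good <- replace(f, grep("kok", f), "kokop")
good <- droplevels(good)  # drop "kok" and "koko", which are now unused
```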

Replace all instances containing "kok" with "kokop"

new.code <- replace(code, grep("kok", code), "kokop")

Replace all instances containing "osa/kes" with "kes"

# start from new.code, otherwise the "kokop" replacement above is lost
new.code <- replace(new.code, grep("osa/kes", new.code), "kes")

Use a shorter pattern, for example "ko", to also catch levels with similar syllables ("ko", "kok")

new.code <- replace(code, grep("ko", code), "kokop")
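A sketch that extends this answer (my own assumption, not part of the original steps): each step starts from `new.code` so earlier replacements are kept, the highest-priority group ("kes") is replaced first so that "osa/kes" ends up as "kes" as requested in the comments, and `ignore.case = TRUE` also catches capitalised variants such as "Kokp." and "Osa-aik" from the question:

```r
code <- as.factor(c("kok", "osa", "ma", "kes", "koko", "osa-aikainen", "osa/kes",
                    "Kokp.", "Osa-aik"))
levels(code) <- c(levels(code), "kokop")

# Highest priority first: once a value is replaced it no longer matches later patterns
new.code <- replace(code, grep("kes", code, ignore.case = TRUE), "kes")
new.code <- replace(new.code, grep("osa", new.code, ignore.case = TRUE), "osa")
new.code <- replace(new.code, grep("ko", new.code, ignore.case = TRUE), "kokop")
new.code <- droplevels(new.code)  # remove the old, now unused levels
```

Values that match none of the patterns, such as "ma" here, are left unchanged.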
Ruthger Righart