
I am learning R and am having trouble finding the right command.

I have a categorical variable (a factor) with the following levels:

levels(job)
[1] "admin."        "blue-collar"   "entrepreneur"  "housemaid"    
[5] "management"    "retired"       "self-employed" "services"     
[9] "student"       "technician"    "unemployed"    "unknown"

Now I want to collapse these levels into fewer groups, such as

levels(job) 
[1] "class1"  "class2" "class3" "unknown"

where class1 includes "admin.", "entrepreneur", and "self-employed"; class2 includes "blue-collar", "management", and "technician"; class3 includes "housemaid", "student", "retired", and "services"; unknown includes "unknown" and "unemployed".

For this purpose, which command can I use? Thanks! Yan

nrussell
Yanyan

4 Answers


You can assign to levels:

levels(z)[levels(z)%in%c("unemployed","unknown","self-employed")] <- "unknown"

This is covered in the help file -- type ?levels.


Stealing from @akrun's answer, you could do this most cleanly with a hash/list:

ha <- list(
  unknown = c("unemployed","unknown","self-employed"),
  class1  = c("admin.","management")
)

# seq_along() is safe even if `ha` is empty, unlike 1:length(ha)
for (i in seq_along(ha)) levels(z)[levels(z) %in% ha[[i]]] <- names(ha)[i]
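
The loop can also be dropped, since the assignment form of levels() for factors accepts a named list directly (names are the new levels, components are the old ones); this is described in ?levels. Below is a minimal sketch using the grouping from the question. The job vector here is made up for illustration, with one value per original level. Note that any level left out of the list becomes NA, so every original level should appear somewhere.

```r
# Illustrative data: one observation per original level (an assumption,
# not the asker's real data).
job <- factor(c("admin.", "blue-collar", "entrepreneur", "housemaid",
                "management", "retired", "self-employed", "services",
                "student", "technician", "unemployed", "unknown"))

# Names are the new levels; each component lists the old levels to merge.
levels(job) <- list(
  class1  = c("admin.", "entrepreneur", "self-employed"),
  class2  = c("blue-collar", "management", "technician"),
  class3  = c("housemaid", "student", "retired", "services"),
  unknown = c("unknown", "unemployed")
)

levels(job)
table(job)
```

The new levels come out in the order in which they are named in the list.
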
Frank

You may also create a 'key/value' index vector and use it to replace the elements in 'job':

indx <-  setNames(rep(c(paste0('type',1:3), 'unknown'), c(3,3,4,2)), 
      c(levels(job)[c(1,3,7)], levels(job)[c(2,5,10)], 
      levels(job)[c(4,6,8,9)], levels(job)[c(11,12)]))

factor(unname(indx[as.character(job)]))

data

v1 <- c('admin.', 'blue-collar', 'entrepreneur', 'housemaid',
'management', 'retired', 'self-employed', 'services', 'student', 
'technician', 'unemployed', 'unknown')
set.seed(24)
job <- factor(sample(v1, 50, replace=TRUE))
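
The positional indices above depend on the alphabetical order of levels(job) and on every level being present in the sample. A variant of the same key/value idea that looks levels up by name might be sketched as follows. The names groups, lookup, and job2 are illustrative, and the labels follow the "class" naming from the question's desired output rather than 'type'.

```r
# Grouping taken from the question; `groups` is an illustrative name.
groups <- list(
  class1  = c("admin.", "entrepreneur", "self-employed"),
  class2  = c("blue-collar", "management", "technician"),
  class3  = c("housemaid", "student", "retired", "services"),
  unknown = c("unknown", "unemployed")
)

# Named character vector: names are the old levels, values the new classes.
lookup <- setNames(rep(names(groups), lengths(groups)),
                   unlist(groups, use.names = FALSE))

# Sample data in the spirit of the answer above.
set.seed(24)
job <- factor(sample(names(lookup), 50, replace = TRUE))

# Look up each observation by its old level name, then rebuild the factor.
job2 <- factor(unname(lookup[as.character(job)]), levels = names(groups))
table(job, job2)
```

Passing levels = names(groups) fixes the order of the new levels and keeps a class in the result even if no observation falls into it.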
akrun

Try the recode function from the car package.

(Posting as answer rather than comment, will delete if someone else posts a better answer)
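
For reference, a call to car::recode for the grouping in the question might look roughly like this. It is a sketch that assumes the car package is installed; the job vector and the spec string are made up for illustration. The recode specification is a single string of "old = new" terms separated by semicolons.

```r
# Assumes the car package is available: install.packages("car")
if (requireNamespace("car", quietly = TRUE)) {
  # Illustrative data: one observation per original level.
  job <- factor(c("admin.", "blue-collar", "entrepreneur", "housemaid",
                  "management", "retired", "self-employed", "services",
                  "student", "technician", "unemployed", "unknown"))

  # Build the specification: terms are separated by ";".
  spec <- paste(
    "c('admin.','entrepreneur','self-employed')='class1'",
    "c('blue-collar','management','technician')='class2'",
    "c('housemaid','student','retired','services')='class3'",
    "c('unknown','unemployed')='unknown'",
    sep = "; "
  )

  job2 <- car::recode(job, spec)
  print(table(job, job2))
}
```

Calling it as car::recode avoids clashes with other packages that export a function named recode.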

Ben Bolker

An alternative base R solution: convert the factor to a character vector, change its values, then call factor() on the result.

job <- as.character(job)
job[job %in% c("admin.","entrepreneur","self-employed")] <- "class1"
# ... do the same for the other classes
job <- factor(job)
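
Written out in full for the grouping in the question, the steps above might look like this. It is a sketch on made-up data with one value per original level.

```r
# Illustrative data (an assumption, not the asker's real data).
job <- factor(c("admin.", "blue-collar", "entrepreneur", "housemaid",
                "management", "retired", "self-employed", "services",
                "student", "technician", "unemployed", "unknown"))

job <- as.character(job)  # work on plain strings
job[job %in% c("admin.", "entrepreneur", "self-employed")]       <- "class1"
job[job %in% c("blue-collar", "management", "technician")]       <- "class2"
job[job %in% c("housemaid", "student", "retired", "services")]   <- "class3"
job[job %in% c("unknown", "unemployed")]                         <- "unknown"
job <- factor(job)        # back to a factor with the new levels

table(job)
```

This works here because none of the new labels also appears in a later set of old values; if one did, the order of the assignments would matter.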

Another solution is irec() in the package questionr. It opens a shiny app in your browser that allows interactive recoding, and then outputs the proper code in the console.

scoa