I have a question about the manipulation of a data frame. If I have this data frame as an example:

 employee <- c('John Doe','Peter Gynn','Jolie Hope')
 salary <- c(21000, 23400, 26800)
 startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))
 location <- c('New York', 'Alabama','New York')
 employ.data <- data.frame(employee, salary, startdate, location)
 employ.data

    employee salary  startdate location
1   John Doe  21000 2010-11-01 New York
2 Peter Gynn  23400 2008-03-25  Alabama
3 Jolie Hope  26800 2007-03-14 New York

Now I want to transform the location into numeric values. I know that I can do something like this:

 transformlocation <- function(x) {
     x <- as.character(x)

     if (x == 'New York') {
         return('1')
     } else if (x == 'Alabama') {
         return('2')
     } else if (x == 'Florida') {
         return('3')
     } else {
         return('0')
     }
 }

employ.data$location <- sapply(employ.data$location, transformlocation)

employ.data
    employee salary  startdate location
1   John Doe  21000 2010-11-01        1
2 Peter Gynn  23400 2008-03-25        2
3 Jolie Hope  26800 2007-03-14        1

But my final dataset contains hundreds of different location values, so writing one branch per value is not practical. Is there a way to do this automatically, for example with some kind of for-each loop?

Thanks for your help!

Timothy_Goodman

1 Answer

If `location` is already a factor variable, simply convert it to integer, i.e.

employ.data$location <- as.integer(employ.data$location)
employ.data
#    employee salary  startdate location
#1   John Doe  21000 2010-11-01        2
#2 Peter Gynn  23400 2008-03-25        1
#3 Jolie Hope  26800 2007-03-14        2
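
Note that the codes follow the factor's level order, which is alphabetical by default, so here Alabama becomes 1 and New York becomes 2. A minimal sketch to check which code belongs to which label (the names `loc` and `codes` are made up for illustration and are not part of the original data frame):

```r
# Illustrative factor with the same values as the location column
loc <- factor(c('New York', 'Alabama', 'New York'))

# Levels are sorted alphabetically by default
levels(loc)
# [1] "Alabama"  "New York"

# Each integer code is the position of the value in levels(loc)
codes <- as.integer(loc)
codes
# [1] 2 1 2

# The labels can be recovered from the codes
levels(loc)[codes]
# [1] "New York" "Alabama"  "New York"
```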

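Otherwise (for example if it is a character column, which is what `data.frame()` produces by default since R 4.0.0), convert to factor first and then to integer, i.e.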

employ.data$location <- as.integer(as.factor(employ.data$location))
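
If the codes need to follow a specific order, as in the question (New York = 1, Alabama = 2, Florida = 3, anything else = 0), one option is `match()` against a lookup vector instead of relying on the alphabetical levels. A sketch, assuming the desired order is known up front; `wanted` and the extra value 'Texas' are hypothetical and only there to show the fallback:

```r
# Example values, including one that is not in the lookup vector
location <- c('New York', 'Alabama', 'New York', 'Texas')

# Desired order: the position in this vector becomes the code
wanted <- c('New York', 'Alabama', 'Florida')

# match() returns the position of each value in `wanted`;
# nomatch = 0L mirrors the question's fallback of 0 for unknown values
codes <- match(location, wanted, nomatch = 0L)
codes
# [1] 1 2 1 0
```

On the data frame this would presumably look like `employ.data$location <- match(as.character(employ.data$location), wanted, nomatch = 0L)`. No loop is needed, because `match()` is vectorised over the whole column.
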
Sotos