3

Suppose I've read in a data frame, where a column contains strings as factors. I would like to convert the factors to numerics but with specific mappings. This conversion is typically a precursor step for a later calculation. For example:

> library(rpart)

> head(car90["Type"])
                 Type
Acura Integra   Small
Acura Legend   Medium
Audi 100       Medium
Audi 80       Compact
BMW 325i      Compact
BMW 535i       Medium

> summary(car90$Type)
Compact   Large  Medium   Small  Sporty     Van    NA's 
     19       7      26      22      21      10       6

In the car90$Type column, I would like to set 'Compact' to be -10, 'Large' to be -1, 'Medium' to be 0, 'Small' to be 1, 'Sporty' to be 10, and 'Van' to be 20, where the numbers are numerics, not factors. How would I do that?

I have already looked at related questions, but none provided a solution.

Replace specific column "words" into number or blank

Changing column names of a data frame in R

Replace contents of factor column in R dataframe

Convert factor to integer

stackoverflowuser2010

6 Answers

1

I would just use vector subscripting; here's an example:

R>a <- as.factor(c("C", "L", "M", "L", "C"))
R>a
[1] C L M L C
Levels: C L M
R>b <- c(C=-10,L=-1,M=0)
R>b
  C   L   M 
-10  -1   0 
R>
R>b[a]
  C   L   M   L   C 
-10  -1   0  -1 -10 
R>
Neal Fultz
1

You can try this:

x <- c('Compact', 'Large', 'Medium', 'Small', 'Sporty', 'Van') 
y <-  factor(x, levels = c('Compact', 'Large', 'Medium', 'Small', 'Sporty', 'Van'), 
    labels = c(-10, -1, 0, 1, 10, 20))
as.numeric(as.character(y))


[1] -10  -1   0   1  10  20

For your case, you can call:

car90$Type <-  factor(car90$Type, levels = c('Compact', 'Large', 'Medium', 'Small', 'Sporty', 'Van'), 
    labels = c(-10, -1, 0, 1, 10, 20))
car90$Type <-  as.numeric(as.character(car90$Type))
Bangyou
  • I still need to save it back into the data frame, right? Would I do `car90$Type <- as.numeric(as.character(y))`? – stackoverflowuser2010 Feb 27 '14 at 00:59
  • Thanks. But car90$Type is already a factor. Is there any way to just assign labels in the first step instead of setting car90$Type as a new factor? – stackoverflowuser2010 Feb 27 '14 at 01:05
  • I would suggest not assigning labels directly to an existing factor. Creating a new factor is much safer and avoids a lot of potential problems. You may want to check your results, call as.character to convert the factor to character, and then build a new factor from that. – Bangyou Feb 27 '14 at 01:09
1

As @NealFultz notes, vector subscripting can achieve this. You must be careful with how you do it, though:

x <- car90$Type[1:10]
#[1] Small   Medium  Medium  Compact Compact Medium  Medium  Large   Large   <NA>
#Levels: Compact Large Medium Small Sporty Van

I.e.:

vals <- c(Compact=-10,Large=-1,Medium=0,Small=1,Sporty=10,Van=20)
vals[x]

This gives the correct result because indexing with a factor uses its underlying integer codes, and the order in vals matches the order of the levels in the factor x:

vals[x]
#  Small  Medium  Medium Compact Compact  Medium  Medium   Large   Large    <NA> 
#      1       0       0     -10     -10       0       0      -1      -1      NA 

This will silently give the wrong values if you change the order in vals, e.g.:

vals <- c(Large=-1,Compact=-10,Medium=0,Small=1,Sporty=10,Van=20)
vals[x]
#  Small  Medium  Medium   Large   Large  Medium  Medium Compact Compact    <NA> 
#      1       0       0      -1      -1       0       0     -10     -10      NA 

You can get around this by subsetting based on comparing the character representation in x to the names of vals rather than the order, like:

vals <- c(Large=-1,Compact=-10,Medium=0,Small=1,Sporty=10,Van=20)
vals[as.character(x)]
#  Small  Medium  Medium Compact Compact  Medium  Medium   Large   Large    <NA> 
#      1       0       0     -10     -10       0       0      -1      -1      NA 
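
Applied to a whole column and stored back as a plain numeric vector, the name-based lookup might look like the sketch below. It uses a small made-up data frame `df` in place of car90, and the column name `TypeValue` is an assumption for illustration; `unname()` only drops the level names from the result.

```r
# Toy stand-in for car90; `df` and `TypeValue` are illustrative names
df <- data.frame(Type = factor(c("Small", "Medium", "Compact", NA, "Van")))

# Deliberately not in level order, to show that names drive the lookup
vals <- c(Large = -1, Compact = -10, Medium = 0, Small = 1, Sporty = 10, Van = 20)

# Index by character so matching is by name, then drop the names
df$TypeValue <- unname(vals[as.character(df$Type)])

df$TypeValue              # 1 0 -10 NA 20
is.numeric(df$TypeValue)  # TRUE
```

Missing values in the factor come through as NA in the numeric column.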
thelatemail
0

This is a join operation:

encode <- data.frame(Type = c("Compact", "Large", "Medium", "Small", "Sporty", "Van"), TypeValue = c(-10,-1,0,1,10,20))

car90 <- merge(car90, encode, all.x = TRUE)

# or using dplyr
library(dplyr)
car90 <- left_join(car90, encode)
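
One thing to watch with `merge()` is that by default it sorts the result on the key column and does not carry over the original row names (the car models here). Below is a minimal sketch of the same lookup done with `match()`, which leaves the rows where they are. The data frame `df` and its row names are made up as a stand-in for car90.

```r
# Toy stand-in for car90, with row names playing the role of car models
df <- data.frame(Type = factor(c("Small", "Medium", NA, "Van")),
                 row.names = c("car_a", "car_b", "car_c", "car_d"))

encode <- data.frame(Type = c("Compact", "Large", "Medium", "Small", "Sporty", "Van"),
                     TypeValue = c(-10, -1, 0, 1, 10, 20))

# For each row of df, find the index of the matching row in encode (NA if none)
idx <- match(as.character(df$Type), encode$Type)

# Pull the numeric value for each row; row order and row names are untouched
df$TypeValue <- encode$TypeValue[idx]

df
```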
Hugh
0

Use merge() as in the following example.

First create a data frame with the values you want. In this scenario you would write

 dictionary <- data.frame(Type = c('Compact', 'Large', 'Medium', 'Small', 'Sporty', 'Van'),
                     Values = c(-10, -1, 0, 1, 10, 20))

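 # merge() needs data frames on both sides; passing the bare vector car90$Type
 # leaves no shared column name and produces a cross join instead
 output <- merge(car90, dictionary, by = "Type")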

IMPORTANT: This example doesn't take NA into account. If you want to give those a value as well you'll need to include that as a type with its own value. Otherwise those rows won't be part of the output.

And the resulting data frame is formatted as you want it.

NOTE: It's easier if the key columns are named exactly the same, but you can specify which columns to use with by.x and by.y; check the documentation for more.
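
As a sketch of the by.x / by.y case, suppose the lookup column had a different name. The names `df`, `Price`, `Category`, and `dictionary2` below are made up for illustration. Adding `all.x = TRUE` keeps the rows whose Type is NA or otherwise unmatched, with NA as their value.

```r
# Toy stand-in for car90 with one extra column
df <- data.frame(Type = c("Small", "Medium", NA, "Van"),
                 Price = c(10, 20, 30, 40))

# Lookup table whose key column is named differently from df's
dictionary2 <- data.frame(Category = c("Compact", "Large", "Medium", "Small", "Sporty", "Van"),
                          Values = c(-10, -1, 0, 1, 10, 20))

# by.x names the key in the first data frame, by.y the key in the second
output <- merge(df, dictionary2, by.x = "Type", by.y = "Category", all.x = TRUE)

output
```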

0

Just reset the levels:

levels(car90$Type) <- c(-10, -1, 0, 1, 10, 20)

Leads to (same head/subset as you):

#               Type
# Acura Integra    1
# Acura Legend     0
# Audi 100         0
# Audi 80        -10
# BMW 325i       -10
# BMW 535i         0

Beware, though: if you intend to compute on this, you must then use as.numeric(levels(fac))[fac] (where fac is the factor, here car90$Type) to make sure you compute on the numbers, not the underlying factor integer values.
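
To make that pitfall concrete, here is a small sketch on a made-up factor `fac` (not the car90 data) comparing the two conversions:

```r
# Illustrative factor with the same six levels as car90$Type
fac <- factor(c("Small", "Medium", "Compact", "Van"),
              levels = c("Compact", "Large", "Medium", "Small", "Sporty", "Van"))
levels(fac) <- c(-10, -1, 0, 1, 10, 20)

# Not what you want here: returns the internal integer codes
as.numeric(fac)               # 4 3 1 6

# Intended: look up each element's level label and convert it to a number
as.numeric(levels(fac))[fac]  # 1 0 -10 20
```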

BrodieG