5

Suppose a column in my data frame holds colors, say c("Red", "Blue", "Blue", "Orange"). I would like to convert it to c(1, 2, 2, 3):

Red as 1
Blue as 2
Orange as 3

Is there a simpler way of doing this other than the obvious if/else or switch functions?

zx8754
freakyhat

4 Answers

12

Set up a named vector that describes how the colour strings map to integers:

colors=c(1,2,3)
names(colors)=c("Red", "Blue", "Orange")

Now use the named vector to generate a list of numbers associated with the colours in your data frame:

> colors[c("Red","Blue","Blue","Orange")]
   Red   Blue   Blue Orange 
     1      2      2      3 
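
To store the result in a data frame column, the same lookup can be applied to the column directly. The following is a minimal sketch, assuming a character column; the data frame `df` and its column names `colour` and `code` are hypothetical and only there for illustration.

```r
# Lookup table from the answer: names are the strings, values the integers
colors <- c(Red = 1, Blue = 2, Orange = 3)

# Hypothetical data frame standing in for the real one
df <- data.frame(colour = c("Red", "Blue", "Blue", "Orange"),
                 stringsAsFactors = FALSE)

# Index by name, then drop the names so only the numbers are stored
df$code <- unname(colors[df$colour])

# A string with no entry in the lookup comes back as NA
unmapped <- unname(colors[c("Red", "Green")])
```

The NA for an unmapped string is a handy way to spot values the lookup does not cover.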

UPDATE to address questions below. Here's an example of what I think you're trying to do.

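dataframe=data.frame(Gender=c("F","F","M","F","F","M"))
# Sorted unique values become the names of the lookup vector
strings=sort(unique(as.character(dataframe$Gender)))
colors=seq_along(strings)
names(colors)=strings
# as.character() makes sure the lookup is by name even if Gender is a factor
dataframe$Colours=unname(colors[as.character(dataframe$Gender)])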

Have a look at the result:

> dataframe
  Gender Colours
1      F      1
2      F      1
3      M      2
4      F      1
5      F      1
6      M      2

Note that this example assumes that you have no specific mapping between Gender and Colours in mind. If this is really the case, then it might be simpler to just follow the comment from @alexis_laz instead.
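
The comment itself is not reproduced here, so what follows is only a sketch of two common base R idioms for the case where any consistent numbering will do, not necessarily what the comment suggests. The vector `x` is the example from the question.

```r
x <- c("Red", "Blue", "Blue", "Orange")

# Number the strings in order of first appearance
by_appearance <- match(x, unique(x))

# Number the strings in sorted order of the distinct values
# (Blue = 1, Orange = 2, Red = 3)
alphabetical <- as.integer(factor(x))
```

Both return plain integer vectors, so the result can be assigned straight to a new column.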

CnrL
  • The thing is the number of rows in my data frame runs in the thousands – freakyhat Jul 20 '14 at 11:23
  • @user2500781. You could modify `CnrL` solution to `setNames(1:3,unique(dat$colors))[dat$colors]`, which returns Red = 1, Blue = 2, Blue = 2, Orange = 3. – akrun Jul 20 '14 at 12:01
  • I don't see why this is a problem: thousands=sample(c("Red", "Blue", "Orange"),2000,replace=TRUE); colors[thousands] – CnrL Jul 20 '14 at 13:48
  • Maybe you need to clarify the question. Do you have thousands of unique strings or thousands of rows in your column of strings that you need to map to integers, or both? – CnrL Jul 20 '14 at 13:53
  • I'll try this out it sounds promising. And I have about a hundred unique strings in the column. – freakyhat Jul 20 '14 at 14:37
  • This might help: strings=sort(unique(dataframe$cols)); colors=1:length(strings); names(colors)=strings – CnrL Jul 20 '14 at 15:48
  • Hey. I like this approach but it just doesn't seem to work. Suppose my column has two unique values of gender - 'M' and 'F'. I need to change them to 1 and 2. strings=sort(unique(dataframe$Gender)); colors=1:length(strings); names(colors)=strings does not do the trick. The characters are not being replaced by the integers. – freakyhat Jul 21 '14 at 18:48
  • Brilliant. Made my code considerably faster. Thanks. – freakyhat Jul 23 '14 at 16:10
4

I may be missing something, but I believe this method works. Coerce the column of words (below, "names") to a factor, then use revalue() to replace its levels with the numbers listed in "colors".

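library(plyr)

colors <- c("1","2","3")
names <- c("Red", "Blue", "Orange")
df <- data.frame(names, colors)
df$names <- as.factor(df$names)
# revalue() relabels the factor levels; the result is still a factor
df$names <- revalue(x = df$names, c("Red" = "1", "Blue" = "2", "Orange" = "3"))
# Go through character to turn the relabelled factor into integers
df$names <- as.integer(as.character(df$names))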
lawyeR
1

Using car::recode() function:

library(car)

# Example input taken from the question
x <- c("Red", "Blue", "Blue", "Orange")
recode(x, "'Red'=1; 'Blue'=2; 'Orange'=3;")
# [1] 1 2 2 3
zx8754
0

Here is a function based on previous code:

# Recode 'string' into 'integer'
recode_str_int <- function(df, feature) {

  # 1. Unique values

  # 1.1. 'string' values (as.character() guards against factor columns)
  values <- as.character(df[[feature]])
  list_str <- sort(unique(values))

  # 1.2. 'integer' values
  list_int <- seq_along(list_str)

  # 2. Create new feature

  # 2.1. Name each integer after the string it encodes
  names(list_int) <- list_str

  # 3. Result: look up each row's string and drop the names
  unname(list_int[values])

} # recode_str_int

Call it like:

 df$new_feature <- recode_str_int(df, "feature")
Andrii