9

The variables in my database are coded as "Yes" and "No", but I would like to have them as "1" and "2".

I tried to create a new variable using ifelse, but when I list'ed it, it didn't work, as follows:

CA <- ifelse((CANCER == "Yes"),1
ifelse(( CANCER == "No"),2 )))

list(CA)

[[1]]
NULL
  • Why? I always prefer more informative labels like Yes and No; they're more informative and so I'm able to remember what they mean when I have to revisit an analysis six months later. I'm not aware of any advantage of switching to numeric codes, save perhaps file size. – Aaron left Stack Overflow Aug 20 '12 at 01:39
  • Because you might want to do numeric evaluations on the data. – thegreenpizza May 04 '17 at 22:24
  • @Aaron Who cares? Every programmer has their own preferences and programming needs. – user3932000 Jul 10 '19 at 20:10
  • There's a common misconception out there (due to old versions of SPSS, among other things), that to do analysis on a data set, the data, even categorical variables, must be stored as numeric values. This is not true with R, or any modern statistical software that I know of. I asked the question to get more information about their underlying needs, as I was worried this was an [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). – Aaron left Stack Overflow Jul 10 '19 at 20:42

7 Answers

19

If you want to use ifelse,

CA <- ifelse(CANCER=="Yes", 1, 2)
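As a minimal sketch of how this behaves, assuming CANCER is a character vector (the sample values below are made up for illustration): the single ifelse maps every non-missing value other than "Yes" to 2, while a nested call, which looks like what the question was attempting, maps anything unexpected to NA. To look at the result, typing the variable name is enough; list() is not needed.

```r
# Hypothetical sample data standing in for the asker's CANCER variable
CANCER <- c("Yes", "No", "No", "Yes", NA)

# Single ifelse: "Yes" becomes 1, every other non-missing value becomes 2
CA <- ifelse(CANCER == "Yes", 1, 2)

# Nested ifelse: values that are neither "Yes" nor "No" become NA
CA_strict <- ifelse(CANCER == "Yes", 1,
                    ifelse(CANCER == "No", 2, NA))

# Print by typing the name
CA
CA_strict
```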
smillig
7

Assuming

levels(CANCER) 

returns

[1] "yes" "no"

it's probably easiest in your case if you just say

CA<-factor(as.numeric(CANCER))

However, generally you can also use

Cancer<-factor(CANCER)

Then, assuming

levels(Cancer) 

returns

[1] "yes" "no"

You can change the levels thus

levels(Cancer)[1]<-"1"
levels(Cancer)[2]<-"2"

or switch labels accordingly.
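A short sketch of the relabelling, assuming a factor whose levels are in the order "yes", "no" (the sample data is hypothetical). One thing worth keeping in mind: the result is still a factor, so as.numeric() on it returns the internal level codes, and going through as.character() first is the safer way to get numbers that match the labels.

```r
# Hypothetical factor with the level order assumed in this answer
CANCER <- factor(c("yes", "no", "no", "yes"), levels = c("yes", "no"))

Cancer <- CANCER
levels(Cancer) <- c("1", "2")   # relabel both levels in one step

# Still a factor; convert via character to get numbers matching the labels
CA_num <- as.numeric(as.character(Cancer))
CA_num
```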

Momo
2

We need to know if your variable is a factor. Suppose

foo <- c("yes","no","no","yes")

If is.factor(foo) returns TRUE, e.g., if you did foo <- factor(foo), then use

levels(foo) <- c("2", "1")

else use

foo[foo == "yes"] <- 1
foo[foo == "no"] <- 2

Also, list() doesn't do what (I think) you think it does. If you want to view the value of foo, just type in foo. After executing the code above...

foo
[1] "1" "2" "2" "1"
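The two cases can be wrapped in one small helper. This is only a sketch: recode_yes_no is a made-up name, and it assumes the values are exactly "yes" and "no" with no missing entries. It keeps the input type, so a factor stays a factor and a character vector stays a character vector.

```r
# Hypothetical helper: relabel "yes"/"no" as "1"/"2" without changing the type
recode_yes_no <- function(x) {
  if (is.factor(x)) {
    # Rename levels by name, so the result does not depend on level order
    levels(x)[levels(x) == "yes"] <- "1"
    levels(x)[levels(x) == "no"] <- "2"
  } else {
    x[x == "yes"] <- "1"
    x[x == "no"] <- "2"
  }
  x
}

foo <- c("yes", "no", "no", "yes")
recode_yes_no(foo)          # character vector in, character vector out
recode_yes_no(factor(foo))  # factor in, factor out
```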
Jack Tanner
  • I'm not sure that's clever: is.factor(foo) FALSE is.character(foo) TRUE – Momo Aug 19 '12 at 23:38
  • @Momo, you're right, we don't know if the OP's variable is a factor or not. I adjusted my answer. – Jack Tanner Aug 20 '12 at 02:26
  • Great, but I only meant the result of relabeling should be a factor rather than a character vector. – Momo Aug 20 '12 at 02:35
  • @Momo, I disagree, actually - the result of relabeling should be a factor iff the original vector was a factor. No need to change data types as a side effect of relabeling. – Jack Tanner Aug 20 '12 at 02:41
  • @Aaron, no, if `foo` is a factor, then `foo[foo == "yes"] <- 1` will fail. – Jack Tanner Aug 20 '12 at 02:43
  • Thanks to all for the discussion. I'm working on it. I tried to change it since I'm going to use it later for another variable, e.g. table(death[Cancer==1],hospital, mean, na.rm=T) –  Aug 20 '12 at 03:36
2

I just wanted to add (as it helped me when I first came across this type of thing) that unless explicitly stated, the levels of a factor are ordered alphabetically.

This wouldn't work for this particular question as the factor levels in the CANCER column are explicitly ordered. BUT, for any "yes" / "no" coded dataset that is read using

my_df <- read.csv(file = "myfile.csv", stringsAsFactors = TRUE)

"no" would be coded as 1 and "yes" would be coded as 2, as N comes before Y in the alphabet.

Thus, in such a setting:

my_df$CANCER <- as.numeric(my_df$CANCER) - 1

would be very useful, as every "no" is now a 0 and every "yes" is a 1.
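To make the alphabetical ordering concrete, here is a small sketch with a made-up vector instead of a CSV file. The variable name answers is just for illustration.

```r
# Hypothetical data; factor() sorts the levels alphabetically by default
answers <- factor(c("yes", "no", "no", "yes"))

lev <- levels(answers)              # "no" comes before "yes"
codes <- as.numeric(answers)        # internal codes: "no" is 1, "yes" is 2
zero_one <- as.numeric(answers) - 1 # shift to 0/1: "no" is 0, "yes" is 1

lev
codes
zero_one
```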

Hope that is helpful to someone out there.

Hlynur
1

If you coerce to a factor with the levels set in the order "yes","no":

foo <- factor(c("yes","no","no","yes"),levels=c("yes","no"))

You can simply coerce to numeric:

as.numeric(foo)

Which gives you:

[1] 1 2 2 1
Sacha Epskamp
0

Alternatively - and potentially more flexible if you have more than 2 options - you can use the merge() function.

For example if you have this data frame:

dtf <- data.frame(CANCER = c("No", "Yes", "Yes", "No"),
                  x = c(4, 5, 6, 7), 
                  # Keep character variables as characters, do not create factors
                  stringsAsFactors = FALSE) 

You can store the new way to code the value in another data frame:

moreinfo <- data.frame(CANCER = c("Yes", "No"),
                       CA = c(1, 2), 
                       stringsAsFactors = FALSE)

Then merge it with the original data frame:

merge(dtf, moreinfo, by = "CANCER")

  CANCER x CA
1     No 4  2
2     No 7  2
3    Yes 5  1
4    Yes 6  1

Note: the stringsAsFactors = FALSE argument prevents R from automatically converting character variables to factors. That conversion was the default before R 4.0.0; I recommend setting it as well when you load data with read.csv().
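One thing to be aware of is that merge() returns the rows sorted by the key, as the output above shows. If the original row order matters, a named-vector lookup is a possible alternative. This is a sketch of a different technique, not part of merge(); the name codes is made up for the example.

```r
dtf <- data.frame(CANCER = c("No", "Yes", "Yes", "No"),
                  x = c(4, 5, 6, 7),
                  stringsAsFactors = FALSE)

# Hypothetical lookup table as a named vector: name = old value, element = new code
codes <- c(Yes = 1, No = 2)

# Index by the character column; unname() drops the carried-over names
dtf$CA <- unname(codes[dtf$CANCER])
dtf
```

This only works as written when CANCER is a character column; with a factor, convert it with as.character() before indexing.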

Paul Rougieux
0

A little late to the party, but you can do this with dplyr's mutate and plyr's revalue:

cancer_dat <- data.frame(CANCER = c("No", "Yes", "Yes", "No"),
                  x = c(4, 5, 6, 7))

library(dplyr)  # provides %>% and mutate()

# revalue() comes from plyr; calling it as plyr::revalue avoids masking dplyr functions
cancer_dat <- cancer_dat %>%
  mutate(CANCER = plyr::revalue(factor(CANCER), c("Yes" = "1", "No" = "0")))

revalue allows you to rename factor levels by passing a named vector of the form c("old name" = "new name").

RayVelcoro