I am using the following to convert "yes"/"no" responses into numeric data so that I can show the results in a scatter plot.

> head(cust.df$email)
[1] "yes" "yes" "yes" "yes" "no"  "yes"

> as.numeric(head(cust.df$email))
[1] NA NA NA NA NA NA
Warning message:
NAs introduced by coercion 

As you can see, I get this warning message. The end result is that when I create the scatter plot, it is empty because of the NAs.

I have even tried to fix it with this method.

> as.factor(head(cust.df$email))
[1] yes yes yes yes no  yes
Levels: no yes

> as.numeric(head(cust.df$email))
[1] NA NA NA NA NA NA
Warning message:
NAs introduced by coercion

However, none of that has worked. Does anyone have any tips on how to solve this? The data does have 341 NAs.

lmo
Luis
  • It would be easier to help if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). What is `class(cust.df$email)`? Also, what values do you want for yes/no? 0/1? 1/0? 2/10? – MrFlick Jun 20 '17 at 23:17
  • Thanks, MrFlick. – Luis Jun 20 '17 at 23:22
  • It is a character. I want 2=Yes, 1=No – Luis Jun 20 '17 at 23:23
  • Your fix did not work because you converted to factor, but you did not assign the new factor values to the variable. You need to do something like `cust.df$email <- as.numeric(as.factor(cust.df$email))`. – neilfws Jun 20 '17 at 23:36
  • I see what you mean. I followed your instructions and now it works. Thank you very much. – Luis Jun 21 '17 at 02:33

3 Answers

As far as I know, "yes" and "no" do not equate to 0 and 1 in R, so as.numeric() returns NA for them. It would work with TRUE and FALSE, however, which convert to 1 and 0. You need to assign a value to "yes" and "no" directly.

cust.df$email<-factor(cust.df$email)
cust.df$email<-as.numeric(cust.df$email)

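This will assign 1 to "no" and 2 to "yes", because factor levels are sorted alphabetically. If you want 0 for "no" and 1 for "yes" instead, you can simply subtract 1 after the conversion:

cust.df$email <- cust.df$email - 1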
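
To make the point about TRUE/FALSE concrete, here is a minimal sketch. The vector `email` is made-up data standing in for `cust.df$email`, not the asker's actual column. It shows that logicals coerce to numbers while "yes"/"no" strings do not, and that a comparison against "yes" gives a 0/1 coding directly.

```r
# Hypothetical example data; stands in for cust.df$email
email <- c("yes", "no", NA, "yes")

# Logical values coerce cleanly: TRUE -> 1, FALSE -> 0
as.numeric(c(TRUE, FALSE))

# Strings that are not numbers become NA
# (suppressWarnings only hides the "NAs introduced by coercion" warning)
bad <- suppressWarnings(as.numeric(email))

# Comparing against "yes" yields a logical vector, which converts to 0/1;
# missing values stay NA
email01 <- as.numeric(email == "yes")
email01
```

This route skips the factor step entirely, which may be convenient when only a 0/1 coding is needed.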

Agile Bean
sconfluentus

One possible way to handle it is with as.numeric(as.factor(email)) in your scatterplot. Here's an example that shows how it works:

stuff <- sample(c("yes","no",NA), 10, replace=TRUE)
stuff
#   [1] "yes" "no"  "yes" NA    NA    "no"  "no"  "yes" "yes" "no" 

as.numeric(as.factor(stuff))
#   [1]  2  1  2 NA NA  1  1  2  2  1

as.numeric(head(cust.df$email)) doesn't work because as.factor() only displayed the factor representation of head(cust.df$email). The result was never assigned, so cust.df$email itself was never converted to a factor and is still a character vector.
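
As a small illustration of that point (a sketch with a made-up vector `email`, not the asker's data), calling as.factor() without assigning its result leaves the original object unchanged:

```r
# Hypothetical example data; stands in for cust.df$email
email <- c("yes", "yes", "no", NA)

as.factor(email)   # prints a factor, but email itself is untouched
class(email)       # still "character"

# Assign the result to keep it, then convert the factor codes to numbers
email_num <- as.numeric(as.factor(email))
email_num          # levels sort alphabetically: "no" = 1, "yes" = 2
```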

Another possible way is to create a new variable - this would be an easy way to use whatever numeric codes you'd like:

stuff_num <- rep(NA, length(stuff))
stuff_num[stuff=="yes"] <- 2
stuff_num[stuff=="no"] <- 1
stuff_num
#   [1]  2  1  2 NA NA  1  1  2  2  1
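
A compact variant of the same idea, offered only as a sketch, is a named lookup vector. The names `codes` and `stuff_num2` are made up for illustration, and the input is a small fixed vector rather than the random sample above.

```r
# Hypothetical fixed input so the result is predictable
stuff <- c("yes", "no", NA, "yes")

# Named vector: the names are the responses, the values are the codes you want
codes <- c(no = 1, yes = 2)

# Index the lookup vector by the responses; NA responses stay NA
stuff_num2 <- unname(codes[stuff])
stuff_num2
```

Changing the coding then only means editing `codes`, for example `c(no = 0, yes = 1)`.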
Matt Tyers
  • Yes, thank you for helping me see that. As a newbie to R, I appreciate your alternative solution. R is so fun and the learning curve isn't so bad. – Luis Jun 21 '17 at 02:34
I had this issue before, and the problem was with the .csv file that I read from. For me, the cells contained a stray ",", like this: 1,

When I removed it, it worked like a charm. Hope that may help anyone facing this issue in the future.
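
If editing the file by hand is not practical, one possible approach is to strip the commas in R before converting. This is only a sketch: `raw` is made-up data imitating such cells, and removing every comma assumes they are never meaningful (for instance, as decimal separators).

```r
# Hypothetical values as they might arrive from the CSV, as character strings
raw <- c("1,", "2,", "3")

# Direct conversion fails for the cells with a stray comma
suppressWarnings(as.numeric(raw))

# Remove the commas first, then convert
clean <- as.numeric(gsub(",", "", raw, fixed = TRUE))
clean
```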

Salma Elshahawy