I was following this post to figure out how to convert my factor to numeric values in RStudio. The factor in question does have NAs, which I put in there myself. I need to use this factor in a tapply() call later and want to make sure that the NAs won't be a problem.

Example Code:

factor.1[2] <- NA
factor.1[7] <- NA
factor.1[12] <- NA

Then, following the directions on the linked post:

num.fact1 <- as.numeric(levels(factor.1))[factor.1]

The "error" I get is "NAs introduced by coercion", but it does let me proceed regardless. Now, tapply:

tapply(
    num.fact1,
    factor.2,
    mean, na.rm=TRUE
)

I think the output looks fine/accurate. I want to make sure that the error I have with "NA's introduced by coercion" won't be a problem, especially when I knit this notebook to PDF.
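For reference, a minimal sketch of what `na.rm = TRUE` does inside `tapply()`. The values below are made up and only stand in for `num.fact1` and `factor.2`, since the real data is not shown here:

```r
# Hypothetical values standing in for num.fact1 and factor.2
num.fact1 <- c(1.5, NA, 3.5, 4.5, 5.5, 6.5)
factor.2  <- factor(c("a", "a", "a", "b", "b", "b"))

# Without na.rm, any group that contains an NA gets a mean of NA
means.with.na <- tapply(num.fact1, factor.2, mean)
means.with.na

# Extra arguments are passed on to mean(), so the NA is dropped within its group
means.na.rm <- tapply(num.fact1, factor.2, mean, na.rm = TRUE)
means.na.rm
```

In this sketch group "a" is averaged over its two remaining values, so an NA only shrinks the group it belongs to.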

Ronak Shah

1 Answer

Assigning the NA to the variable is harmless in this case. The cause of the warning, however, is more worrying. Look at this example:

factor.1 <- factor(c("5.6", "4.7", "10.1", "2.O", "3.6", "1.7"))
factor.1
# [1] 5.6  4.7  10.1 2.O  3.6  1.7 
# Levels: 1.7 10.1 2.O 3.6 4.7 5.6

They all look like numbers, right? Now do your conversion to numeric:

num.fact.1 <- as.numeric(levels(factor.1))[factor.1]
# Warning message:
# NAs introduced by coercion

The message is warning you that some of the values could not be converted to numeric, so they became NA. Let's check which ones:

data.frame(factor.1, num.fact.1)[which(is.na(num.fact.1) & !is.na(factor.1)), ]
#   factor.1 num.fact.1
# 4      2.O         NA

The 4th value is "2.O" (with the letter O), not 2.0. The data may need some cleaning.
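One way the cleaning could go, as a rough sketch on the example factor above. It inspects the levels directly rather than the rows, and the letter-O-to-zero replacement is an assumption that only fits this example; real data may need a different fix:

```r
# Same example factor as above, with the letter "O" typed in place of a zero
factor.1 <- factor(c("5.6", "4.7", "10.1", "2.O", "3.6", "1.7"))

# Convert the levels once, silencing the warning only for this check
lvl.num <- suppressWarnings(as.numeric(levels(factor.1)))

# Levels that are not valid numbers
bad.levels <- levels(factor.1)[is.na(lvl.num)]
bad.levels

# Assumed fix for this example only: swap the letter "O" for the digit "0"
levels(factor.1) <- sub("O", "0", levels(factor.1), fixed = TRUE)

# With every level now numeric, the conversion runs without a warning
num.fact.1 <- as.numeric(levels(factor.1))[factor.1]
num.fact.1
```

Checking `levels()` is usually enough, because the warning comes from converting the levels, not from the NA values assigned to individual elements.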

Edward
  • This makes a lot of sense! Thanks for the clear-cut example @Edward. As far as data cleansing goes, the data set I'm working with did need some cleaning, namely the three points that I changed to NAs. I think once I changed those three, the rest of the data should be fine, per the assignment. – snicksnackpaddywhack91 May 11 '20 at 01:23
  • @frazaga962 Be careful though. After you replace those three rows with NA, you still get the warning, which means there are more rows with non-numeric data. Run my last command to find them. – Edward May 11 '20 at 01:28
  • I got a return of 0 rows. Would I still get the warning? – snicksnackpaddywhack91 May 11 '20 at 02:04
  • I would guess no. But do you? If you still get a warning, I must be missing something. But it's hard to know without looking at some of your data. If you edit your question and add the output of `factor.1` then I could check. – Edward May 11 '20 at 02:17