I have a dataset in which I am trying to convert a factor into a numeric variable. It appeared to work fine the first time I ran it, but now that I have changed the vector contents, as.numeric() is returning different (possibly previous) values rather than the values now in the vector, even though these do not appear to be stored anywhere. It works fine if I convert to a character first, however. The code I am using is:

rm(reprex) # ensure it does not exist from a previous run
# stringsAsFactors = TRUE is needed from R 4.0.0 onward to get factor columns (it was the default before)
reprex <- data.frame(rbind(c("BT", 8), c("BL", 1), c("TS", 1), c("SA", 7), c("S", 5), c("LS", 5), c("M", 3), c("CV", 3), c("CF", 3), c("PE", 3)), stringsAsFactors = TRUE)

names(reprex) <- c("Post Area", "Count")
reprex$Countnum <- as.numeric(reprex$Count) # should be same as Count
reprex$Countnum_char <- as.numeric(as.character(reprex$Count)) # is same as Count

head(reprex)

gives:

  Post Area Count Countnum Countnum_char
1        BT     8        5             8
2        BL     1        1             1
3        TS     1        1             1
4        SA     7        4             7
5         S     5        3             5
6        LS     5        3             5

Why is this? Converting to a character before converting to numeric works, so I can avoid the problem, but I am confused about why this happens at all, and about where the strangely mapped factor levels (I suspect from a previous version of the data frame) are being stored such that they persist after I remove the object.

Mel

2 Answers


This comes down to how R stores a factor. The levels are the sorted unique values, and each element is stored as the integer position of its value among those levels. Count = 1 is the smallest value, so it becomes Countnum = 1. Count = 3 is the second smallest, so its level code is 2 and Countnum = 2, and so on (Count = 5 has code 3, Count = 7 has code 4, and Count = 8 has code 5). In effect, your first as.numeric() returns these level codes rather than the labels. Countnum_char instead takes the character label of each element and converts that label to a number, which gives back the original values. Nothing is left over from an earlier version of the data frame: the codes are derived from the current contents.
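As a rough illustration of the mapping described above, the sketch below rebuilds the Count column directly as a factor (the variable names `count`, `codes` and `values` are made up for this example) and prints its levels next to its integer codes. It also shows that the levels are an attribute of the object itself, so they disappear with it when the object is removed.

```r
# Count values from the question, built directly as a factor
count <- factor(c("8", "1", "1", "7", "5", "5", "3", "3", "3", "3"))

levels(count)      # the sorted unique labels: "1" "3" "5" "7" "8"
as.integer(count)  # position of each element's label in levels(): 5 1 1 4 3 3 2 2 2 2

# The levels are stored on the object as an attribute, not elsewhere in the workspace
attributes(count)

codes  <- as.numeric(count)                # level codes, as in Countnum
values <- as.numeric(as.character(count))  # original values, as in Countnum_char
```

Comparing `codes` with `values` reproduces the Countnum and Countnum_char columns from the question.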

akash87
  • Thanks! I was confused because by chance it happened that the previous character values and the previous factor levels were the same, so I didn't notice this did not work until the contents were changed! – Mel Jan 24 '20 at 15:10

Take a look here for some light on why this is happening: https://www.dummies.com/programming/r/how-to-convert-a-factor-in-r/

The Dummies website has a lot of good free resources on R.

> numbers <- factor(c(9, 8, 10, 8, 9))

If you run str() on the above code snippet you get this output:

> str(numbers)
 Factor w/ 3 levels "8","9","10": 2 1 3 1 2

R stores the values as c(2, 1, 3, 1, 2) with associated levels of c("8", "9", "10").

When converting numbers to character vectors you receive the expected output:

> as.character(numbers)
[1] "9"  "8"  "10" "8"  "9"

However, when you use as.numeric() directly on the factor, you get the vector's internal level codes, not the original values.
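For completeness, here is a small sketch of that direct call on the same `numbers` factor (the name `codes` is only for illustration). The result matches the integer codes that str() reports.

```r
numbers <- factor(c(9, 8, 10, 8, 9))

# Direct conversion returns the level codes, not the labels
codes <- as.numeric(numbers)
codes  # 2 1 3 1 2
```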

Doing what you did

> as.numeric(as.character(numbers))
[1]  9  8 10  8  9

Is exactly how you fix this! This is normal behavior for R when doing what you are doing; you've not made any mistakes here that I can see.
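As an aside, the help page for factor also mentions an equivalent idiom that converts the levels once and then indexes them by the factor, which it describes as slightly more efficient. A minimal sketch on the same example, with `via_character` and `via_levels` as illustrative names:

```r
numbers <- factor(c(9, 8, 10, 8, 9))

# Convert each element's label to a number
via_character <- as.numeric(as.character(numbers))

# Convert the (few) levels to numbers once, then index them by the factor's codes
via_levels <- as.numeric(levels(numbers))[numbers]

identical(via_character, via_levels)  # TRUE
```

Both forms return the original values; the second only does the string-to-number conversion once per level, which may matter for long vectors with few distinct values.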