
Dear StackOverflow Users,

R is treating a particular data set as non-numeric, which is a fairly common problem:

df

 trial   count
 1       0.75   
 2       .
 3       0.90
 4       0.80
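Here is a minimal reproducible sketch of this situation. It assumes df was read from a CSV file in which the missing count was written as '.'; the CSV text is a hypothetical stand-in for that file. stringsAsFactors = TRUE mimics the default before R 4.0.0. With the newer default the column would be character instead of factor.

```r
# Hypothetical CSV text standing in for the original data file (assumption)
csv_text <- "trial,count
1,0.75
2,.
3,0.90
4,0.80"

# stringsAsFactors = TRUE reproduces the pre-R 4.0.0 default behaviour
df <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# The single "." cannot be parsed as a number, so the whole column
# is read as a factor rather than as numeric
str(df)
class(df$count)
levels(df$count)
```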

So I removed the '.' trials with the subset command:

 df <- subset(df, count != '.')
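At this point, note that subset() removes rows but does not change the column's type. This sketch assumes count is a factor, as in the answer below:

```r
# Rebuild the example, assuming count was read as a factor
df <- data.frame(trial = 1:4,
                 count = factor(c("0.75", ".", "0.90", "0.80")))

df <- subset(df, count != ".")

# The rows are gone, but the column is still a factor ...
class(df$count)
# ... and "." is still recorded as a level, even though no row uses it
"." %in% levels(df$count)
# droplevels() removes unused levels, but the column remains a factor
class(droplevels(df)$count)
```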

This gives the following output:

 trial   count
 1       0.75   
 3       0.90
 4       0.80

I want to calculate the mean of count, so I do the following:

mean(as.numeric(df$count))

But for some reason, instead of the mean of the values (0.816), I get the mean of the rank-order values (2).
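The reported result can be reproduced with the three remaining values. This sketch assumes the unused '.' level has already been dropped (for example with droplevels()). If that level is still present, the codes, and therefore the mean, can differ:

```r
# The three remaining values as a factor; levels sort as "0.75", "0.80", "0.90"
count <- factor(c("0.75", "0.90", "0.80"))

# as.numeric() on a factor returns the level positions, not the labels
as.integer(count)                       # 1 3 2
mean(as.numeric(count))                 # 2, the mean of the codes

# Going through the character labels recovers the actual values
mean(as.numeric(as.character(count)))   # 0.8166667
```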

I have never come across this problem before. I can think of plenty of workarounds, but I was wondering whether anyone knows why this happens.

Thank you for your time and consideration,

BC


1 Answer


The issue is that the '.' entries change the column type from numeric to character (or factor). In this case, it appears to be a factor. Removing the '.' rows with subset() does not change the column's type back, so we need to convert to character first and then to numeric:

mean(as.numeric(as.character(df$count)))
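The help page for factor (?factor) also recommends as.numeric(levels(f))[f] for this conversion, because it converts each level only once. Below is a minimal sketch using the question's values. It assumes count is a factor, and it calls droplevels() first so that the unused '.' level does not trigger a coercion warning:

```r
# The question's column, assumed to be a factor with a "." placeholder
count <- factor(c("0.75", ".", "0.90", "0.80"))

# Drop the "." rows, then the now-unused "." level
count <- droplevels(count[count != "."])

# Convert each level label to a number once, then index by the factor's codes
values <- as.numeric(levels(count))[count]

values        # 0.75 0.90 0.80
mean(values)  # 0.8166667
```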

Without the as.character() step, coercing a factor directly to numeric returns its integer storage codes (the positions of the levels), not the numbers shown in the labels. For example:

set.seed(24)
v1 <- factor(sample(c(7, 19, 5, 3, 20), 20, replace = TRUE))
as.integer(v1)
#[1] 4 4 1 2 1 5 4 1 5 4 1 4 1 1 4 5 3 3 2 3
as.numeric(as.character(v1))
#[1] 19 19  3  5  3 20 19  3 20 19  3 19  3  3 19 20  7  7  5  7
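The question does not say where df comes from. If the data are read from a file in which '.' marks a missing value, you can keep the column numeric from the start by passing na.strings = "." to read.csv(). The CSV text below is a hypothetical stand-in for that file:

```r
# Hypothetical CSV text standing in for the original data file (assumption)
csv_text <- "trial,count
1,0.75
2,.
3,0.90
4,0.80"

# Treat "." as a missing-value marker while reading, so count is parsed as numeric
df <- read.csv(text = csv_text, na.strings = ".")

class(df$count)                 # "numeric"
mean(df$count, na.rm = TRUE)    # 0.8166667
```

With this approach, no subsetting or factor conversion is needed. The missing trial is simply an NA that na.rm = TRUE skips.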
akrun