5

Below is a data frame, df1. I want to convert column "V2" from factor to numeric without changing its current values (0; 0; 8,5; 3).

df1=

             V1  V2 V3       X2 X3
4470 2010-03-28   0  A 21.53675  0
4471 2010-03-29   0  A 19.21611  0
4472 2010-03-30 8,5  A 21.54541  0
4473 2010-03-31   3  A       NA NA

Since column "V2" is a factor, I first convert it to character:

df1[,2]=as.character(df1[,2])

Then I try to convert "V2" to numeric format:

df1[,2]=as.numeric(df1[,2])

Leading to this R message:

Warning message: NAs introduced by coercion

The result is the data frame below, where df1[3,2] has changed to NA instead of remaining "8,5":

             V1 V2 V3       X2 X3
4470 2010-03-28  0  A 21.53675  0
4471 2010-03-29  0  A 19.21611  0
4472 2010-03-30 NA  A 21.54541  0
4473 2010-03-31  3  A       NA NA 

It might have to do with the fact that 8,5 is not a whole number, but I do not know how to solve this problem. Help would be much appreciated!

tonytonov
MB123
    The problem here is that your decimal separator is a comma instead of a point. – juba May 02 '13 at 10:20

3 Answers

11

Try this to replace the comma in your data:

fac <- c( "0" , "0" , "1,5" , "0" , "0" , "8" )
#[1] "0"   "0"   "1,5" "0"   "0"   "8" 
fac <- as.numeric( sub(",", ".", fac) )
#[1] 0.0 0.0 1.5 0.0 0.0 8.0
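
Applied to the data frame from the question, the same idea might look like the sketch below. The data frame is rebuilt by hand from the printed output, so the column types (in particular V1 as a Date and V2 as a factor) are assumptions rather than something the question confirms.

```r
# Rebuild the example data frame from the question (V2 stored as a factor).
# Column types are assumed from the printed output.
df1 <- data.frame(
  V1 = as.Date(c("2010-03-28", "2010-03-29", "2010-03-30", "2010-03-31")),
  V2 = factor(c("0", "0", "8,5", "3")),
  V3 = "A",
  X2 = c(21.53675, 19.21611, 21.54541, NA),
  X3 = c(0, 0, 0, NA),
  row.names = 4470:4473
)

# Factor -> character, swap the decimal comma for a point, then -> numeric.
df1$V2 <- as.numeric(sub(",", ".", as.character(df1$V2), fixed = TRUE))

df1$V2
#[1] 0.0 0.0 8.5 3.0
```

Because the comma is replaced before `as.numeric` is called, no value is coerced to NA and no warning is raised.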

More generally, to convert a factor to its underlying values rather than to its internal integer codes:

fac <- as.factor( fac )
as.numeric(fac)
#[1] 1 1 2 1 1 3
as.numeric(as.character(fac))
#[1] 0.0 0.0 1.5 0.0 0.0 8.0

However, this is the canonical way of transforming a factor back to its original values:

as.numeric(levels(fac))[fac]

From the help page ?as.factor:

In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
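
As a small illustration of that help-page advice, the sketch below compares the two conversions on a toy factor. The names `f`, `via_levels` and `via_character` are made up for this example.

```r
# A toy factor whose labels are already valid numbers.
f <- factor(c("0", "0", "1.5", "0", "0", "8"))

# The internal integer codes, which are NOT the original values.
as.integer(f)
#[1] 1 1 2 1 1 3

# Convert each distinct level once, then index by the factor's codes.
via_levels <- as.numeric(levels(f))[f]

# Convert every element to character first, then to numeric.
via_character <- as.numeric(as.character(f))

identical(via_levels, via_character)
#[1] TRUE
```

Both routes give 0, 0, 1.5, 0, 0, 8. The levels-based form only has to parse the three distinct levels, which is where the small efficiency gain comes from.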

Simon O'Hanlon
  • Hi @SimonO101, would `gsub` be an alternative formulation in case the OP had more than one data point with commas in their data frame? – Tahnoon Pasha May 02 '13 at 10:31
  • @TahnoonPasha try it on `fac <- c( "0" , "0" , "1,5" , "0,6" , "0" , "8" )`. Here `sub` works on each element of the vector, so as long as each number has only one comma it's OK. I am assuming there is only one comma in each number of their data.frame, because if there is more than one, using `gsub` probably won't help: you end up with a number with two decimal points in it, so it gets converted to NA anyway. :-) – Simon O'Hanlon May 02 '13 at 10:41
7

Replace commas with dots, which R uses as the decimal separator. Otherwise R cannot parse the value as a number and coerces it to NA.

Then, to extract values:

as.numeric(levels(df1[,2])[df1[,2]])

(thanks @SimonO101 for the correction)
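
One way to combine the two steps while the column is still a factor is to fix the level labels themselves and then extract the values. This is only a sketch: `v2` is a hypothetical stand-in for `df1[,2]`, and `v2_num` is a name invented here.

```r
# Stand-in for the factor column df1[, 2] from the question.
v2 <- factor(c("0", "0", "8,5", "3"))

# Rewrite the level labels so they use a decimal point.
# Only the distinct levels are touched, not every row.
levels(v2) <- sub(",", ".", levels(v2), fixed = TRUE)

# Extract the numeric values via the levels.
v2_num <- as.numeric(levels(v2))[v2]

v2_num
#[1] 0.0 0.0 8.5 3.0
```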

Maxim.K
0

Add the following line of code after you have converted the column to character:

df1[3,2] <- 8.5

You should then be able to convert the character values to numeric. Since R's default decimal separator is . and not ,, the value is replaced by NA without that step.
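
The separator behaviour can be checked directly, as in the sketch below. If the data come from a text file, it may also be possible to avoid the manual fix by telling `read.table` which decimal mark to expect through its `dec` argument. The question does not say how the data were loaded, so that part is an assumption; `raw` and `df_in` are hypothetical names.

```r
# A comma is not recognised as a decimal mark by as.numeric().
suppressWarnings(as.numeric("8,5"))
#[1] NA
as.numeric("8.5")
#[1] 8.5

# Assumed scenario: the values arrive as text with decimal commas.
raw <- "V1 V2\n2010-03-30 8,5\n2010-03-31 3"

# dec = "," tells read.table() to treat the comma as the decimal mark.
df_in <- read.table(text = raw, header = TRUE, dec = ",")

df_in$V2
#[1] 8.5 3.0
```

Unlike overwriting a single cell, this handles every comma-formatted value in the input at once.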

fdetsch