
I am new to R and still learning the basics.

In one situation, I got some population data from a website in xls format. When I read it (using read.xls from the gdata package), the data came into R as a data frame. However, everything is character, which is fine so far.

After cleaning out some unnecessary rows and columns, I am trying to convert the numbers (stored as characters) into numeric values, and I am seeing strange behaviour.

My data elements look like this (a sample):

> class(males1)
[1] "factor"

> males1[1]
[1] 6,665,561

males1 is supposed to contain n rows with one element each: the number of males per state. When I apply as.numeric to the values, it actually gives me back what looks like the sum of the digits:

> as.numeric(males1[1])
[1] 35

When I convert males1 into a vector, I get a different problem:

> vv=as.vector(males1)
> vv[1]
[1] "6,665,561"
> as.numeric(vv[1])
[1] NA
Warning message:
NAs introduced by coercion 

I am sure I am missing something really basic.

Help, please.

Raghav
  • What number does `","` represent? Answer: it doesn't represent a number, so you need to remove it before you can convert the factor/character to a number: `as.numeric(gsub(",","",levels(males1)))[males1]` – Joshua Ulrich Feb 07 '13 at 19:48
  • I think it's just your bad luck that `35 == 6+6+6+5+5+6+1`, making you think you're getting a sum of digits. Are you seeing a sum of digits for other cases as well? – Ed Staub Feb 07 '13 at 19:58

1 Answer

I assume you are reading in a file that uses commas either as decimal marks or to separate groups of digits in big numbers?

Because of the commas, the value is not read as a number:

> males1[1]
[1] 6,665,561 # is this meant to be 6665561 ?

It is a factor. When you call as.numeric on a factor you do get a number, but it is just the integer code of that element's level (its position in the order of the levels), not the value shown in the label.

   x <- c("a","b","c")
   x <- as.factor(x)
   as.numeric(x)
   #[1] 1 2 3
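
The same thing happens when the labels look like numbers. A minimal sketch, assuming some made-up values (these are not your actual data) and a hypothetical name `f`:

```r
# Hypothetical sample values, not the asker's real data
f <- factor(c("6,665,561", "1,234", "98,765"))

# Levels are the distinct labels, sorted as strings
levels(f)

# as.integer()/as.numeric() return each element's level code,
# i.e. its position in levels(f), not the number in the label
codes <- as.integer(f)
codes

# as.character() recovers the original labels (commas still present)
labels <- as.character(f)
labels
```

Here the first element gets code 2 only because "6,665,561" sorts second among the labels, which is why the result has nothing to do with the digits themselves.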

Do you perhaps want to remove the commas with gsub (see ?gsub)? If the commas were decimal marks instead, you could use dec="," in your read.csv.
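
If the commas are thousands separators, something along these lines should work. This is only a sketch: the `males1` below is rebuilt from made-up values, and `males_num` / `males_num2` are names I picked for illustration.

```r
# Hypothetical stand-in for the asker's factor column
males1 <- factor(c("6,665,561", "1,234", "98,765"))

# Convert the factor to its labels, drop the commas, then parse as numbers
males_num <- as.numeric(gsub(",", "", as.character(males1), fixed = TRUE))
males_num

# Variant from Joshua Ulrich's comment: clean each level once,
# then index the cleaned levels by the factor's integer codes
males_num2 <- as.numeric(gsub(",", "", levels(males1)))[males1]
males_num2
```

Both give the same numeric vector; the second form only processes each distinct level once, which can matter for long columns with many repeated values.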

user1317221_G
  • I am reading an xls file. The commas are the thousands (million) separators. This example value is actually 6665561 as a numeric. What's the right way to convert it to a proper numeric value? – Raghav Feb 07 '13 at 19:46
  • Cue up my favorite http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f `as.numeric(as.character(some_factor))` It's one of the nastier stumbling blocks in `R` . – Carl Witthoft Feb 07 '13 at 19:47
  • @CarlWitthoft: that's not the only problem. You also have to remove the commas. – Joshua Ulrich Feb 07 '13 at 19:49
  • @JoshuaUlrich Sorry, you're absolutely correct. You gotta do something like `scan(stuff, dec=',')` – Carl Witthoft Feb 07 '13 at 20:36