I'm completely new to R (so this might seem a little basic). I've extracted some data from the World Health Organization but am struggling to convert a row of data in which some of the data points are classed as factors and some as numerics.

First, I isolated a row of data that represents private health expenses by year (between 2003 and 2014):

 > private_exp
   2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
 32 41.3 41.3 38.6 37.8 36.9   33 33.4 33.6 30.4 28.2 28.2 25.3

When I looked at the structure of the data, I noticed that some of the data points are listed as numerics while others are listed as factors (which I found odd).

> str(private_exp)
'data.frame':   1 obs. of  12 variables:
 $ 2003: Factor w/ 23 levels "","0","0.2","14.1",..: 15
 $ 2004: Factor w/ 20 levels "","0.2","107",..: 15
 $ 2005: Factor w/ 21 levels "",">90","0.2",..: 15
 $ 2006: num 37.8
 $ 2007: Factor w/ 17 levels "","0.9","15",..: 9
 $ 2008: num 33
 $ 2009: num 33.4
 $ 2010: num 33.6
 $ 2011: Factor w/ 20 levels "","0.7","13.4",..: 12
 $ 2012: num 28.2
 $ 2013: num 28.2
 $ 2014: num 25.3

I don't really understand how one data point, say for 2003, can be a factor with 23 levels when really it's just a number. Anyway, I tried changing it to a numeric and didn't really understand the output.

> as.numeric(private_exp$`2003`)
[1] 15

And it still seems to be a factor:

> private_exp$`2003`
[1] 41.3
23 Levels:  0 0.2 14.1 16.9 2 21.6 2617 2864 3.89 32.3 ... No data
> class(private_exp$`2003`)
[1] "factor"

This is my first attempt at doing anything with R, so I'm clearly missing something. Any help would be greatly appreciated.

Greg Martin
  • Just guessing, but try to read your data with `stringsAsFactors = F` – eivicent Sep 03 '15 at 08:39
  • You probably have some non-numeric values in your variables, so R reads them as character and, since you didn't specify otherwise, imports them as factors. You can use `as.numeric(as.character(myvariable))` to convert `myvariable` to numeric; the non-numeric values will be converted to NA. Or you can first find the non-numeric values in your input file, correct them, and import the file again, in which case the variables should be read directly as numeric – Cath Sep 03 '15 at 08:50
  • Or you can use `hablar::retype` and it will convert all your factor columns to numeric or character, depending on their values. Saves a lot of time. – davsjob Nov 04 '18 at 11:10

1 Answer

It looks like you have been tripped up by R's copy semantics.

When you do:

as.numeric(private_exp$`2003`)

you are taking a copy of the 2003 column and turning the copy into a numeric vector.

If you want to change the original data.frame, you must assign it back:

private_exp$`2003` <- as.numeric(private_exp$`2003`)

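Note that as.numeric on a factor vector will give the internal integer codes (each value's position among the levels), not the textual values. That is why your call returned 15: 41.3 is the 15th level of that factor. To get the textual values as a numeric vector, you must extract them using as.character and then convert to a numeric: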

private_exp$`2003` <- as.numeric(as.character(private_exp$`2003`))
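
As a minimal sketch of the difference, assuming a made-up one-row data frame that mimics the `2003` column from the question (the levels below are illustrative, not the actual WHO data):

```r
# Hypothetical stand-in for the question's data: a one-row data frame whose
# `2003` column is a factor carrying leftover levels from a larger table.
private_exp <- data.frame(
  `2003` = factor("41.3", levels = c("", "0", "0.2", "14.1", "41.3", "No data")),
  check.names = FALSE  # keep the column name as "2003" instead of "X2003"
)

# as.numeric() on a factor returns the internal integer code,
# i.e. the position of the value among the levels.
codes <- as.numeric(private_exp$`2003`)

# Going through as.character() first recovers the printed value.
values <- as.numeric(as.character(private_exp$`2003`))

print(codes)   # position of "41.3" in the levels
print(values)  # the actual number

# Neither call above touched the data frame; assign the result back to change it.
private_exp$`2003` <- values
class(private_exp$`2003`)
```

Here `codes` is 5 because "41.3" is the fifth level in this toy factor, just as it was the 15th level in the original data.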

You can find out more about factors in the documentation.
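
Since several columns in the question are factors, the same conversion could be applied to all of them in one go. This is only a sketch on made-up data (the column values and levels are assumptions chosen to resemble the `str()` output):

```r
# Hypothetical data mixing factor and numeric columns, as in the question
private_exp <- data.frame(
  `2003` = factor("41.3", levels = c("", "0.2", "41.3", "No data")),
  `2006` = 37.8,
  `2007` = factor("36.9", levels = c("", "0.9", "36.9")),
  check.names = FALSE
)

# Flag the columns that are factors
is_fac <- vapply(private_exp, is.factor, logical(1))

# Convert only those columns, going through as.character() so that
# the printed values (not the integer codes) are turned into numbers
private_exp[is_fac] <- lapply(
  private_exp[is_fac],
  function(x) as.numeric(as.character(x))
)

str(private_exp)
```

Any entry that is not a number (such as "No data" or ">90") would become NA with a coercion warning, so it is worth checking for new NAs afterwards.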

sdgfsdh