
I have a data frame whose columns contain varying numbers of values and varying numbers of NAs. The data frame looks like this:

    V1 V2 V3 V4 V5 V6
1    0 11  4  0  0 10
2    0 17  3  0  2  2
3   NA  0  4  0  1  9
4   NA 12 NA  1  1  0
<snip>
743 NA NA NA NA  8 NA
744 NA NA NA NA  0 NA

I want to make a boxplot out of this, but when I do

boxplot(dataframe)

I get the error

adding class "factor" to an invalid object

When I do

lapply(dataframe,class)

I get the following output:

$V1
[1] "factor"
$V2
[1] "factor"
<snip>
$V6
[1] "factor"

So how can I change my dataframe so that the columns are seen as numeric?

tonytonov
Niek de Klein

3 Answers


You want to apply as.numeric(as.character(...)) to each factor column. The code below shows how to do this for the factor columns only, leaving the numeric columns untouched.

## dummy data
df <- data.frame(V1 = factor(sample(1:5, 10, replace = TRUE)),
                 V2 = factor(sample(99:101, 10, replace = TRUE)),
                 V3 = factor(sample(1:2, 10, replace = TRUE)),
                 V4 = 1:10)

df2 <- data.frame(sapply(df, function(x) { if(is.factor(x)) {
                                              as.numeric(as.character(x))
                                           } else {
                                              x
                                           }
                                         }))

This gives:

> df2
   V1  V2 V3 V4
1   4 101  2  1
2   1 100  1  2
3   5  99  2  3
4   4  99  2  4
5   2 100  1  5
6   2 100  2  6
7   2 101  2  7
8   4 100  1  8
9   2 101  2  9
10  4 101  1 10
> str(df2)
'data.frame':   10 obs. of  4 variables:
 $ V1: num  4 1 5 4 2 2 2 4 2 4
 $ V2: num  101 100 99 99 100 100 101 100 101 101
 $ V3: num  2 1 2 2 1 2 2 1 2 1
 $ V4: num  1 2 3 4 5 6 7 8 9 10
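Because sapply() simplifies its result to a matrix, and a matrix holds a single type, the approach above works cleanly only while every column ends up numeric; a character column in the data would turn every column into character. A minimal sketch of an alternative that assigns the converted columns back with lapply(), so each column keeps its own type. The data frame and its values here are hypothetical:

```r
## Hypothetical data frame mixing factor, integer and character columns
df <- data.frame(V1 = factor(c(4, 1, 5, NA)),
                 V2 = factor(c(101, 100, 99, 99)),
                 V3 = 1:4,
                 V4 = c("a", "b", "c", "d"),
                 stringsAsFactors = FALSE)

## Find the factor columns (vapply guarantees a logical vector)
is_fac <- vapply(df, is.factor, logical(1))

## Convert only those columns; the other columns keep their own types
df2 <- df
df2[is_fac] <- lapply(df[is_fac], function(x) as.numeric(as.character(x)))

str(df2)
```

Assigning into `df2[is_fac]` keeps the result a data frame with the original column names, so no matrix round trip is involved.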
Gavin Simpson

How about

as.data.frame(lapply(dat1,function(x){as.numeric(as.character(x))}))

which converts each column to numeric after first converting it to character. The as.character() step matters: converting a factor directly with as.numeric() returns its underlying integer codes, not the values you see displayed.
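To see the difference, here is a small sketch on a hypothetical factor whose labels are numbers. The last line is the idiom given in R FAQ 7.10, which converts each level once instead of converting every element:

```r
## A factor whose labels are numbers (hypothetical example values)
f <- factor(c(11, 17, 0, 12))

levels(f)                     # "0" "11" "12" "17"
as.numeric(f)                 # 2 4 1 3    <- integer codes, not the values
as.numeric(as.character(f))   # 11 17 0 12 <- the displayed values
as.numeric(levels(f))[f]      # 11 17 0 12 <- same values; indexing by f uses its codes
```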

joran

With a test data.frame:

testframe <- data.frame(V1 = as.factor(c(0,0,NA,NA)), V2 = as.factor(c(11,17,0,12)))

> sapply(testframe, class)
      V1       V2 
"factor" "factor" 

You could use

testframe.n <- as.data.frame(sapply(testframe, as.numeric))

> sapply(testframe.n, class)
       V1        V2 
"numeric" "numeric" 

All columns are now numeric, so boxplot can be called.

user625626
Just be careful with applying only `as.numeric()` to a factor - that gives you the internal numeric ID representation not the information stored in the level labels. You can construct examples where your code will fail, see e.g. the example at the bottom of this Answer: http://stackoverflow.com/a/9481292/429846 – Gavin Simpson Feb 28 '12 at 18:20
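A sketch applying both conversions to the testframe from the answer above makes the comment's point concrete: as.numeric() alone returns the level codes, while going through as.character() first recovers the displayed values.

```r
testframe <- data.frame(V1 = as.factor(c(0, 0, NA, NA)),
                        V2 = as.factor(c(11, 17, 0, 12)))

## as.numeric() alone: returns the internal level codes
codes  <- as.data.frame(sapply(testframe, as.numeric))

## as.character() first: returns the values stored in the level labels
values <- as.data.frame(lapply(testframe, function(x) as.numeric(as.character(x))))

codes$V1    # 1 1 NA NA
values$V1   # 0 0 NA NA
codes$V2    # 2 4 1 3
values$V2   # 11 17 0 12
```

Both results pass the class check shown in that answer, which is why the wrong values are easy to miss.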