Problem

Working with a data frame in R, I want to change variables represented as characters into variables represented as numbers (i.e. from class chr to num).

For an entire data set, this is a straightforward problem (different flavors of solutions here, here, here, and here). However, I have one variable that needs to stay as characters.

Example Data

Using this example data (df), let's say I want to change only var1 from class chr to num, leaving "chrOK" as a chr variable. In my real data set, there are many variables to change, so manual approaches like df$var1 = as.numeric(df$var1) are too laborious.

df = data.frame(var1  = c("1","2","3","4"), 
                var2  = c(1,2,3,4),
                chrOK = c("rick", "summer","beth", "morty"),
                stringsAsFactors = FALSE)

str(df)

'data.frame':   4 obs. of  3 variables:
$ var1 : chr  "1" "2" "3" "4"
$ var2 : num  1 2 3 4
$ chrOK: chr  "rick" "summer" "beth" "morty"

Partial Solutions

I've tried several approaches that seem close, but don't do exactly what I want.

Attempt 1 — introduces NAs

Most of my columns are characters that should be numeric, like "var1". So, using apply() to convert class works for them. However, this approach induces NA values in "chrOK".

df = as.data.frame(apply(df, 2, function(x) as.numeric(x))) 

Warning message:
In FUN(newX[, i], ...) : NAs introduced by coercion

str(df)
'data.frame':   4 obs. of  3 variables:
$ var1 : num  1 2 3 4
$ var2 : num  1 2 3 4
$ chrOK: num  NA NA NA NA

Attempt 2 — split, convert, cbind

Using apply() on the subset of chr variables, excluding "chrOK", doesn't induce NAs, but requires using cbind() to re-include "chrOK".

This solution is not ideal because cbind() results are hard to check for data mutations. (Also, "chrOK" is returned as a factor. Using df = cbind(changed,as.character(unchanged)) doesn't work. [a])

changed = as.data.frame(apply(df[-(which(colnames(df)=="chrOK"))],2,function(x) as.numeric(x)))
unchanged = (df$chrOK)

df = cbind(changed,unchanged)

str(df)
'data.frame':   4 obs. of  3 variables:
$ var1     : num  1 2 3 4
$ var2     : num  1 2 3 4
$ unchanged: Factor w/ 4 levels "beth","morty",..: 3 4 1 2 #[a]

Attempt 3 — correct subset, but error when converting

Using setdiff(), I get the subset of chr class variables excluding "chrOK".

df[setdiff(names(df[sapply(df,is.character)]),"chrOK")]
  var1
1    1
2    2
3    3
4    4

But trying to plug this into an apply function, so that only the subset is changed from chr to num, returns an error (see [b]).

 apply(as.numeric(df[setdiff(names(df[sapply(df,is.character)]),"chrOK")]),
       2,function(x) as.numeric(x))

Error in apply(as.numeric(df[setdiff(names(df[sapply(df, is.character)]),  :
(list) object cannot be coerced to type 'double' #[b]

Questions

  • What is the best solution for converting a data frame's character variables to numeric, while excluding a specified subset?
  • Which of my attempts is the right path or is there a better approach?
  • [bonus] What mechanism causes the unexpected results at [a] and [b], above?
Danielle

1 Answer

We can use type.convert from base R: loop over the columns of the dataset with lapply and assign the result back to the original object.

df[] <- lapply(df, function(x) type.convert(as.character(x), as.is = TRUE))
str(df)
#'data.frame':   4 obs. of  3 variables:
#$ var1 : int  1 2 3 4
#$ var2 : int  1 2 3 4
#$ chrOK: chr  "rick" "summer" "beth" "morty"

type.convert calls C code internally, i.e. C_typeconvert.
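
As a rough illustration of how type.convert() picks a class, here is a minimal sketch on a few made-up vectors (the inputs are assumptions for illustration and are not from the question's data). With as.is = TRUE, a vector is converted only if every element parses as a number; otherwise it is left as character.

```r
# Illustrative inputs (hypothetical, not from the question).
ints  <- type.convert(c("1", "2", "3"), as.is = TRUE)
dbls  <- type.convert(c("1.5", "2"), as.is = TRUE)
words <- type.convert(c("rick", "summer"), as.is = TRUE)
mixed <- type.convert(c("1", "two", "3"), as.is = TRUE)

class(ints)   # "integer"
class(dbls)   # "numeric"
class(words)  # "character"
class(mixed)  # "character": one non-numeric string keeps the vector as chr
```

One consequence worth noting: a column that should stay character but happens to contain only digit strings (an ID column, say) would also be converted.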


The OP's attempts produce NAs for the following reasons:

1) apply converts the data.frame to a matrix, and a matrix can hold only a single class. If even one element is a character, the whole matrix becomes character.

2) Using as.numeric with apply is problematic because 'chrOK' is a character column. Whenever as.numeric is applied to non-numeric strings, it converts them to NA.

3) The OP used the same apply call in the second method, so the explanation in 1) applies there as well.
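
The points above can be sketched on the question's example data. The second half is one possible column-wise alternative, assuming you prefer to name the column to keep explicitly; the variable names keep and cols are made up for this sketch and are not part of the original answer.

```r
# Rebuild the question's example data.
df <- data.frame(var1  = c("1", "2", "3", "4"),
                 var2  = c(1, 2, 3, 4),
                 chrOK = c("rick", "summer", "beth", "morty"),
                 stringsAsFactors = FALSE)

# apply() works on as.matrix(df); with a chr column present,
# the whole matrix is stored as character.
m <- as.matrix(df)
typeof(m)   # "character"

# as.numeric() parses digit strings but gives NA (with a warning) for words.
suppressWarnings(as.numeric(c("1", "rick")))

# Column-wise alternative: convert only the chr columns other than "chrOK".
keep <- "chrOK"   # hypothetical name: column(s) to leave untouched
cols <- setdiff(names(df)[sapply(df, is.character)], keep)
df[cols] <- lapply(df[cols], as.numeric)
str(df)
```

Here lapply() hands as.numeric one column vector at a time. A data frame is itself a list, which is why wrapping the whole subset in as.numeric(), as at [b], raises the "cannot be coerced" error.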

akrun
  • This works! How does this "know" to change var1 but not chrOK? – Danielle May 24 '17 at 06:47
  • @Danielle `type.convert` calls internal C code, which checks whether the elements are numbers or characters before converting them to the respective classes. – akrun May 24 '17 at 06:53
  • You explain that `as.numeric` will convert characters to NA, so why does using it with `apply` only change chrOK into NAs but not var1? – Danielle May 24 '17 at 07:12
  • @Danielle Because all the other elements are numbers, i.e. 0:9, while the elements in 'chrOK' are 'a-z'. – akrun May 24 '17 at 10:11