
There is a nice discussion of how to convert character columns to numeric in this SO post. Maybe I missed something there, but what would one do if one does not know which columns (if any) are "convertible"? Is it possible to check for convertibility? In addition, I usually suppress factor conversion (I prefer characters), so characters should stay characters (not factors).

df <- data.frame(a=as.character(c(NA, 1/3)), b=letters[1:2], c=c('1|2', '4|2'), d=as.character(3:4), stringsAsFactors = FALSE)

Then apply ... some function f ... to get:

str(f(df))
'data.frame':   2 obs. of  4 variables:
 $ a: num  NA 0.333
 $ b: chr  "a" "b"
 $ c: chr  "1|2" "4|2"
 $ d: int  3 4

How can this be achieved for an arbitrary data.frame whose contents are not known beforehand?

user3375672

1 Answer


You could do something like this (not very elegant though).

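# Converts column i of the global `df` if at least one value parses as a number
fun1 <- function(i) {
  # as.numeric() yields NA (with a warning) for strings that are not numbers
  num <- suppressWarnings(as.numeric(df[, i]))
  if (!all(is.na(num))){
    num
  } else {
    df[, i]
  }
}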

df1 <- "names<-"(cbind.data.frame(lapply(seq_along(df), fun1),
                                  stringsAsFactors=FALSE), names(df))

> str(df1)
'data.frame':   2 obs. of  4 variables:
 $ a: num  NA 0.333
 $ b: chr  "a" "b"
 $ c: chr  "1|2" "4|2"
 $ d: num  3 4

Or more generally:

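convertiblesToNumeric <- function(x){
  x2 <- cbind.data.frame(lapply(seq_along(x), function(i) {
    # as.numeric() yields NA (with a warning) for strings that are not numbers
    num <- suppressWarnings(as.numeric(x[, i]))
    # keep the numeric version only if at least one value converted
    if (!all(is.na(num))) num else x[, i]
  }), stringsAsFactors=FALSE)
  names(x2) <- names(x)
  return(x2)
}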

df1 <- convertiblesToNumeric(df)
> str(df1)
'data.frame':   2 obs. of  4 variables:
 $ a: num  NA 0.333
 $ b: chr  "a" "b"
 $ c: chr  "1|2" "4|2"
 $ d: num  3 4
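
Note that `d` comes out as `num` here, while the output asked for in the question shows `int`. As an alternative sketch (not part of the approach above), base R's `type.convert()` can be applied column by column; with `as.is = TRUE` it leaves non-convertible columns as character instead of turning them into factors. The name `df2` is only used for illustration.

```r
df <- data.frame(a = as.character(c(NA, 1/3)), b = letters[1:2],
                 c = c("1|2", "4|2"), d = as.character(3:4),
                 stringsAsFactors = FALSE)

# type.convert() picks logical, integer, double or complex for a column when
# all of its values parse; as.is = TRUE keeps everything else as character
df2 <- df
df2[] <- lapply(df2, type.convert, as.is = TRUE)
str(df2)
```

With this, `a` should become double, `d` integer, and `b` and `c` stay character. One caveat: `type.convert()` also turns columns made up only of strings such as "T", "F", "TRUE" or "FALSE" into logical, which may or may not be wanted.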
jay.sf
  • But it works. What is happening with the "names<-" statement? – user3375672 Jun 15 '18 at 11:34
  • You could also do `df1 <- cbind.data.frame(.); names(df1) <- names(df)` in two lines of code. – jay.sf Jun 15 '18 at 11:37
  • Try `?"names<-"` and see [here](http://adv-r.had.co.nz/Functions.html#special-calls) for some explanation. – jay.sf Jun 15 '18 at 11:45
  • Perhaps your function, for completeness, should take a second parameter for `df`, in case one's data.frame is named something other than `df`. As in `fun1 <- function(i, df) {...}`, which is then called with `lapply(seq_along(DF), fun1, df=DF)` – user3375672 Jun 15 '18 at 11:55
  • Cool. One could even put `as.character(x[, i])` inside `as.numeric( ... )` in both places, in case a column is a factor (otherwise the level codes are converted instead of the labels). – user3375672 Jun 15 '18 at 12:18
  • That would not work as expected. Better is to transform all to character with `df[] <- sapply(df, as.character)` before applying the function. Or include a line `x <- cbind.data.frame(sapply(x, as.character), stringsAsFactors=FALSE)` into the function to do that within. – jay.sf Jun 15 '18 at 12:42
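
The points raised in these comments (passing the data.frame as an argument, and factor columns) can be combined in a minimal sketch. It also uses a stricter test than the answer: a column counts as convertible only if no non-missing value turns into `NA`, so a mixed column such as `c("1", "x")` is left alone. The names `is_convertible` and `convert_strict` are hypothetical, chosen for illustration only.

```r
# Hypothetical helper: TRUE only if the column has at least one non-missing
# value and every non-missing value parses as a number
is_convertible <- function(x) {
  x <- as.character(x)                    # factors: use labels, not level codes
  num <- suppressWarnings(as.numeric(x))  # non-numeric strings become NA
  any(!is.na(x)) && !any(is.na(num) & !is.na(x))
}

# Takes the data.frame as an argument, so it does not depend on a global `df`
convert_strict <- function(x) {
  x[] <- lapply(x, function(col) {
    if (is_convertible(col)) as.numeric(as.character(col)) else col
  })
  x
}

df <- data.frame(a = as.character(c(NA, 1/3)), b = letters[1:2],
                 c = c("1|2", "4|2"), d = as.character(3:4),
                 e = c("1", "x"), f = factor(c("10", "20")),
                 stringsAsFactors = FALSE)
out <- convert_strict(df)
str(out)
```

Assigning with `x[] <-` keeps the data.frame and its column names, so no separate `names<-` step is needed. Columns that are not converted are returned unchanged, which means a non-numeric factor stays a factor; converting those to character would be an extra step.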