I have a data.table with some character variables and some numeric/integer variables. I would like to identify the variables of type character and run the tolower function on them to change their case. Here is what I am doing, but the type check does not seem to restrict the operation to character variables:

library(data.table)
set.seed(426)
dt <- data.table(a = runif(5), b = sample(LETTERS, 5))

dt
           a b
1: 0.8472276 Y
2: 0.1567767 J
3: 0.9817384 L
4: 0.2250681 S
5: 0.5994389 H

sapply(dt, class)
        a           b 
"numeric" "character"

dt2 <- as.data.table(sapply(dt, function(n){
    if(class(n) == "character"){
            n <- tolower(n)
    } else{
            n 
    }
}))

dt2
                   a b
1: 0.847227579215541 y
2: 0.156776716466993 j
3: 0.981738423462957 l
4: 0.225068145431578 s
5: 0.599438918055966 h

sapply(dt2, class)
          a           b 
"character" "character" 

I'm new to the apply family, so any insight is appreciated.

daRknight
  • `?sapply` tries to return a matrix, and you can't have more than one type in a matrix, hence everything is converted to character. So try with `lapply` – user20650 Mar 30 '17 at 21:14
  • *I think* you could do `dt[, lapply(.SD, function(x) if(is.character(x)) tolower(x) else x)]`, although there is most likely a better data.table way to do this – user20650 Mar 30 '17 at 21:17
  • @GabrielFGeislerMesevage I am needing to convert several data tables with multiple columns, the objective with my approach is to avoid having to brute force commands directly by variable/column name over the whole dataset – daRknight Mar 30 '17 at 21:26
  • @user20650 your first comment was spot on .. simply replacing `sapply` to `lapply` functions precisely -- if you want to add that as an answer I will mark it as so – daRknight Mar 30 '17 at 21:27
  • @user20650 `cols = names(dt)[sapply(dt, is.character)]; dt[, (cols) := lapply(.SD, tolower), .SDcols=cols]` I guess. – Frank Mar 30 '17 at 22:50
  • cheers @Frank; that's pretty similar to how I'd do it on a data frame, so in my comfort zone – user20650 Mar 30 '17 at 22:57
  • An additional option I just stumbled across is that you could also add new names for the columns. Don't know if this might become necessary. http://stackoverflow.com/a/43112154/5795592 – hannes101 Mar 31 '17 at 11:51

1 Answer

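sapply coerces all your variables to character because it tries to simplify its result to a matrix. A matrix can hold only one type, so the numeric column is converted to character along with the rest. Your class check itself works as intended; it is the simplification step that changes the types. To avoid this, use lapply, which returns a list and keeps each column's type.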
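To see the difference side by side, here is a minimal sketch on a small made-up table (the values and the helper name `to_lower_if_char` are illustrative, not from the question):

```r
library(data.table)

# Small illustrative table: one numeric column, one character column
dt <- data.table(a = c(0.5, 1.5), b = c("Y", "J"))

# Hypothetical helper: lowercase character vectors, leave others untouched
to_lower_if_char <- function(x) if (is.character(x)) tolower(x) else x

# sapply simplifies the two equal-length results into a single matrix,
# which forces the numeric column to character
m <- sapply(dt, to_lower_if_char)
is.matrix(m)
typeof(m)

# lapply returns a list with one element per column, so types are preserved
dt2 <- as.data.table(lapply(dt, to_lower_if_char))
sapply(dt2, class)
```

Here `typeof(m)` should report a character matrix, while `dt2` keeps `a` numeric and `b` character.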

For a more data.table way to approach this, courtesy of Frank, you could do:

# Find character columns
cols = names(dt)[sapply(dt, is.character)] # or which(sapply(dt, is.character)) 
# set these columns to lower
dt[, (cols) := lapply(.SD, tolower), .SDcols=cols]

or this

dt[, lapply(.SD, function(x) if(is.character(x)) tolower(x) else x)]
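For the case mentioned in the comments (several data tables with many columns), the `:=` version could be wrapped in a small function. This is only a sketch: `lower_char_cols` and the two example tables are made-up names for illustration, and it assumes the tables should be modified in place rather than copied.

```r
library(data.table)

# Hypothetical helper: lowercase every character column of `x` by reference
lower_char_cols <- function(x) {
  # vapply guarantees a logical vector, even for a zero-column table
  cols <- names(x)[vapply(x, is.character, logical(1))]
  if (length(cols) > 0) {
    x[, (cols) := lapply(.SD, tolower), .SDcols = cols]
  }
  invisible(x)
}

# Illustrative tables, not from the question
dt1 <- data.table(a = c(1, 2), b = c("Y", "J"))
dt2 <- data.table(id = 1:2, name = c("Foo", "BAR"), city = c("NYC", "La"))

# := updates existing columns in place, so no reassignment is needed
for (d in list(dt1, dt2)) lower_char_cols(d)

sapply(dt1, class)
sapply(dt2, class)
```

Note the difference between the two forms in the answer: the `:=` form changes `dt` itself, whereas `dt[, lapply(.SD, ...)]` returns a new data.table and leaves the original untouched.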
user20650