I was trying to convert the character-type features of my data frame to lowercase with tolower and found this post:
tolower
I built a function to do this on several data.frames and then discovered that all my features were being treated as characters!
mytolower <- function(p_vector) {
  # Lowercase character vectors; return anything else unchanged
  if (is.character(p_vector)) return(tolower(iconv(enc2utf8(p_vector), sub = "byte")))
  else return(p_vector)
}
for (df in c("train", "test")) as.data.frame(apply(get(df), 2, function(x) mytolower(x)), stringsAsFactors = FALSE)
Searching further on Stack Overflow, I found this second post, which partially solved the issue by using lapply, but which curiously suggests that apply and sapply work in a similar way:
lapply rather than apply
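For reference, a minimal sketch of the lapply-based approach, assuming the data frame is updated in place with `df[] <- lapply(...)`. The small `demo` data frame is made up purely for illustration:

```r
# Same helper as above: lowercase character vectors, leave others untouched
mytolower <- function(p_vector) {
  if (is.character(p_vector)) {
    tolower(iconv(enc2utf8(p_vector), sub = "byte"))
  } else {
    p_vector
  }
}

# Hypothetical example data (not my real data)
demo <- data.frame(v1 = 1:3, v2 = c("A", "B", "C"), stringsAsFactors = FALSE)

# lapply visits each column as its own vector, so column types are preserved;
# assigning into demo[] keeps the result a data.frame
demo[] <- lapply(demo, mytolower)
str(demo)
```

Here each column reaches `mytolower` with its original type, so only `v2` is lowercased and `v1` stays an integer column.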
So I finally built this example, which illustrates my problem:
train <- data.frame(v1=1:3, v2=c("a","b","c"), v3=11:13, stringsAsFactors = FALSE)
str(train)
apply(train, 2, function(x) is.character(x)) #wrong
lapply(train, function(x) is.character(x)) #right
sapply(train, function(x) is.character(x)) #right
sapply(train, is.character) #right
While apply considers all features to be "character", lapply and sapply correctly distinguish numeric and character features. Why is that? Is there a way to make apply give the right answer? Thanks
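A likely explanation, sketched under the assumption that the cause is the coercion described in the `apply` documentation: `apply` converts a data frame to a matrix with `as.matrix` before doing anything else, and a matrix can hold only one type. With a character column present, every column becomes character. The names `m`, `num_only`, and `col_is_char` below are only illustrative:

```r
train <- data.frame(v1 = 1:3, v2 = c("a", "b", "c"), v3 = 11:13,
                    stringsAsFactors = FALSE)

# What apply sees: a single-typed matrix, here a character matrix
m <- as.matrix(train)
typeof(m)

# With only numeric columns, the matrix stays numeric and apply agrees
num_only <- train[, c("v1", "v3")]
apply(num_only, 2, is.character)

# Column-wise check that keeps each column's own type
col_is_char <- vapply(train, is.character, logical(1))
col_is_char
```

If this is indeed the cause, apply cannot report the original column types for a mixed data frame, because they are already lost by the time the function is called. Column-wise tools such as lapply, sapply, or vapply seem to be the appropriate choice here.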