The platform from which I'm importing data to R does not support specifying the data type, hence all my columns are character
. I have an Excel file that specifies which columns are factor
, including the relevant labels
and levels
. Now, I'm trying to write a function to dynamically change the data type of various columns of my data.frame
Thanks to the excellent answer to this question (dplyr - mutate: use dynamic variable names), I managed to write the following function, in which I dynamically set the column name to the mutate
function.
readFactorData <- function(filepath) {
t <- read.xlsx(filepath)
sapply(nrow(t), function(i) {
colname <- as.character(t[i, "Item"])
factorLevels <- t[i, 3:ncol(t)][which(!is.na(t[i, 3:ncol(t)]))]
totalLevels <- length(factorLevels)
listOfLabels <- as.character(unlist(factorLevels))
mutate(d, !!colname := factor(d[[colname]], labels=(1:totalLevels), levels=listOfLabels))
# requires dplyr v.0.7+
# the syntax `!!variablename:=` forces evaluation of the variablename before evaluating the rest of the function
})
}
It works, and each iteration returns the entire data frame, with the relevant column (colname
) changed to factor. But, each iteration overwrites the previous, so this function only returns the last result of i
. How do I make sure that I end up with 1 single data frame, in which all the relevant columns are saved?
Sample data (make sure to comment out the first line of the function above, since we're defining t
here):
d <- data.frame("id" = sample(100:999, 10), "age" = sample(18:80, 10), "factor1" = c(rep("a", 3), rep("b", 3), rep("c", 4)), "factor2" = c("x","y","y","y","y","x","x","x","x","y"), stringsAsFactors = FALSE)
t <- data.frame("Item" = c("factor1","factor2"), "Label" = c("This is factor 1", "This is factor 2"), "level1" = c("a","x"), "level2" = c("b","y"), "level3" = c("c","NA"))