I have imported several datasets that contain the same variables for different years. I am trying to convert some of the columns from factor to numeric. To save time I wrote a function, but it does not seem to work.
I have created a list with the names of the datasets as strings:
dfs <- list("df1", "df2", "df3", "df4", "df5", "df6", "df7", "df8")
And a second list with the names of the variables (columns), also as strings:
vars <- list("var1", "var2", "var3", "var4")
First I tried joining the names from both lists with a "$" in the middle and then applying this function, which converts factors to numeric:
to_int <- function(column) {
  if (is.factor(column)) {
    # Map each element to its level label, then convert the labels to numbers
    column <- levels(column)[column]
    column <- as.numeric(column)
    return(column)
  } else {
    return(column)
  }
}
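As a quick check of what `to_int` does on its own, here is a minimal sketch on a made-up factor (the values are hypothetical, not taken from my data). It shows why the conversion goes through `levels()`: calling `as.numeric()` directly on a factor returns the internal integer codes rather than the numbers the labels represent.

```r
# Hypothetical example factor; the values are made up for illustration
f <- factor(c("10", "20", "10", "35"))

# Same logic as to_int above, repeated so this snippet runs on its own
to_int <- function(column) {
  if (is.factor(column)) {
    as.numeric(levels(column)[column])
  } else {
    column
  }
}

codes  <- as.numeric(f)  # internal integer codes: 1 2 1 3
values <- to_int(f)      # numbers behind the labels: 10 20 10 35

print(codes)
print(values)
```

So on a single factor vector the function behaves as intended; the problem is only in getting the result back into the data frames.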
Option 1: create a vector with strings joined by $
col_names <- vector(mode = "list", length = length(dfs))
# Add the combination of names to each vector
for (df in dfs) {
  for (var in vars) {
    r <- paste(df, var, sep = "$") # Combine the names in the 2 lists with a $ in the middle
    col_names[[match(df, dfs)]][match(var, vars)] <- r # Assign result to the pre-set vector
  }
}
# Iterate through the list (col_names) and apply "to_int" to each of the strings in it
for (l in col_names) {
  for (col_name in l) {
    colnm <- eval(parse(text = col_name))
    nmrc <- to_int(colnm) # Converts each column from factor to numeric. Works!
    # Creates new objects (listed under Values in RStudio) with the correct name,
    # e.g. "df1$var1", but the columns in the data frames remain unchanged
    assign(col_name, nmrc, envir = globalenv())
  }
}
Then I tried treating the strings in both lists separately and combining them inside the loop:
Option 2: Treat the lists as separate strings and join in loop
for (df in dfs) {
  for (var in vars) {
    a <- eval(parse(text = df))
    b <- to_int(a[var]) # Using $ returns NULL; using [] leaves the original df unchanged (still a factor)
    a[var] <- b
  }
}
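To illustrate what I think is going on in Option 2, here is a minimal sketch on a made-up one-column data frame (the data and the single name `df1` are placeholders, not my real datasets). It assumes two things: that `a` is only a copy of the data frame, and that `a[var]` returns a one-column data frame rather than the column itself, so `is.factor()` is `FALSE` for it. The sketch uses `[[` to get the column, and `assign()` to write the modified copy back under the original name.

```r
# Hypothetical toy data frame standing in for one of the real datasets
df1 <- data.frame(var1 = factor(c("3", "7", "3")))

single_bracket_is_factor <- is.factor(df1["var1"])    # FALSE: a one-column data frame
double_bracket_is_factor <- is.factor(df1[["var1"]])  # TRUE: the column itself

a <- get("df1")  # fetch the data frame by its name; a is a copy
a[["var1"]] <- as.numeric(levels(a[["var1"]])[a[["var1"]]])

copy_is_numeric    <- is.numeric(a$var1)   # the copy was converted
original_is_factor <- is.factor(df1$var1)  # the original is still a factor

# Write the modified copy back under the original name.
# assign() targets the current environment by default.
assign("df1", a)
original_is_numeric <- is.numeric(df1$var1)
```

If this is right, the loop would need a write-back step like the `assign()` call for each data frame name, but I am not sure this is the idiomatic way to do it.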
Finally, I tried creating a new function that takes two variables as inputs:
# with two inputs
to_int2 <- function(df, col) {
  eval(parse(text = df))
  if (is.factor(df[col])) { # $ OPERATOR IS INVALID FOR ATOMIC VECTORS
    df[col] <- levels(df[col])[df[col]]
    df[col] <- as.numeric(df[col])
    return(df[col])
  } else {
    return(df[col])
  }
}
And I used it in a third attempt:
Option 3: transform factor to numeric with two inputs
for (df in dfs) {
  for (var in vars) {
    a <- to_int2(df, var) # $ OPERATOR IS INVALID FOR ATOMIC VECTORS
    b <- eval(parse(text = df))
    b$var <- a # No effect
  }
}
None of these attempts changed the desired columns of the data frames. Any idea how to solve this? Thanks.