
I have imported several datasets that share the same variables across different years. I am trying to convert some of the columns from factor to numeric. To save time I wrote a function, but it does not seem to work.

I have created a list with the names of the datasets as strings

dfs <- list("df1", "df2", "df3", "df4", "df5", "df6", "df7", "df8")

And a second list with the names of the variables (columns) also as strings

vars <- list("var1", "var2", "var3", "var4")

First I tried joining both lists with an "$" in the middle and then passing the function to transform factors to numerics:

to_int <- function(column){
  if (is.factor(column)){
    column <-levels(column)[column]
    column<-as.numeric(column)
    return(column)
  }
  else{
    return(column)
  }
}
# Option 1: create a vector with strings joined by $
col_names <- vector(mode = "list", length = length(dfs))

# Add the combination of names to each vector
for (df in dfs) {
  for (var in vars){
    r <- paste(df, var, sep = "$") # Combine the names in the 2 lists with a $ in the middle
    col_names[[match(df, dfs)]][match(var, vars)] <- r # Assign result to the pre-set vector
  }
}

# Iterate through list (col_names) and apply "to_int" to each of the strings in the list
for (l in col_names){
  for (col_name in l){
    colnm <- eval(parse(text = col_name))
    nmrc <- to_int(colnm) # from factor to numeric each column. Works!
    assign(col_name, nmrc, envir = globalenv()) # Creates values (Rstudio) with the correct name but columns on dfs remain intact
  }
}

Then I tried treating the strings on both lists separately and get them together inside the loop:

# Option 2: treat the lists as separate strings and join in loop
for (df in dfs) {
  for (var in vars){
    a <- eval(parse(text = df))
    b <- to_int(a[var])  # using $ returns null. using [] no change in original df, still factor
    a[var] <- b
  }
}

I finally tried creating a new function that takes two variables as inputs:

# with two inputs
to_int2 <- function(df, col){ 
  eval(parse(text = df))
  if (is.factor(df[col])){ # $ OPERATOR IS INVALID FOR ATOMIC VECTORS
    df[col] <-levels(df[col])[df[col]]
    df[col]<-as.numeric(df[col])
    return(df[col])
  }
  else{
    return(df[col])
  }
}

And used that in a third attempt:

# Option 3: transform factor to numeric with two inputs
for (df in dfs) {
  for (var in vars){
    a <- to_int2(df, var) # $ OPERATOR IS INVALID FOR ATOMIC VECTORS
    b <- eval(parse(text = df))
    b$var <- a # No effect
  }
}

None of them had an effect on the desired columns of the dataframes. Any idea on how to solve this? Thanks

1 Answer


It's generally better to work with multiple similar datasets as a list of frames. The premise is that whatever you do to one, you will do to all, and that is easily automated with lapply.

As an example, try this:

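# mget() needs a character vector, so flatten the lists of names first
LOF <- mget(unlist(dfs))
LOF <- lapply(LOF, function(df) {
  cols <- unlist(vars)
  # to_int() (from the question) converts via the factor labels;
  # as.integer() on a factor would return the internal level codes instead
  df[cols] <- lapply(df[cols], to_int)
  df
})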
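
The difference between a factor's level codes and its labels matters for this conversion: calling as.integer() directly on a factor returns the internal codes, while going through the labels returns the numbers that are displayed. A minimal sketch with made-up data follows; the frames `df1` and `df2`, their values, and the helper name `to_num` are illustrative assumptions, not the asker's data, and the names are stored as character vectors rather than lists because mget() and `[` expect that.

```r
# Toy data standing in for the real yearly datasets (hypothetical values)
df1 <- data.frame(var1 = factor(c("10", "20", "30")),
                  var2 = factor(c("5", "7", "5")))
df2 <- data.frame(var1 = factor(c("100", "300")),
                  var2 = factor(c("1", "2")))

dfs  <- c("df1", "df2")    # character vectors rather than lists
vars <- c("var1", "var2")

# Hypothetical helper: convert via the labels, leave non-factors alone
to_num <- function(x) if (is.factor(x)) as.numeric(as.character(x)) else x

as.integer(df1$var1)   # internal level codes
to_num(df1$var1)       # labels converted to numbers

# Collect the frames into a named list and convert the chosen columns
LOF <- mget(dfs)
LOF <- lapply(LOF, function(df) {
  df[vars] <- lapply(df[vars], to_num)
  df
})
str(LOF$df1)
```

If the separate objects are still needed afterwards, each element of `LOF` can be assigned back by name.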

But if you must keep them separate, then try this:

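for (nm in dfs) {
  dat <- get(nm)
  cols <- unlist(vars)  # `[` needs a character vector, not a list
  dat[cols] <- lapply(dat[cols], to_int)  # to_int() converts via labels, not level codes
  assign(nm, dat)
}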
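
For context on why the loops in the question left the data frames unchanged: assign() treats its first argument as a plain object name, so a string containing `$` creates a new, oddly named object; single brackets return a one-column data frame rather than the factor itself; and changes to a copy are lost unless the copy is assigned back. A small sketch on a made-up frame (the name `df1` and its values are assumptions for illustration):

```r
df1 <- data.frame(var1 = factor(c("10", "20")))

# assign() does not evaluate "df1$var1" as an expression
assign("df1$var1", c(10, 20))
exists("df1$var1")        # a separate object with that literal name now exists
is.factor(df1$var1)       # the column inside df1 is still a factor

# Single brackets keep the data-frame wrapper; double brackets extract the column
is.factor(df1["var1"])    # FALSE: a one-column data frame
is.factor(df1[["var1"]])  # TRUE: the factor itself

# Work on a copy, then assign the copy back under the original name
dat <- get("df1")
dat[["var1"]] <- as.numeric(as.character(dat[["var1"]]))
assign("df1", dat)
is.numeric(df1$var1)
```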
r2evans