I have a function that works on the dataframes in the global environment, and I'm trying to apply it to a list of dataframes instead. This question follows on from my previous question here. The function extracts information from each dataframe's name and creates a new column from that info. Here's some mock data and the function:

pend4P_17k <- data.frame(x = c(1, 2, 3, 4, 5),
                  var1 = c('a', 'b', 'c', 'd', 'e'),
                  var2 = c(1, 1, 0, 0, 1))
pend5P_17k <- data.frame(x = c(1, 2, 3, 4, 5),
                  var1 = c('a', 'b', 'c', 'd', 'e'),
                  var2 = c(1, 1, 0, 0, 1))
pend10P_17k <- data.frame(x = c(1, 2, 3, 4, 5),
                  var1 = c('a', 'b', 'c', 'd', 'e'),
                  var2 = c(1, 1, 0, 0, 1))

list_pend <- list(pend4P_17k=pend4P_17k, pend5P_17k=pend5P_17k, pend10P_17k=pend10P_17k)

add_name_cols <- function(df){
  my_global <- ls(envir = globalenv())
  for(i in my_global)
    if(class(get(i)) == "data.frame" & grepl("pend", i))
    {
      df <- get(i)
      df$Pendant_ID <- gsub("^pend(.{2,3})_.*$", "\\1", i)
      assign(i, df, envir = globalenv())
    }
  return(df)
}


list_pend <- lapply(list_pend, add_name_cols)

It applies the function to the list, but every dataframe ends up with the same Pendant_ID value, when it should match the ID given in the dataframe's name (e.g. the pend4P_17k dataframe should have a Pendant_ID column that is "4P").

Using R version 3.5.1, Mac OS X 10.13.6

millie0725

2 Answers

A few things:

  1. In an if statement, use &&, not &. (Rationale: & is vectorized and can return a logical vector of any length, whereas if requires a condition of length exactly 1; && also short-circuits, which & does not, and that can be nice to have.)

  2. Don't use == when checking an object's class: for many objects, class returns a vector of length 2 or more. It's often better to use inherits (or one of the is.* functions, such as is.data.frame).

  3. lapply doesn't pass the name of an object, just its value. We'll use Map instead.

add_name_cols <- function(df, nm) {
  if (inherits(df, "data.frame") && grepl("pend", nm)) {
    df$Pendant_ID <- gsub("^pend(.{2,3})_.*$", "\\1", nm)
  }
  df
}
Map(add_name_cols, list_pend, names(list_pend))
# $pend4P_17k
#   x var1 var2 Pendant_ID
# 1 1    a    1         4P
# 2 2    b    1         4P
# 3 3    c    0         4P
# 4 4    d    0         4P
# 5 5    e    1         4P
# $pend5P_17k
#   x var1 var2 Pendant_ID
# 1 1    a    1         5P
# 2 2    b    1         5P
# 3 3    c    0         5P
# 4 4    d    0         5P
# 5 5    e    1         5P
# $pend10P_17k
#   x var1 var2 Pendant_ID
# 1 1    a    1        10P
# 2 2    b    1        10P
# 3 3    c    0        10P
# 4 4    d    0        10P
# 5 5    e    1        10P

If you have purrr installed (part of the tidyverse), you can also use:

purrr::imap(list_pend, add_name_cols)
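
To illustrate points 1 and 2 above, here is a small sketch in base R. The object `obj` and its class name `my_tbl` are made up for the demonstration; they stand in for any object whose class vector has more than one element.

```r
# Hypothetical object with a two-element class vector, built by hand
obj <- structure(data.frame(x = 1:3), class = c("my_tbl", "data.frame"))

# == compares against every element of the class vector,
# so the result has one logical per class, not a single TRUE/FALSE
eq_check <- class(obj) == "data.frame"

# inherits() and is.data.frame() always return a single logical
inherits_check <- inherits(obj, "data.frame")
is_check <- is.data.frame(obj)

# && stops at the first FALSE, so the right-hand side is never evaluated
short_circuit <- FALSE && stop("never reached")

# & evaluates both sides, so the error on the right-hand side is raised
no_short_circuit <- tryCatch(
  FALSE & stop("reached"),
  error = function(e) "error raised"
)

print(eq_check)
print(inherits_check)
print(short_circuit)
print(no_short_circuit)
```

A length-2 condition such as `eq_check` is what makes `if (class(x) == "data.frame")` fragile, which is why the rewritten function uses inherits together with &&.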
r2evans

You can modify your function so that it runs on a list as opposed to an environment:

list_pend <- list(pend4P_17k=pend4P_17k, pend5P_17k=pend5P_17k, pend10P_17k=pend10P_17k)

add_name_cols <- function(l){
  for(i in seq_along(l)){
    l[[i]]$Pendant_ID <- gsub("^pend(.{2,3})_.*$", "\\1", names(l)[i])
  }
  return(l)
}

list_pend <- add_name_cols(list_pend)

Output

> add_name_cols(list_pend)
$pend4P_17k
  x var1 var2 Pendant_ID
1 1    a    1         4P
2 2    b    1         4P
3 3    c    0         4P
4 4    d    0         4P
5 5    e    1         4P

$pend5P_17k
  x var1 var2 Pendant_ID
1 1    a    1         5P
2 2    b    1         5P
3 3    c    0         5P
4 4    d    0         5P
5 5    e    1         5P

$pend10P_17k
  x var1 var2 Pendant_ID
1 1    a    1        10P
2 2    b    1        10P
3 3    c    0        10P
4 4    d    0        10P
5 5    e    1        10P
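
Both answers rely on the same regular expression, so it may help to see what it does to the names on their own. This is only a sketch: `other_df` is a hypothetical name added to show that gsub returns non-matching input unchanged, and replacing those cases with NA is an assumption about what you would want, not something from the question.

```r
pattern <- "^pend(.{2,3})_.*$"

# The three names from the question plus one hypothetical non-matching name
nms <- c("pend4P_17k", "pend5P_17k", "pend10P_17k", "other_df")

# gsub keeps only the captured 2-3 characters between "pend" and "_";
# names that do not match the pattern come back unchanged
ids <- gsub(pattern, "\\1", nms)
print(ids)

# Optional guard (an assumption): mark non-matching names as NA
# instead of silently keeping the full name as the ID
ids_checked <- ifelse(grepl(pattern, nms), ids, NA_character_)
print(ids_checked)
```

The loop version above has no grepl check, so a list element whose name does not start with "pend" would get its whole name as Pendant_ID.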
slava-kohut