I have a list of data frames. I want to use lapply on a specific column in each of those data frames, but I keep getting errors when I try methods from similar answers.

The setup is something like this:

a <- list(*a series of data frames that each have a column named DIM*)
dim_loc <- lapply(1:length(a), function(x){paste0("a[[", x, "]]$DIM")})

Eventually, I'll want to write something like results <- lapply(dim_loc, *some function on the DIMs*)

However, when I try get(dim_loc[[1]]), say, I get an error: Error in get(dim_loc[[1]]) : object 'a[[1]]$DIM' not found

But I can return values from function(a[[1]]$DIM) all day long. It's there.

I've tried working around this by using as.name() in the dim_loc assignment, but that doesn't seem to do the trick either.

I'm curious about two things: (1) what's up with get(), and (2) whether there's a better solution. I'm restricting myself to the apply family of functions because I want to get out of the for-loop habit, and this name-as-list method seems to be preferred based on questions like "R - how to dynamically name data frames?", but I'd be interested in other, more elegant solutions too.

derp
  • Do you want to modify the `data.frame`s inside your list using `lapply`? – JdeMello Jan 15 '19 at 03:52
  • You can't do `get("dataframe$columname")` - you want to subset like `lapply(listy, function(x) x$DIM )` I think instead. – thelatemail Jan 15 '19 at 03:54
  • I don't want to modify the data frames. I want to run a function on the same column in each data frame and store the results in a list. – derp Jan 15 '19 at 03:59
  • `lapply` doesn't modify anything - that's the whole point. `lapply(listy, function(x) myfunction(x$DIM) )` will just return a list with the `myfunction` applied to the `DIM` column of each data.frame in your list. – thelatemail Jan 15 '19 at 04:01
  • You can feed `[[` as the function into `lapply` and then pass the name of the column as an additional parameter...i.e. something like this to extract `hp` from a list comprised of two copies of `mtcars`...`l <- list(mtcars, mtcars); lapply(l, "[[", "hp")` – Chase Jan 15 '19 at 04:04
  • thelatemail: obviously a better solution and it worked. thank you. – derp Jan 15 '19 at 04:08
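To make the comments above concrete: `get()` looks up an object by name, so a string such as `"a[[1]]$DIM"` is treated as a single (nonexistent) variable name rather than as an expression to evaluate. A minimal sketch, assuming two made-up data frames that each have a `DIM` column, with `mean` standing in for whatever function is actually of interest:

```r
# Toy stand-in for the asker's list; the values are invented for illustration.
a <- list(
  data.frame(DIM = c(1, 2, 3)),
  data.frame(DIM = c(10, 20, 30))
)

# get() resolves a variable name, so fetching `a` and then subsetting works.
first_dim <- get("a")[[1]]$DIM

# A string holding a whole expression is not a name, so get() fails on it.
get_failed <- inherits(try(get("a[[1]]$DIM"), silent = TRUE), "try-error")

# Apply a function (mean is only a placeholder) to DIM in each data frame.
results <- lapply(a, function(x) mean(x$DIM))

# Alternatively, extract the column with `[[` first, as Chase suggests.
dims <- lapply(a, `[[`, "DIM")
```

Iterating over the list itself, instead of over strings that describe its elements, avoids the need for `get()` altogether.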

1 Answer

I'd say that if you want to modify an object in place you are better off using a for loop, since lapply would require the <<- assignment operator (a plain <- inside the function passed to lapply only changes a local copy). Like so:

set.seed(1)

aList <- list(cars = mtcars, iris = iris)

for(i in seq_along(aList)){
  aList[[i]][["newcol"]] <- runif(nrow(aList[[i]]))
}

As opposed to...

invisible(
lapply(seq_along(aList), function(x){
  aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
})
)

You have to use invisible(), otherwise lapply would print its return value to the console. The <<- assigns the vector runif(...) to the newly created column of the original aList.
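The difference between the two assignment operators inside `lapply` can be checked directly. This is a small sketch reusing the `aList` setup from above; the `has_col_*` variable names are only illustrative:

```r
aList <- list(cars = mtcars, iris = iris)

# Plain `<-` inside the function creates and modifies a local copy of aList,
# so the outer list is left untouched.
invisible(lapply(seq_along(aList), function(x) {
  aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
}))
has_col_local <- "newcol" %in% names(aList[["cars"]])  # FALSE

# `<<-` assigns to the aList found in the enclosing environment instead.
invisible(lapply(seq_along(aList), function(x) {
  aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
}))
has_col_super <- "newcol" %in% names(aList[["cars"]])  # TRUE
```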

If you want to produce another set of data.frames using lapply then you do:

lapply(seq_along(aList), function(x){
  aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))

  return(aList[[x]])
})

Also, may I suggest the use of seq_along(list) in lapply and for loops as opposed to 1:length(list) since it avoids unexpected behavior such as:

# zero-length list
seq_along(list()) # prints integer(0)
1:length(list()) # prints 1 0
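The practical consequence shows up in loops over an empty list. The following sketch (variable names are hypothetical) records which indices each form of loop visits:

```r
empty <- list()

# 1:length(empty) is 1:0, i.e. c(1, 0), so the loop body runs twice
# even though there is nothing to iterate over.
visited_colon <- integer(0)
for (i in 1:length(empty)) visited_colon <- c(visited_colon, i)

# seq_along(empty) is integer(0), so the loop body never runs.
visited_seq <- integer(0)
for (i in seq_along(empty)) visited_seq <- c(visited_seq, i)
```

With real work in the loop body, the first form would typically fail with a subscript error on `empty[[1]]`.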
JdeMello
  • The `<<-` and `invisible` is not required to use `lapply` - `aList <- lapply(aList, function(x) {x$newcol <- runif(nrow(x)); x} )` will do it just fine. – thelatemail Jan 15 '19 at 04:16
  • It is required if you want to modify in place the data.frames. Maybe the example with lists is a poor instance. – JdeMello Jan 15 '19 at 04:18
  • But you can just overwrite the whole `aList` after modifying each part, as the code above does. It gives an identical result to the `for` loop. – thelatemail Jan 15 '19 at 04:31
  • Yes, that's true. Using lists to justify for loops as opposed to lapply is a poor example. It would make more sense if the object was a vector or a data.frame. – JdeMello Jan 15 '19 at 12:08
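The point made in these comments can be checked with a short comparison. This sketch assumes the same `mtcars`/`iris` list as in the answer and resets the seed before each approach so the random columns match; `byLoop` and `byLapply` are illustrative names:

```r
# Approach 1: modify each data frame in place with a for loop.
set.seed(1)
byLoop <- list(cars = mtcars, iris = iris)
for (i in seq_along(byLoop)) {
  byLoop[[i]][["newcol"]] <- runif(nrow(byLoop[[i]]))
}

# Approach 2: return a modified copy from lapply and overwrite the list,
# with no need for `<<-` or invisible().
set.seed(1)
byLapply <- list(cars = mtcars, iris = iris)
byLapply <- lapply(byLapply, function(x) {
  x$newcol <- runif(nrow(x))
  x
})

# Compare the two results.
same <- isTRUE(all.equal(byLoop, byLapply))
```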