New to R, so probably a noob question. Consider the following code, in particular the for loop:

library(lubridate)
#Read in all site files in the directory
sitefiles <- list.files(pattern = "\\.csv$")   #Get a list of all csv's in dir
sites <- list()                                #Create an empty list.
sites <- lapply(sitefiles, read.csv)          
names(sites) <- gsub("\\.csv$", "", sitefiles)  #Rename the list

for (site in names(sites)){
  site$time <-  dmy_hms(site$timestamp)
      #Error: $ operator is invalid for atomic vectors
}

OK, let's try this instead:

for (site in sites){
  site$time <-  mdy_hms(site$timestamp)
}

It appears to do nothing to the data frames in the list sites. In particular the command colnames(sites[[1]]) is the same before and after running the for loop - no column has been added.

But there was a change. RStudio tells me there is a new variable, a data frame called site, which DOES have the column time added. What the heck???

What is going on here? How do I execute this command successfully?

Dirk
  • `site` is not an object yet. Furthermore, `sites` is a list. Hence, you have to `lapply` instead of loop. – loki Aug 17 '17 at 13:06
  • What would that lapply look like? Also, do you have some documentation on this please? I thought that for loops and lapply could be interchanged. – Dirk Aug 17 '17 at 13:21
  • You want `sites[[site]]$time <- dmy_hms(site$timestamp)` instead of `site$time <- dmy_hms(site$timestamp)` in your `for` loop. Check out [this post](https://stackoverflow.com/questions/36777567/is-there-a-logical-way-to-think-about-list-indexing/36815401#36815401) and the links it contains for a longer discussion. Also the help file `?"["` is worth reading 2 or 3 times. – lmo Aug 17 '17 at 13:32
  • @lmo, `sites[[site]]$time <- dmy_hms(site$timestamp)` generates the error: `Error in `*tmp*`[[site]] : invalid subscript type 'list'` – Dirk Aug 17 '17 at 14:03
  • You will have to provide the output of `str(sites)`. If you run the line prior, `names(sites) <- gsub("\\.csv$", "", sitefiles)` you get a named list. The `for` loop loops through these names and `sites[[site]]` will refer to the list element with the given name. That line should actually be `sites[[site]]$time <- dmy_hms(sites[[site]]$timestamp)`. I missed the second reference to `site$`. Perhaps you are trying to run this line outside of the `for` loop? That would cause the error you see if you have an object named site in your global environment that is a list as you mention in post. – lmo Aug 17 '17 at 14:09
  • Holy cow, that worked! If you post it as the answer, I'll accept it. Thanks for the help. Now to delve into the [[ notation and *why* this works. – Dirk Aug 17 '17 at 14:28
  • @Dirk, I posted an answer [on a similar topic](https://stackoverflow.com/a/41139771/3250126). You should check out the link in there. It provides information on *why* lmo's solution works. – loki Aug 17 '17 at 14:44
  • @Dirk, I added an answer with an `lapply` solution. Since your question does not include a reproducible example, please check whether it works. Otherwise, I could refine it. – loki Aug 17 '17 at 14:56
  • @loki thanks. It works! – Dirk Aug 17 '17 at 21:23
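
For reference, the loop fix from lmo's comment can be sketched on made-up data. The site names and timestamps below are hypothetical, and base R's `as.POSIXct` stands in for `dmy_hms` only so that the sketch runs without extra packages; it is not a drop-in replacement in every case.

```r
# Toy stand-in for the list of data frames read from the CSV files
# (names and timestamps are invented for illustration).
sites <- list(
  siteA = data.frame(timestamp = c("17/08/2017 13:06:00", "17/08/2017 14:09:00"),
                     stringsAsFactors = FALSE),
  siteB = data.frame(timestamp = "18/08/2017 09:30:00",
                     stringsAsFactors = FALSE)
)

for (site in names(sites)) {
  # `site` is a character string such as "siteA", so index the list with it
  # and assign into the list element itself, not into `site`.
  sites[[site]]$time <- as.POSIXct(sites[[site]]$timestamp,
                                   format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
}

colnames(sites[["siteA"]])  # "timestamp" "time"
```

The key point is that both sides of the assignment refer to `sites[[site]]`, so the new column lands in the data frame stored inside the list.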

2 Answers

A solution with lapply would look like this:

sites <- lapply(sites, function(x) {
  x$time <- dmy_hms(x$timestamp)  # parse the timestamp strings into date-times
  x                               # return the modified data frame
})

This summary helps you with the subsetting of all the different data types.

The basics to know for this case are:

  • sites is a list holding multiple data.frames
  • lapply takes all these data.frames and applies the same function
  • afterwards a list of these modified data.frames is returned

Little side note: `lapply` keeps the names of the list it is given, so `names(sites)` should still be there afterwards. It is worth checking if you later rely on the names.
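
The points above can be tried on a small made-up list. This is only a sketch: the data frames are invented, and base R's `as.POSIXct` is used in place of `dmy_hms` so that it runs without lubridate installed.

```r
# Invented list of two data frames standing in for the CSV contents.
sites <- list(
  siteA = data.frame(timestamp = "17/08/2017 13:06:00", stringsAsFactors = FALSE),
  siteB = data.frame(timestamp = "18/08/2017 09:30:00", stringsAsFactors = FALSE)
)

# Apply the same function to every data frame and collect the results.
sites <- lapply(sites, function(x) {
  x$time <- as.POSIXct(x$timestamp, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
  x  # the function must return the modified data frame
})

names(sites)              # "siteA" "siteB"
colnames(sites[[1]])      # "timestamp" "time"
```

Forgetting the final `x` is a common slip: the function would then return only the value of the assignment (the parsed times), and `sites` would become a list of date-time vectors instead of data frames.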

loki

I could not find a source for this, but from local testing, the for loop seems to create a copy of each item in the list you are iterating over, so changes made to the loop variable never reach the original list. Perhaps that is why iterating over the names, or better, using lapply, is recommended.

> a <- list(mtcars$cyl)
> b <- list(mtcars$mpg)
> x <- c(a, b)
> tracemem(a)
[1] "<0000000014731C68>"
> tracemem(b)
[1] "<00000000147711D8>"
> for(myList in x) { print(tracemem(myList)) }
[1] "<000000000C37E650>"
[1] "<00000000096AED50>"

The site variable is still there afterwards because the index variable of a for loop is an ordinary variable in the surrounding environment, and it keeps its last value once the loop ends.
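
Both effects can be checked with a small made-up list. This is a sketch with invented column names, not the data from the question.

```r
# Two tiny data frames in a named list (invented for illustration).
sites <- list(a = data.frame(x = 1:2), b = data.frame(x = 3:4))

for (site in sites) {
  site$y <- site$x * 2  # changes the loop variable `site`, not `sites`
}

colnames(sites[["a"]])  # "x"      : the list element is untouched
exists("site")          # TRUE     : the loop variable outlives the loop
colnames(site)          # "x" "y"  : it holds the modified copy of the last element
```

After the loop, `site` is the modified copy of the last element (`b` here), which matches the extra data frame that shows up in the environment pane.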

sebastianmm
  • That actually makes sense. What is the correct way to do this then? – Dirk Aug 17 '17 at 14:05
  • I think `lapply` would be the smartest way. Other than that, iterating over `names(sites)` and accessing the list using the `[[]]` operator, as @lmo suggested above. – sebastianmm Aug 17 '17 at 14:09