Hi all, I am trying to run a for loop that creates a new variable based on some conditions over a list of multiple dataframes in R (split by observation ID), and am having some trouble.

Dat_nations <- split(Dat, Dat$newccode)

^This creates my vector of 143 dataframes, grouped by country code. The for loop that I want to apply to each country dataframe is:

for (i in 1:(length(df1$timeafter) - 1)) {
  df1$timeafter[i+1] <- (df1$newdate[i+1]-df1$newdate[i])
}

Essentially, I am creating a new variable that counts the number of days an observation came after the preceding observation from within a specific country (they are arranged in order by date). But I can't figure out how to run this over all dataframes iteratively, modifying each, and then combining them all back together.

Thanks so much!

Andrew
  • Please trim your code to make it easier to find your problem. Follow these guidelines to create a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). – Community Sep 11 '21 at 02:14

1 Answer

In general, the canonical way to deal with a list of frames is using lapply, though a for loop can certainly work; see https://stackoverflow.com/a/24376207/3358227 for several discussions about "list of frames".

A note: length(df) is the number of columns, not the number of rows. (Further, 1:length(x) can be a mistake if used programmatically: if for some reason the argument x is length 0, then one would want/expect this to return a vector of length 0, but instead it returns 1:0 aka c(1, 0).) To iterate more safely over columns, use seq_along(x); to iterate safely over rows, use seq_len(nrow(x)).
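A small sketch of that pitfall, using a made-up empty vector and an empty data frame (the names x and empty_df are only for illustration):

```r
x <- character(0)                 # hypothetical empty input

bad  <- 1:length(x)               # 1:0, i.e. c(1, 0): a loop over this runs twice
good <- seq_along(x)              # integer(0): a loop over this never runs

# For a data frame, length() counts columns and nrow() counts rows
empty_df <- data.frame(a = numeric(0), b = character(0))
n_cols  <- length(empty_df)           # number of columns
n_rows  <- nrow(empty_df)             # number of rows
row_idx <- seq_len(nrow(empty_df))    # safe row index, empty here
```

With seq_along or seq_len, a for loop over an empty input simply does nothing instead of indexing positions 1 and 0.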

I think you can do what you need with:

lapply(Dat_nations, function(dat) {
  dat$timeafter <- c(NA, diff(dat$newdate))
  dat
})
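Note that lapply returns a new list, so the result has to be assigned to keep it. Here is a minimal end-to-end sketch on made-up data (the toy Dat below and the name Dat_combined are assumptions, not the asker's real data) that also binds the pieces back into one frame with do.call(rbind, ...):

```r
# Hypothetical toy data standing in for the real Dat
Dat <- data.frame(
  newccode = c(2, 2, 2, 20, 20),
  newdate  = as.Date(c("2021-01-01", "2021-01-04", "2021-01-10",
                       "2021-02-01", "2021-02-02"))
)

# One data frame per country code
Dat_nations <- split(Dat, Dat$newccode)

# Add the day gap to the previous row; the first row of each group gets NA
Dat_nations <- lapply(Dat_nations, function(dat) {
  dat$timeafter <- c(NA, as.numeric(diff(dat$newdate), units = "days"))
  dat
})

# Stack the per-country frames back into a single frame
Dat_combined <- do.call(rbind, Dat_nations)
rownames(Dat_combined) <- NULL
```

This assumes each country's rows are already sorted by newdate, as described in the question.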

Incidentally, if you are intending to then combine this back into a single frame (for any number of reasons), this can be done without split (and likely faster) with:

Dat$timeafter <- ave(Dat$newdate, Dat$newccode, FUN = function(z) c(NA, diff(z)))
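One caveat: ave() writes the results back into a copy of its first argument, so when that argument is a Date column the differences get coerced back toward Date, which may fail or come back looking like dates depending on the R version. A hedged variant, assuming newdate is of class Date, converts to plain numbers first (the toy Dat below is made up for illustration):

```r
# Hypothetical toy data standing in for the real Dat
Dat <- data.frame(
  newccode = c(2, 2, 2, 20, 20),
  newdate  = as.Date(c("2021-01-01", "2021-01-04", "2021-01-10",
                       "2021-02-01", "2021-02-02"))
)

# as.numeric() on a Date gives days since 1970-01-01, so diff() yields day gaps
Dat$timeafter <- ave(as.numeric(Dat$newdate), Dat$newccode,
                     FUN = function(z) c(NA, diff(z)))
```

The result is an ordinary numeric column of day counts, with NA on the first row of each country.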

Last note: diff(z) when z is of class Date or POSIXt returns an object of class difftime. This means that instead of showing, for instance, 3, the console shows Time difference of 3 days. While it looks like a string, it is still a number: dput(diff(Sys.Date()+c(0,3)))+10 (adding 10 to the difference) still works. However, the units can change (especially with POSIXt), which is disruptive. One easy way around this is to force the units with something like:

diff(Sys.Date() + c(0, 3))
# Time difference of 3 days
as.numeric(diff(Sys.Date() + c(0, 3)), units = "days")
# [1] 3
as.numeric(diff(Sys.Date() + c(0, 3)), units = "hours")
# [1] 72
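To illustrate the point about units changing with POSIXt, here is a small sketch with made-up timestamps (t_short and t_long are hypothetical): the automatic unit depends on the size of the gap, while as.numeric(..., units = ) pins it down.

```r
# Two hypothetical pairs of timestamps: 30 seconds apart and 3 days apart
t_short <- as.POSIXct(c("2021-09-10 12:00:00", "2021-09-10 12:00:30"), tz = "UTC")
t_long  <- as.POSIXct(c("2021-09-10 12:00:00", "2021-09-13 12:00:00"), tz = "UTC")

d_short <- diff(t_short)
d_long  <- diff(t_long)

# The automatically chosen units differ between the two results
u_short <- units(d_short)
u_long  <- units(d_long)

# Forcing a common unit makes the two values comparable
s_short <- as.numeric(d_short, units = "secs")
s_long  <- as.numeric(d_long,  units = "secs")
```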
r2evans
  • Super helpful! One final Q: I am receiving the following error from R when I run the ave command: ``` Error in as.Date.default(value) : do not know how to convert 'value' to class “Date” ``` This of course relates to the 'date' class of the variable, but it seemingly isn't able to move past that and actually run the calculations. Oddly, it works fine when I use lapply. – Andrew Sep 10 '21 at 22:35
  • I don't know about that error, I'd need to "see" it in action, tbh. – r2evans Sep 10 '21 at 23:42