
My example data frame is constructed as follows:

ID       <- c("1", "2", "1", "2", "1", "2")
Current  <- c(1.12, NA, 3.66, 8.95, 4.73, 7.82)
Previous <- c(NA, NA, NA, NA, NA, NA)
df       <- data.frame(ID, Current, Previous, stringsAsFactors = TRUE)

Adding an extra level to df$ID by writing levels(df$ID) <- c(levels(df$ID), "3") results in:

> levels(df$ID)
[1] "1" "2" "3"

(I am adding an extra level because the data frame I am currently working with contains a lot of levels that do not even occur once.)
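For reference, here is a minimal sketch of how such unused levels can be inspected. It rebuilds only the ID factor from the example; the names id and id_used are chosen for illustration and are not part of the data frame above.

```r
# Rebuild the ID factor from the example, including the unused level "3"
id <- factor(c("1", "2", "1", "2", "1", "2"))
levels(id) <- c(levels(id), "3")

# Count rows per level; the unused level "3" has a count of zero
table(id)

# unique() returns only the values that actually occur in the data
unique(id)

# droplevels() returns a copy of the factor without the unused level
id_used <- droplevels(id)
levels(id_used)   # "1" "2"
```

Whether dropping the empty levels is appropriate depends on whether they are needed later; the sketch only shows that they can be detected and removed.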

Writing df now prints:

  ID Current Previous
1  1    1.12       NA
2  2      NA       NA
3  1    3.66       NA
4  2    8.95       NA
5  1    4.73       NA
6  2    7.82       NA

Because typeof(df$Current) is "double" and typeof(df$Previous) is "logical", I would like to convert the Previous column to "double". In order to achieve this, I write:

df$Previous <- as.numeric(as.character(df$Previous))

Writing typeof(df$Previous) now results in "double".
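As a side note, the detour through as.character() is probably not needed for a logical column. The sketch below assumes the column holds nothing but NA at this point, as in the example; the vector names are made up for illustration.

```r
# A vector of bare NA values is logical by default
prev_logical <- c(NA, NA, NA, NA, NA, NA)
typeof(prev_logical)   # "logical"

# as.numeric() can be applied to the logical vector directly
prev_double <- as.numeric(prev_logical)
typeof(prev_double)    # "double"

# Alternatively, create the column as double from the start with NA_real_
prev_direct <- rep(NA_real_, 6)
typeof(prev_direct)    # "double"
```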

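I would now like to do the following: for each ID, copy the Current value of one row into the Previous column of the next row with the same ID. That is, the first ID 1 entry of Current goes to the second ID 1 entry of Previous, and so on. If the Current value is NA, Previous should stay NA. The resulting table should look like this: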

  ID Current Previous
1  1    1.12       NA
2  2      NA       NA
3  1    3.66     1.12
4  2    8.95       NA
5  1    4.73     3.66
6  2    7.82     8.95

I already tried doing this by writing the following for loop, but it did not work and I do not know what I did wrong:

i = 1
for(i in length(unique(df$ID))) {
  j = 1
  k = 1
  idLocationAbsolute = 1
  for(j in length(df$ID)) {
    if(unique(df$ID)[i] == df$ID[j]) {
      idLocationAbsolute[k] = j
      k = k + 1
    }
  }
  if(k > 1) {
    k = k - 1
    l = 1
    for(l in k) {
      if(!is.na(idLocationAbsolute[l])) {
        df$Previous[l + 1] = df$Current[l]
      }
    }
  }
}

Please note that I initialised idLocationAbsolute = 1 because R throws an error otherwise. I cannot tell whether this is sound or whether it breaks the code, but I suspect it is not the reason the loop fails to do what it should.
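Looking at the loop above, two things seem likely to cause the problem. First, for(i in length(x)) visits only the single value length(x) rather than 1 to length(x); seq_along(x) iterates over every position. Second, the final assignment indexes df with the counter l instead of the row positions stored in idLocationAbsolute. The following is one possible repair that keeps the loop approach, not necessarily the best solution. The names id, rows and k are chosen for illustration, and the data frame is rebuilt so the block runs on its own.

```r
# Rebuild the example data frame, with Previous already of type double
ID      <- c("1", "2", "1", "2", "1", "2")
Current <- c(1.12, NA, 3.66, 8.95, 4.73, 7.82)
df      <- data.frame(ID, Current, Previous = NA_real_, stringsAsFactors = TRUE)
levels(df$ID) <- c(levels(df$ID), "3")

for (id in levels(df$ID)) {
  # Row positions of this ID in order of appearance (empty for unused levels)
  rows <- which(df$ID == id)
  # seq_along(rows)[-1] is empty when the ID occurs fewer than two times
  for (k in seq_along(rows)[-1]) {
    # Copy the Current value of the preceding row with the same ID;
    # an NA in Current simply carries over as NA
    df$Previous[rows[k]] <- df$Current[rows[k - 1]]
  }
}

df
```

Using which() replaces the inner loop that collected positions into idLocationAbsolute, so the vector no longer needs to be initialised by hand.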

Nemgathos
You need `lag` by group. With `dplyr`, you can do `df %>% group_by(ID) %>% mutate(Previous = lag(Current))` – Ronak Shah May 28 '19 at 12:19
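The suggestion in the comment can be written out as a self-contained sketch. It assumes the dplyr package is installed. The base R variant with ave() and the names result and prev_base are additions for illustration, not part of the comment.

```r
library(dplyr)

# Rebuild the example data frame, including the unused level "3"
ID      <- c("1", "2", "1", "2", "1", "2")
Current <- c(1.12, NA, 3.66, 8.95, 4.73, 7.82)
df      <- data.frame(ID, Current, Previous = NA_real_, stringsAsFactors = TRUE)
levels(df$ID) <- c(levels(df$ID), "3")

# Lag Current within each ID group; row order is preserved
result <- df %>%
  group_by(ID) %>%
  mutate(Previous = dplyr::lag(Current)) %>%
  ungroup()

result$Previous

# Base R alternative without dplyr: shift each group by one position
prev_base <- ave(df$Current, droplevels(df$ID),
                 FUN = function(x) c(NA, x[-length(x)]))
prev_base
```

Both variants leave Previous as NA for the first row of each ID and wherever the preceding Current value of that ID is NA, which matches the desired table in the question.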
