
I'm quite new to both R and data wrangling, so I apologize if this is a silly question; I just haven't been able to figure it out.

So, I have a large data frame with many observations for several individuals at different times. Here's a very simple example:

id <- c(1,1,1,2,2,2,3,3,3)
x <- c(11,10,12,10,10,8,13,11,14)
y <- 0
df <- data.frame(id,x,y)
> df
  id  x y
1  1 11 0
2  1 10 0
3  1 12 0
4  2 10 0
5  2 10 0
6  2  8 0
7  3 13 0
8  3 11 0
9  3 14 0

Each y should be the highest value of x observed so far within the same id, so it has to be calculated from x. I would like to get a data frame like this:

> df
  id  x y
1  1 11 11
2  1 10 11
3  1 12 12
----------
4  2 10 10
5  2 10 10
6  2  8 10
----------
7  3 13 13
8  3 11 13
9  3 14 14 
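In other words, y is the running (cumulative) maximum of x within each id. Here is a minimal base-R sketch of that. It assumes the rows of each id are already in time order. cummax() computes the running maximum, and ave() applies it separately to each id while keeping the original row order.

```r
# Rebuild the example data from the question
id <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
x  <- c(11, 10, 12, 10, 10, 8, 13, 11, 14)
df <- data.frame(id, x)

# cummax() returns the running maximum of a vector;
# ave() applies it within each id and puts the results
# back in the original row order
df$y <- ave(df$x, df$id, FUN = cummax)
print(df)
```

If dplyr is preferred, the same idea should work as df %>% group_by(id) %>% mutate(y = cummax(x)), because mutate() evaluates cummax(x) separately for each group.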

I tried using mutate() from dplyr, but it did not work: the values of x were simply copied into y, except for the first row of each id, which became NA.

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(y = if_else(x >= lag(y), x, lag(y)))
# A tibble: 9 x 3
# Groups:   id [3]
     id     x     y
  <dbl> <dbl> <dbl>
1     1    11    NA
2     1    10    10
3     1    12    12
4     2    10    NA
5     2    10    10
6     2     8     8
7     3    13    NA
8     3    11    11
9     3    14    14
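The if_else() attempt fails because lag(y) reads the existing y column, not the y values being computed. That column is all zeros, and lag() gives NA for the first row of each id. So x >= lag(y) is TRUE everywhere except the first row, where it is NA.

The rule itself is recursive: each y depends on the previous y. The sketch below applies exactly that rule in base R with Reduce(..., accumulate = TRUE), which passes each result on to the next step. The name running_max is a hypothetical helper used only for illustration.

```r
id <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
x  <- c(11, 10, 12, 10, 10, 8, 13, 11, 14)
df <- data.frame(id, x)

# Hypothetical helper: the same rule as the if_else() attempt,
# but each step compares against the previously computed value.
# accumulate = TRUE keeps every intermediate result.
running_max <- function(v) {
  Reduce(function(prev, cur) if (cur >= prev) cur else prev,
         v, accumulate = TRUE)
}

# Apply the helper separately within each id
df$y <- ave(df$x, df$id, FUN = running_max)
print(df)
```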

Then I tried writing a for loop, but the problem there was that I had to change the id manually each time:

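df1 <- df[df$id == 1, ]

# the first observation is its own running maximum
df1$y[1] <- df1$x[1]
for (i in seq_len(nrow(df1))[-1]) {
  # keep x if it is at least the previous y, otherwise carry y forward
  if (df1$x[i] >= df1$y[i - 1]) {
    df1$y[i] <- df1$x[i]
  } else {
    df1$y[i] <- df1$y[i - 1]
  }
}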
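The loop itself can be kept. It only needs an outer loop over unique(df$id), so that no id has to be typed by hand. Below is a sketch under the assumption that the rows within each id are already in time order. It reuses the same comparison as the single-id loop. Because it works with row positions, it should also handle ids whose rows are not contiguous.

```r
id <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
x  <- c(11, 10, 12, 10, 10, 8, 13, 11, 14)
df <- data.frame(id, x, y = 0)

# Loop over every id instead of subsetting one id by hand
for (this_id in unique(df$id)) {
  rows <- which(df$id == this_id)   # row positions belonging to this id
  df$y[rows[1]] <- df$x[rows[1]]    # first observation starts the maximum
  for (k in seq_along(rows)[-1]) {
    cur  <- rows[k]
    prev <- rows[k - 1]
    # same comparison as the single-id loop
    if (df$x[cur] >= df$y[prev]) {
      df$y[cur] <- df$x[cur]
    } else {
      df$y[cur] <- df$y[prev]
    }
  }
}
print(df)
```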

So my question is: can I calculate the values of y using mutate(), or alternatively, is there a way to avoid changing the id manually each time when using a for loop? Any advice is greatly appreciated!

LaHN