I'm quite new to both R and data wrangling, so I apologize if this is a silly question; I just haven't been able to figure it out.
I have a large data frame with many observations for several individuals at different times. Here's a very simple example:
id <- c(1,1,1,2,2,2,3,3,3)
x <- c(11,10,12,10,10,8,13,11,14)
y <- 0
df <- data.frame(id,x,y)
> df
id x y
1 1 11 0
2 1 10 0
3 1 12 0
4 2 10 0
5 2 10 0
6 2 8 0
7 3 13 0
8 3 11 0
9 3 14 0
y should hold the highest value of x observed so far within each id (a running maximum), so I need to calculate it from the values of x. I would like to end up with a data frame like this:
> df
id x y
1 1 11 11
2 1 10 11
3 1 12 12
----------
4 2 10 10
5 2 10 10
6 2 8 10
----------
7 3 13 13
8 3 11 13
9 3 14 14
I tried using mutate() from dplyr, but that did not work out very well: y simply repeated the values of x, except for the first row of each group, which was NA.
df %>%
  group_by(id) %>%
  mutate(y = if_else(x >= lag(y), x, lag(y)))
# A tibble: 9 x 3
# Groups: id [3]
id x y
<dbl> <dbl> <dbl>
1 1 11 NA
2 1 10 10
3 1 12 12
4 2 10 NA
5 2 10 10
6 2 8 8
7 3 13 NA
8 3 11 11
9 3 14 14
Then I tried writing a for loop, but the problem there was that I had to change the id manually each time:
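df1 <- df[df$id == 1, ]  # subset for one id; the 1 has to be changed by hand
df1$y[1] <- df1$x[1]     # the first running maximum is the first x
for (i in 2:length(df1$x)) {
  if (df1$x[i] >= df1$y[i - 1]) {
    df1$y[i] <- df1$x[i]      # new maximum
  } else {
    df1$y[i] <- df1$y[i - 1]  # carry the previous maximum forward
  }
}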
So my question is: can I somehow calculate the values for y using mutate(), or alternatively, is there a way to avoid changing the id manually each time when using a for loop? Any advice is greatly appreciated!
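For reference, the rule described above is a per-id running maximum, so one possible direction is base R's cummax() applied within each id. The following is only a sketch under that assumption: the id vector is reconstructed from the printed output, the names df_dplyr, y_loop, g and rows are made up for illustration, and the dplyr part is skipped if the package is not installed.

```r
# Example data from the question (id reconstructed from the printed output)
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  x  = c(11, 10, 12, 10, 10, 8, 13, 11, 14)
)

# Base R: ave() applies cummax() to x separately within each id
df$y <- ave(df$x, df$id, FUN = cummax)

# dplyr: the same running maximum inside a grouped mutate()
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  df_dplyr <- df %>%
    group_by(id) %>%
    mutate(y = cummax(x)) %>%
    ungroup()
}

# Loop variant: iterate over the ids instead of subsetting by hand
# (y_loop, g and rows are illustrative names)
df$y_loop <- NA_real_
for (g in unique(df$id)) {
  rows <- which(df$id == g)
  df$y_loop[rows] <- cummax(df$x[rows])
}
```

A likely reason the lag() attempt did not behave as hoped is that lag(y) reads the existing y column (all zeros, or NA on the first row of each group) rather than the values being computed in the same mutate() call, so each row is compared against 0 instead of the previous maximum.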