0

I am trying to keep the original data but calculate a lag only for certain rows of the text. Iv'e tried filtering but it leaves out the data. Also, the lag doesn't seem to really consider the dt..

What do you think?

library (tidyverse)
df2<- nycflights13::weather
df2 <- df2 %>% mutate (dt= as.Date(time_hour)) %>% add_count(origin,dt)
df2 <- df2 %>% mutate(lag_ewr1=lag(n,1))
Stat.Enthus
  • 335
  • 1
  • 12
  • lag has an additional argument called "order_by" you can use to make sure its ordering by the correct variable. Its unclear to me from your question which rows you want to lag and which ones you don't. – John Aug 20 '20 at 07:41
  • Do you want to create a lag column for each date? `df2 %>% group_by(dt) %>% mutate(lag_ewr1=lag(n,1))` ? – Ronak Shah Aug 20 '20 at 07:42
  • Thanks. I want to group by dt but only get the lag for EWR airport, other aiports should be blank. Something like this but keeping all the rows `df2 <- df2 %>% group_by(dt) %>% filter(origin=="EWR") %>% mutate(lag_ewr1=lag(n,1))` – Stat.Enthus Aug 20 '20 at 07:56
  • @RonakShah please see that in your solution, the lag is not accurate, it only refers to the first row in each dt, e.g all the rows on 1/1/2013 except the first get 17, when they should be NA – Stat.Enthus Aug 20 '20 at 08:01
  • The "EWR" airport condition is not mentioned in your original post so it was not possible for us to know that you want to `lag` only that airport. This should however be handled in my answer below. – Ronak Shah Aug 20 '20 at 08:07

1 Answers1

2

You can do lag for each dt and change lag_ewr1 to NA for all other airports except "EWR".

library(dplyr)

df2 %>% 
  group_by(dt) %>% 
  mutate(lag_ewr1=lag(n,1)) %>%
  ungroup %>%
  mutate(lag_ewr1 = replace(lag_ewr1, origin != 'EWR', NA)) 

If we want to lag between the date values, we can use :

df2 %>% 
  arrange(dt, origin) %>%
  group_by(dt) %>% 
  summarise(lag_ewr1 = first(n)) %>%
  mutate(lag_ewr1 = lag(lag_ewr1)) %>%
  left_join(df2, by = 'dt') %>%
  mutate(lag_ewr1 = replace(lag_ewr1, origin != 'EWR', NA))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thanks as usual! – Stat.Enthus Aug 20 '20 at 08:16
  • Quick question about this solution. We can see that the lag_ewr1 for 1/1/2013 should be NA for all these rows, and 17 for all rows of January 2nd of 2013. A quick look shows it's not like that.. `df3 <- df2 %>% filter(origin=="EWR") df3 <- df3 %>% dplyr::select(dt,lag_ewr1) df3 %>% filter(dt == "2013-01-01" | dt == "2013-01-02")` – Stat.Enthus Aug 20 '20 at 08:24
  • 1
    That is not what the original solution was doing. The `lag` was happening in the same `dt`. I think the updated answer is what you actually were looking for. – Ronak Shah Aug 20 '20 at 08:36
  • Just a tiny question, How can this solution be used for lag2, lag3 etc.. What can we use instead of first function? – Stat.Enthus Aug 20 '20 at 09:04
  • In this case it does not matter which value you select because all the values are same. I chose `first`, you can chose 2nd, 3rd. If you want `lag2` you need to change the `lag` line to `mutate(lag_ewr1 = lag(lag_ewr1, 2))` – Ronak Shah Aug 20 '20 at 09:12