When I filter a dataset based on a lag() function, I lose the first row in each group (because those rows have no lag value). How can I avoid this so that I keep the first rows despite their not having any lag value?

library(dplyr)

ds <-
  structure(list(mpg = c(21, 21, 21.4, 18.7, 14.3, 16.4), cyl = c(6, 
  6, 6, 8, 8, 8), hp = c(110, 110, 110, 175, 245, 180)), class = c("tbl_df", 
  "tbl", "data.frame"), row.names = c(NA, -6L), .Names = c("mpg", 
  "cyl", "hp"))

# example of filter based on lag that drops first rows
ds %>% 
  group_by(cyl) %>% 
  arrange(-mpg) %>% 
  filter(hp <= lag(hp))
Joe
  • `lag(hp, default = hp[1])`? – joran Apr 25 '18 at 18:46
  • @joran, manipulating the default argument in this way is a good solution, and it can be adapted to "filter(hp > lag(hp, default = hp[1] - 1))" in cases where the comparison is strict and equality would not pass. – Joe Apr 25 '18 at 18:55
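
The suggestion in the comments can be sketched as follows. This is a minimal example rather than a definitive answer: it assumes dplyr is loaded, rebuilds the question's data with data.frame(), and uses desc(mpg) in place of -mpg. The names kept and kept_strict are only for illustration.

```r
library(dplyr)

# Same values as the question's ds, rebuilt as a plain data frame
ds <- data.frame(
  mpg = c(21, 21, 21.4, 18.7, 14.3, 16.4),
  cyl = c(6, 6, 6, 8, 8, 8),
  hp  = c(110, 110, 110, 175, 245, 180)
)

# default = hp[1] makes the first row of each group compare against
# itself, so hp <= hp[1] is TRUE there and the row is kept
kept <- ds %>%
  arrange(desc(mpg)) %>%
  group_by(cyl) %>%
  filter(hp <= lag(hp, default = hp[1]))

# For a strict comparison, shift the default so the first row still passes
kept_strict <- ds %>%
  arrange(desc(mpg)) %>%
  group_by(cyl) %>%
  filter(hp > lag(hp, default = hp[1] - 1))
```

In both versions the default is chosen so that the first comparison in each cyl group always passes, which is what keeps the first rows.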

2 Answers

filter(hp <= lag(hp)) excludes rows where lag(hp) is NA, because filter() drops rows for which the condition evaluates to NA. You can instead keep rows that satisfy either that inequality or is.na(lag(hp)), which is the case for the first row of each group.

I included prev = lag(hp) to make a standalone variable for the lags, just for clarity & debugging.

library(tidyverse)

ds %>%
    group_by(cyl) %>%
    arrange(-mpg) %>%
    mutate(prev = lag(hp)) %>%
    filter(hp <= prev | is.na(prev))

This yields:

# A tibble: 4 x 4
# Groups:   cyl [2]
    mpg   cyl    hp  prev
  <dbl> <dbl> <dbl> <dbl>
1  21.4    6.  110.   NA 
2  21.0    6.  110.  110.
3  21.0    6.  110.  110.
4  18.7    8.  175.   NA 
camille

Since the OP compares with <= (less than or equal to) against the previous value, using lag with default = +Inf is sufficient: the first row of each group is compared against Inf and is therefore always kept.

Also, there is no need for a separate arrange call in the dplyr chain, as lag provides an order_by argument.

The solution can then be written as:

ds %>% 
  group_by(cyl) %>% 
  filter(hp <= lag(hp, default = +Inf, order_by = -mpg))

# The result below is in the original order of the data.frame, although lag was
# calculated on the values ordered by descending mpg
# # A tibble: 4 x 3
# # Groups: cyl [2]
#     mpg   cyl    hp
#    <dbl> <dbl> <dbl>
# 1  21.0  6.00   110
# 2  21.0  6.00   110
# 3  21.4  6.00   110
# 4  18.7  8.00   175
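
To see what order_by does on its own, here is a small sketch on a plain vector (x and ord are made up for illustration). lag() looks up the previous value according to ord but returns the results in the original positions of x, which is why the filtered rows above stay in the data frame's original order.

```r
library(dplyr)

x   <- c(10, 20, 30)
ord <- c(3, 1, 2)   # x[2] comes first, then x[3], then x[1]

# Each element receives the value that precedes it in ord order
lagged <- lag(x, order_by = ord)
# lagged is c(30, NA, 20): x[2] has no predecessor, so it gets NA

# Supplying a default fills that gap instead of leaving NA
lagged_inf <- lag(x, default = Inf, order_by = ord)
# lagged_inf is c(30, Inf, 20)
```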
MKR