Below is a sample data set:

 area    periodyear    period  employment     date
 01         2020         08       100         2020-08-01
 01         2020         09       105         2020-09-01
 01         2020         10       110         2020-10-01
 02         2020         08       101         2020-08-01
 02         2020         09       102         2020-09-01
 02         2020         10       103         2020-10-01

The question is how to get R to return the last TWO rows for each area. I created the date column with the following code (ymd() is from lubridate) so that there is a single value, instead of periodyear and period separately, for which a max value can be found.

substate$date <- ymd(paste(substate$PERIODYEAR, substate$PERIOD, "1", sep = "-"))
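
As a sketch of the same idea without lubridate, base R's as.Date() can build a first-of-month date, and max() then returns the latest period. The lower-case column names follow the sample table and are an assumption (the line above uses upper-case names); `latest` is an illustrative name.

```r
# Sample rows from the question; lower-case column names are an assumption
substate <- data.frame(
  area       = c("01", "01", "01", "02", "02", "02"),
  periodyear = rep(2020L, 6),
  period     = c("08", "09", "10", "08", "09", "10"),
  employment = c(100L, 105L, 110L, 101L, 102L, 103L),
  stringsAsFactors = FALSE
)

# Paste year, month, and a fixed day into "YYYY-MM-01", then parse as a Date
substate$date <- as.Date(
  paste(substate$periodyear, substate$period, "01", sep = "-")
)

# With a single Date column, the most recent period is just the maximum
latest <- max(substate$date)
```

Because `date` is a true Date rather than a string, comparisons and sorting follow calendar order.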

I know how to find the max value of a column (date, in this instance), but I am unclear on how to have R create a data frame that looks like the one below:

 area    periodyear    period  employment     date
 01         2020         09       105         2020-09-01
 01         2020         10       110         2020-10-01
 02         2020         09       102         2020-09-01
 02         2020         10       103         2020-10-01

The reason for wanting the last TWO is that one month is brand new data and the one before is revised. From here, I update a SQL database.

Tim Wilcox
  • Possible duplicate https://stackoverflow.com/questions/53994497/how-to-select-last-n-observation-from-each-group-in-dplyr-dataframe I think this answers what you want to do. – Ronak Shah Dec 09 '20 at 04:42

2 Answers

One option is to use slice_tail after arranging by 'area' and by 'date' converted to Date class (in case the rows are not already in order):

library(dplyr)
df1 %>%
   arrange(area, as.Date(date)) %>%
   group_by(area) %>%
   slice_tail(n = 2) %>%
   ungroup()

-output

# A tibble: 4 x 5
#  area  periodyear period employment date      
#  <chr>      <int>  <int>      <int> <chr>     
#1 01          2020      9        105 2020-09-01
#2 01          2020     10        110 2020-10-01
#3 02          2020      9        102 2020-09-01
#4 02          2020     10        103 2020-10-01

data

df1 <- structure(list(area = c("01", "01", "01", "02", "02", "02"), 
    periodyear = c(2020L, 2020L, 2020L, 2020L, 2020L, 2020L), 
    period = c(8L, 9L, 10L, 8L, 9L, 10L), employment = c(100L, 
    105L, 110L, 101L, 102L, 103L), date = c("2020-08-01", "2020-09-01", 
    "2020-10-01", "2020-08-01", "2020-09-01", "2020-10-01")), 
    row.names = c(NA, 
-6L), class = "data.frame")
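
For readers who prefer not to depend on dplyr, the same per-area selection can be sketched in base R. This is an alternative to the pipeline above rather than part of the original answer; it rebuilds the sample data under the name `df1`, and `sorted` and `last_two` are illustrative names.

```r
# Sample data from the question, rebuilt so the sketch is self-contained
df1 <- data.frame(
  area       = c("01", "01", "01", "02", "02", "02"),
  periodyear = rep(2020L, 6),
  period     = c(8L, 9L, 10L, 8L, 9L, 10L),
  employment = c(100L, 105L, 110L, 101L, 102L, 103L),
  date       = c("2020-08-01", "2020-09-01", "2020-10-01",
                 "2020-08-01", "2020-09-01", "2020-10-01"),
  stringsAsFactors = FALSE
)

# Sort by area, then by the date parsed as a Date
sorted <- df1[order(df1$area, as.Date(df1$date)), ]

# Split by area, keep the last two rows of each piece, and recombine
last_two <- do.call(rbind, lapply(split(sorted, sorted$area), tail, n = 2))
rownames(last_two) <- NULL  # drop the compound row names added by rbind
```

Since tail() simply returns the whole piece when it has fewer than two rows, areas with a single month of data are kept as they are.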
akrun

Maybe this:

library(dplyr)
# Sort within area, then keep the last two rows of each group
df %>%
  arrange(area, date) %>%
  group_by(area) %>%
  filter(row_number() > n() - 2)

Output:

# A tibble: 4 x 5
# Groups:   area [2]
   area periodyear period employment date      
  <int>      <int>  <int>      <int> <date>    
1     1       2020      9        105 2020-09-01
2     1       2020     10        110 2020-10-01
3     2       2020      9        102 2020-09-01
4     2       2020     10        103 2020-10-01
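
A positional filter is worth checking on groups of unequal size. The sketch below uses a hypothetical data frame `toy` (not from the question) in which the areas have four, two, and one rows. It suggests that `row_number() > n() - 2` keeps the last two rows of each group whatever its size, whereas a filter such as `row_number() %in% 2:n()` only drops the first row and so matches it only for three-row groups.

```r
library(dplyr)

# Hypothetical data: areas with 4, 2, and 1 rows respectively
toy <- data.frame(
  area       = c("01", "01", "01", "01", "02", "02", "03"),
  date       = as.Date(c("2020-07-01", "2020-08-01", "2020-09-01",
                         "2020-10-01", "2020-09-01", "2020-10-01",
                         "2020-10-01")),
  employment = c(95L, 100L, 105L, 110L, 102L, 103L, 50L),
  stringsAsFactors = FALSE
)

# Keep rows whose position is within the last two of their group
last_two <- toy %>%
  arrange(area, date) %>%
  group_by(area) %>%
  filter(row_number() > n() - 2) %>%
  ungroup()

# For comparison: dropping only the first row leaves three rows in area "01"
drop_first <- toy %>%
  arrange(area, date) %>%
  group_by(area) %>%
  filter(row_number() %in% 2:n()) %>%
  ungroup()
```

Here `last_two` holds two rows for areas "01" and "02" and the single available row for area "03".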
Duck
  • @TimWilcox Fantastic! Since you accepted one answer, you could also upvote this one (+1) if you liked it :) – Duck Dec 08 '20 at 23:12