How to have R create a data frame using the last two months

Question

Below is a sample data set:

 area    periodyear    period  employment     date
 01         2020         08       100         2020-08-01
 01         2020         09       105         2020-09-01
 01         2020         10       110         2020-10-01
 02         2020         08       101         2020-08-01
 02         2020         09       102         2020-09-01
 02         2020         10       103         2020-10-01

The question is how I get R to return the last TWO rows (months) for each area. To have a single value, instead of periodyear and period, whose maximum can be found, I created the date column with the following code (ymd is from lubridate):
substate$date <- ymd(paste(substate$PERIODYEAR, substate$PERIOD, "1", sep = "-"))
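
For reference, here is a self-contained sketch of that step. It assumes lowercase column names `periodyear` and `period` as in the sample table, and the `substate` object below is a toy stand-in rather than the real data. It uses base R's `as.Date` instead of `lubridate::ymd`, which should give the same first-of-month dates when both columns are integers.

```r
# Toy stand-in for the real `substate` data (column names are assumed)
substate <- data.frame(
  area       = c("01", "01", "01", "02", "02", "02"),
  periodyear = c(2020L, 2020L, 2020L, 2020L, 2020L, 2020L),
  period     = c(8L, 9L, 10L, 8L, 9L, 10L),
  employment = c(100L, 105L, 110L, 101L, 102L, 103L),
  stringsAsFactors = FALSE
)

# Build a first-of-month Date from the integer year and period columns
substate$date <- as.Date(
  sprintf("%d-%02d-01", substate$periodyear, substate$period)
)

# The most recent month in the data
latest <- max(substate$date)
```

Because `date` is a true Date column, `max()`, sorting, and comparisons behave chronologically rather than alphabetically.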

I know how to find the max value of a column (date, in this instance), but I am unclear how to create a data frame that looks like the one below:

 area    periodyear    period  employment     date
 01         2020         09       105         2020-09-01
 01         2020         10       110         2020-10-01
 02         2020         09       102         2020-09-01
 02         2020         10       103         2020-10-01

The reason for wanting the last TWO is that one month is brand new data and the one before is revised. From here, I update a SQL database.

Possible duplicate https://stackoverflow.com/questions/53994497/how-to-select-last-n-observation-from-each-group-in-dplyr-dataframe I think this answers what you want to do. — Ronak Shah, Dec 09 '20 at 04:42

akrun · Accepted Answer · 2020-12-08T22:58:30.950

One option is to arrange by 'area' and by 'date' converted to Date class (in case the rows are not already in order), then group by 'area' and take the last two rows of each group with slice_tail

library(dplyr)
df1 %>%
   arrange(area, as.Date(date)) %>%   # order rows within each area by date
   group_by(area) %>%
   slice_tail(n = 2) %>%              # keep the last two rows per area
   ungroup()

-output

# A tibble: 4 x 5
#  area  periodyear period employment date      
#  <chr>      <int>  <int>      <int> <chr>     
#1 01          2020      9        105 2020-09-01
#2 01          2020     10        110 2020-10-01
#3 02          2020      9        102 2020-09-01
#4 02          2020     10        103 2020-10-01

data

df1 <- structure(list(area = c("01", "01", "01", "02", "02", "02"), 
    periodyear = c(2020L, 2020L, 2020L, 2020L, 2020L, 2020L), 
    period = c(8L, 9L, 10L, 8L, 9L, 10L), employment = c(100L, 
    105L, 110L, 101L, 102L, 103L), date = c("2020-08-01", "2020-09-01", 
    "2020-10-01", "2020-08-01", "2020-09-01", "2020-10-01")), 
    row.names = c(NA, 
-6L), class = "data.frame")
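
If the aim is the two latest dates per area regardless of the current row order, a variant with `slice_max` may also work. This is only a sketch on the same sample data, assuming dplyr 1.0.0 or later; note that `slice_max` keeps ties by default, so an area with duplicated dates could return more than two rows.

```r
library(dplyr)

# Same sample data as above, rebuilt so the sketch is self-contained
df1 <- data.frame(
  area       = c("01", "01", "01", "02", "02", "02"),
  periodyear = c(2020L, 2020L, 2020L, 2020L, 2020L, 2020L),
  period     = c(8L, 9L, 10L, 8L, 9L, 10L),
  employment = c(100L, 105L, 110L, 101L, 102L, 103L),
  date       = c("2020-08-01", "2020-09-01", "2020-10-01",
                 "2020-08-01", "2020-09-01", "2020-10-01"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  mutate(date = as.Date(date)) %>%  # compare as dates, not strings
  group_by(area) %>%
  slice_max(date, n = 2) %>%        # rows with the two latest dates per area
  ungroup() %>%
  arrange(area, date)               # restore chronological order
```

Unlike `slice_tail`, this does not need a prior `arrange`, at the cost of re-sorting afterwards for display.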

It's not entirely clear from the OP's message, but I think they are looking for the two latest dates per area. — deschen, Dec 08 '20 at 22:58

score 1 · Answer 2 · answered Dec 08 '20 at 22:59

Maybe this:

library(dplyr)
# Keep the last two rows of each area after sorting by date
df %>% arrange(area, date) %>% group_by(area) %>% filter(row_number() > n() - 2)

Output:

# A tibble: 4 x 5
# Groups:   area [2]
   area periodyear period employment date      
  <int>      <int>  <int>      <int> <date>    
1     1       2020      9        105 2020-09-01
2     1       2020     10        110 2020-10-01
3     2       2020      9        102 2020-09-01
4     2       2020     10        103 2020-10-01
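
Since the goal is one brand-new month plus the revised month before it, another possible approach is to keep every row whose date is among the two most recent distinct dates in the whole table. This is a base R sketch under the assumption that `date` is already of class Date; the `df` below is rebuilt from the sample so the block runs on its own.

```r
# Sample data with `date` as a Date column (assumed, as in the output above)
df <- data.frame(
  area       = c(1L, 1L, 1L, 2L, 2L, 2L),
  periodyear = c(2020L, 2020L, 2020L, 2020L, 2020L, 2020L),
  period     = c(8L, 9L, 10L, 8L, 9L, 10L),
  employment = c(100L, 105L, 110L, 101L, 102L, 103L),
  date       = as.Date(c("2020-08-01", "2020-09-01", "2020-10-01",
                         "2020-08-01", "2020-09-01", "2020-10-01"))
)

# The two most recent distinct dates across all areas
latest_two <- tail(sort(unique(df$date)), 2)

# Keep only rows that fall in those two months
res2 <- df[df$date %in% latest_two, ]
```

This differs from the per-group versions when an area is missing the newest month: here that area would contribute only the rows that fall in the two global latest months.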

Duck

@TimWilcox Fantastic, as you accepted one answer if you liked this you could upvote +1 :) – Duck Dec 08 '20 at 23:12
