0

I have a data frame simmilar to the one below and would like to compare the maximum value of year / month to the current year month. Problem is that for max(month) I get back 12 (obviously) and for the combined max year / month I get 2020_12. But what I need is the max value of the combined year / month (so 2020_03)

df <- data.frame(ID = c(1:6),
                 year = c(2019,2019,2019,2020,2020,2020),
                 month = c(10,11,12,1,2,3))

Anyone has an easy solution?

Joep_S
  • 481
  • 4
  • 22
  • What is your expected output? – Ronak Shah Aug 04 '20 at 09:11
  • Just 2020_03 or 2020_3. I only need to compare it with year month of the system date and filter data based on the result of the comparison. Basically, what I want to do is keep those records in the DF that have a year/month value prior to the year month of the system date (or any other year/mon) – Joep_S Aug 04 '20 at 09:13

2 Answers2

1

Not sure if you want this output:

df %>%
  group_by(year) %>%
  mutate(max = paste0(year, "_", max(month)))


    # A tibble: 6 x 4
# Groups:   year [2]
     ID  year month max    
  <int> <dbl> <dbl> <chr>  
1     1  2019    10 2019_12
2     2  2019    11 2019_12
3     3  2019    12 2019_12
4     4  2020     1 2020_3 
5     5  2020     2 2020_3 
6     6  2020     3 2020_3 
Martin Schmelzer
  • 23,283
  • 6
  • 73
  • 98
1

You can create a date object first :

df <- transform(df, date = as.Date(paste(1, month, year, sep = "-"), '%d-%m-%Y'))

Calculate max date

max(df$date)
#[1] "2020-03-01"

To get the index of maximum row in df you can use which.max.

ind <- which.max(df$date)
ind
#[1] 6

If you want output in specific format :

paste(df$year[ind], df$month[ind], sep = "_")
#[1] "2020_3"
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213