0

I am trying to filter last 3 months data in R w.r.t maximum date present in my date column.

I initially added a column to my existing data frame and used the below expression for filtering the data. I want to eliminate this extra step and do the filtering directly.

last_three_month <- df_1[date_format > max(as.Date(date_format)) %m-% months(4) & 
                         date_format <= max(date_format) , ]

When I use df_1[MonthYear] and filter the data I see an error like

"Error: ‘max’ not meaningful for factors"

Data

MonthYear   Number    Risk 
1/18/2019   101      High AESI
1/18/2019   905 
1/18/2019   909 
1/18/2019   904 
2/18/2019   101      High AESI
2/18/2019   905 
2/18/2019   904 
2/18/2019   909 
2/18/2019   907 
2/18/2019   541      High AESI
2/18/2019   908      High AESI
2/18/2019   906      High AESI
2/18/2019   046 
2/18/2019   018      High AESI
2/18/2019   019 
2/18/2019   002      High AESI
3/18/2019   904 
3/18/2019   907 
3/18/2019   905 

Code

library(dplyr)
library(tibble)
library(reshape)

Input <- read.csv("C:/Users/Documents/Exports/HR.csv")
Output <- Input #%>% filter(Year == 2019)
df_output <- as.data.frame(Output)
date_format <- as.Date(paste("01-", df_output$Month.Year, sep = ""), 
                       format = "%d-%b-%y")
df_1 <- cbind(df_output, date_format)

last_three_month <- df_1[date_format > max(as.Date(date_format)) %m-% months(4) &
                         date_format <= max(date_format) , ]

I am actually trying to filter data directly in the data frame rather than adding one more column and achieving it. Can you please advise - Thanks

Parfait
  • 104,375
  • 17
  • 94
  • 125
san
  • 163
  • 1
  • 15

2 Answers2

1

read.csv imports strings (such as "1/18/2019") as 'factor' columns by default. This categorical data class is not a 'Date'. Instead use readr::read_csv which will probably detect that your date column is a date, and should then allow your code to run as you expect. If you use the RStudio import wizard with the readr option, you can manually select the column types and the corresponding code will be created for you.

CSJCampbell
  • 2,025
  • 15
  • 19
1

Consider using seq() as shown by @G. Grothendieck's answer here to retrieve the date of three months ago from max MonthYear.

Data (some dates converted to 2018 for demonstration of three month filter)

txt <- 'MonthYear   Number    Risk 
"1/18/2018"   101      "High AESI"
"1/18/2018"   905      NA
"1/18/2019"   909      NA
"1/18/2019"   904      NA 
"2/18/2018"   101      "High AESI"
"2/18/2018"   905      NA 
"2/18/2019"   904      NA 
"2/18/2019"   909      NA 
"2/18/2019"   907      NA 
"2/18/2019"   541      "High AESI"
"2/18/2019"   908      "High AESI"
"2/18/2019"   906      "High AESI"
"2/18/2019"   046      NA 
"2/18/2019"   018      "High AESI"
"2/18/2019"   019      NA 
"2/18/2019"   002      "High AESI"
"3/18/2018"   904      NA 
"3/18/2019"   907      NA 
"3/18/2019"   905      NA '

Input <- read.table(text=txt, header=TRUE)

Code

# CONVERT TO DATE
Input$MonthYear <- as.Date(Input$MonthYear, format = "%m/%d/%Y")

# SUBSET DATA
last_three_month_df <- subset(Input, MonthYear > seq(as.Date(max(MonthYear)), length=2, by="-3 months")[2] &
                                     MonthYear <= max(MonthYear))    

last_three_month_df 
#     MonthYear Number      Risk
# 3  2019-01-18    909      <NA>
# 4  2019-01-18    904      <NA>
# 7  2019-02-18    904      <NA>
# 8  2019-02-18    909      <NA>
# 9  2019-02-18    907      <NA>
# 10 2019-02-18    541 High AESI
# 11 2019-02-18    908 High AESI
# 12 2019-02-18    906 High AESI
# 13 2019-02-18     46      <NA>
# 14 2019-02-18     18 High AESI
# 15 2019-02-18     19      <NA>
# 16 2019-02-18      2 High AESI
# 18 2019-03-18    907      <NA>
# 19 2019-03-18    905      <NA>
Parfait
  • 104,375
  • 17
  • 94
  • 125