1

I have a problem I have been working. I have a dataset of dates (in a data frame format) that I need to search through and find the last day of each month and put them into a new data frame. There is also a value in the next column that goes with it. Heres a sample of the dates.

   [1] "2015-05-21" "2015-05-20" "2015-05-19" "2015-05-18" "2015-05-15" "2015-05-14"
   [7] "2015-05-13" "2015-05-12" "2015-05-11" "2015-05-08" "2015-05-07" "2015-05-06"
  [13] "2015-05-05" "2015-05-04" "2015-05-01" "2015-04-30" "2015-04-29" "2015-04-28"
  [19] "2015-04-27" "2015-04-24" "2015-04-23" "2015-04-22" "2015-04-21" "2015-04-20"
  [25] "2015-04-17" "2015-04-16" "2015-04-15" "2015-04-14" "2015-04-13" "2015-04-10"
  [31] "2015-04-09" "2015-04-08" "2015-04-07" "2015-04-06" "2015-04-02" "2015-04-01"
  [37] "2015-03-31" "2015-03-30" "2015-03-27" "2015-03-26" "2015-03-25" "2015-03-24"
  [43] "2015-03-23" "2015-03-20" "2015-03-19" "2015-03-18" "2015-03-17" "2015-03-16"
  [49] "2015-03-13" "2015-03-12" "2015-03-11" "2015-03-10" "2015-03-09" "2015-03-06"
  [55] "2015-03-05" "2015-03-04" "2015-03-03" "2015-03-02" "2015-02-27" "2015-02-26"
  [61] "2015-02-25" "2015-02-24" "2015-02-23" "2015-02-20" "2015-02-19" "2015-02-18"
  [67] "2015-02-17" "2015-02-13" "2015-02-12" "2015-02-11" "2015-02-10" "2015-02-09"
  [73] "2015-02-06" "2015-02-05" "2015-02-04" "2015-02-03" "2015-02-02" "2015-01-30"
  [79] "2015-01-29" "2015-01-28" "2015-01-27" "2015-01-26" "2015-01-23" "2015-01-22"
  [85] "2015-01-21" "2015-01-20" "2015-01-16" "2015-01-15" "2015-01-14" "2015-01-13"
  [91] "2015-01-12" "2015-01-09" "2015-01-08" "2015-01-07" "2015-01-06" "2015-01-05"
  [97] "2015-01-02" "2014-12-31" "2014-12-30" "2014-12-29" "2014-12-26" "2014-12-24"
 [103] "2014-12-23" "2014-12-22" "2014-12-19" "2014-12-18" "2014-12-17" "2014-12-16"
 [109] "2014-12-15" "2014-12-12" "2014-12-11" "2014-12-10" "2014-12-09" "2014-12-08"
 [115] "2014-12-05" "2014-12-04" "2014-12-03" "2014-12-02" "2014-12-01" "2014-11-28"
 [121] "2014-11-26" "2014-11-25" "2014-11-24" "2014-11-21" "2014-11-20" "2014-11-19"
 [127] "2014-11-18" "2014-11-17" "2014-11-14" "2014-11-13" "2014-11-12" "2014-11-11"
 [133] "2014-11-10" "2014-11-07" "2014-11-06" "2014-11-05" "2014-11-04" "2014-11-03"
 [139] "2014-10-31" "2014-10-30" "2014-10-29" "2014-10-28" "2014-10-27" "2014-10-24"
 [145] "2014-10-23" "2014-10-22" "2014-10-21" "2014-10-20" "2014-10-17" "2014-10-16"
 [151] "2014-10-15" "2014-10-14" "2014-10-13" "2014-10-10" "2014-10-09" "2014-10-08"
 [157] "2014-10-07" "2014-10-06" "2014-10-03" "2014-10-02" "2014-10-01" "2014-09-30"
 [163] "2014-09-29" "2014-09-26" "2014-09-25" "2014-09-24" "2014-09-23" "2014-09-22"
 [169] "2014-09-19" "2014-09-18" "2014-09-17" "2014-09-16" "2014-09-15" "2014-09-12"

This is a small portion. There are 5700 rows...

Tony
  • 83
  • 3
  • 10
  • 4
    Please read [how to create a reproduicble example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for better ways to include sample data. This data cannot be easily imported back into R. Also, when you say the "last day of the month" do you mean the actually last day (in which they may be no observations) or do you mean the last observed day in any given month (which might be the first of the month of there is only one observation). It's helpful to give the desired ouput for your sample input when asking a question like this. – MrFlick Jun 05 '15 at 18:45
  • Alright. I will fix the data. As for your question, I need it to find the last day of each month in my data frame. So for example,I want it to print out `2015-05-21` and `2015-04-30` and `2015-03-31` and so on. @MrFlick – Tony Jun 05 '15 at 18:47
  • I find the header of the Q misleading. Last date of each month won't necessarily be in your data set. You want the max date of month. – dule arnaux Sep 25 '20 at 00:23

3 Answers3

10

1) Try tapply like this:

dates <- c("2015-05-13", "2015-05-12", "2015-05-11", "2015-04-27", 
      "2015-04-24", "2015-04-23")

tapply(dates, substr(dates, 1, 7), max)

or this:

library(zoo)
tapply(dates, as.yearmon(dates), max)

2) If the dates are in a data frame:

 DF <- data.frame(dates, stringsAsFactors = FALSE)

 aggregate(DF["dates"], list(month = substr(DF$dates, 1, 7)), max)

or replace substr(...) with as.yearmon(DF$dates).

3) This picks out the last row of each month. In this case ave returns a character variable so need to use as.logical to convert it to logical:

isMax <- function(x) seq_along(x) == which.max(as.Date(x))
subset(DF, as.logical(ave(dates, substr(dates, 1, 7), FUN = isMax)))

or use as.yearmon(dates) in place of substr(...). The following definition of isMax could be substituted

isMax <- function(x) seq_along(x) == which.max(as.Date(x))

and gives the same result except when there are multiple maxima. In that case it gives the first only whereas the earlier definition gives them all. As which.max does not work with "character" class we converted dates to "Date" class.

Next time please use dput to display your data or otherwise provide it in reproducible form as we did above. Also cut it down to the minimum amount needed to illustrate the problem.

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • For approach 2) what would be the change to get the latest value by year, instead of by month? Does it have something to do with `list(month = substr(DF$dates, 1, 7))` ? – Rafael Jun 02 '17 at 14:34
  • Aha! I see: `list(month = substr(DF$dates, 1, 4))` extracts just the year (for years with 4 digits). Very clever solution. Thanks! – Rafael Jun 02 '17 at 14:38
  • Actually now I think it's returning the `max` value for every year, is there a way to change it to just be the last value? – Rafael Jun 02 '17 at 16:23
  • 1
    In place of `max` use `function(x) tail(x, 1)` to get the last rather than the maximum. – G. Grothendieck Jun 02 '17 at 17:36
5

Here's an alternative solution using dplyr to group by month and year and then filter the max i.e. last date of each yearmonth:

df <- data.frame(dates=as.Date(c("2015-05-21", "2015-05-20", "2015-05-19", "2015-05-18",
                 "2015-05-15", "2015-05-14","2014-12-05", "2014-12-04", "2014-12-03",
                 "2014-12-02", "2014-12-01", "2014-11-28")))

library(dplyr)

df %>% 
  group_by(strftime(dates, "%Y-%m")) %>% #Groups by the yearmonths
  filter(dates == max(dates)) %>%        #Take the last date of each group
  .$dates                                #Returns the filtered dates as a vector

Resulting in:

[1] "2015-05-21" "2014-12-05" "2014-11-28"

Thanks to @akrun for tips on fixing and improving the code.

Molx
  • 6,816
  • 2
  • 31
  • 47
  • I guess there are many ways to do this, but I'd sort and select the max as the first row: `df %>% arrange(desc(dates)) %>% group_by(yearmon=strftime(dates, "%Y-%m")) %>% slice(1)` – Frank Jun 05 '15 at 19:52
0

if you have a data frame,'a' including date variable,'date',

library(dplyr)
a<-mutate(a,endmonth=as.Date(paste0(as.numeric(format(date,"%Y%m"))+1,"01"),"%Y%m%d")-1)     

for example :

a<-data.frame(date=as.Date("2016-02-15","2016-04-05"))
a1<-mutate(a,endmonth=
             as.Date(paste0(as.numeric(format(date,"%Y%m"))+1,"01"),"%Y%m%d")-1)
>a1
        date   endmonth
1 2016-02-15 2016-02-29
2 2016-04-05 2016-04-30
hyunwoo jeong
  • 1,534
  • 1
  • 15
  • 14