0

How to select the data that correspond to the last day of the month? For example I have the dataset which contains the daily data of V1 from 2000 to 2016. All I need is to select the last day of each month, i.e the 31/01/2001, 28/02/2001 etc for all years. The date format is DD/MM/YYYY.

 V1         V2
4.59 29/12/2000
4.59 01/01/2001
4.58 02/01/2001
4.52 03/01/2001
4.54 04/01/2001
4.58 05/01/2001
......
4.09 26/01/2001
4.50 27/01/2001
4.18 28/01/2001
4.11 29/01/2001
3.54 30/01/2001
4.98 31/01/2001  <- Select this row!
M--
  • 25,431
  • 8
  • 61
  • 93
Hercules Apergis
  • 423
  • 6
  • 20
  • 1
    It's really pretty elementary subsetting by group and operating on it (using `max()`. The answer by Joel Wilson solves it as a one-liner (with the two small caveats I mentioned as a comment). – Dirk Eddelbuettel Dec 17 '16 at 22:35

4 Answers4

3
library(data.table)
library(lubridate)

# for each unique combo of year-mon get the last entry
setDT(df)[order(V2), .(V1[which.max(V2)], V2[which.max(V2)]), by = .(year(V2), month(V2))] 
#   year month   V1         V2
#1: 2000    12 4.59 2000-12-29
#2: 2001     1 4.98 2001-01-31
Uwe
  • 41,420
  • 11
  • 90
  • 134
joel.wilson
  • 8,243
  • 5
  • 28
  • 48
  • 1
    This answer is correct and works eg with the sample data in akrun's answer _provided your turn the second column into a Date type_ eg via `df[,2] <- anytime::anydate(df[,2])`. Also you do not need package `lubridate` as `data.table` already has `year()` and `month()` etc pp – Dirk Eddelbuettel Dec 17 '16 at 22:34
3

This can be accomplished with base R as well.

df[df$V2 %in% unique(as.Date(format(df$V2 + 28, "%Y-%m-01")) - 1),]
    V1         V2
12 4.98 2001-01-31

This uses a trick I picked up from one of Dirk Dirk Eddelbuettel's SO answers. The idea is to set the date to the first day of the following month and then subtract 1 from it.

data

df <- structure(list(V1 = c(4.59, 4.59, 4.58, 4.52, 4.54, 4.58, 4.09, 
4.5, 4.18, 4.11, 3.54, 4.98), V2 = structure(c(11320, 11323, 
11324, 11325, 11326, 11327, 11348, 11349, 11350, 11351, 11352, 
11353), class = "Date")), .Names = c("V1", "V2"), row.names = c(NA, 
-12L), class = "data.frame")

proof of concept

# construct a vector of dates for 10 years, 2001 through 2010
myDates <- seq(as.Date("2001-01-01"), as.Date("2010-12-31"), by="day")

# pull off the final days of the month
finalDays <-
       myDates[myDates %in% unique(as.Date(format(myDates + 28, "%Y-%m-01")) - 1)]

# Take a look at first 5 and last 5
c(head(finalDays, 5), tail(finalDays, 5))
 [1] "2001-01-31" "2001-02-28" "2001-03-31" "2001-04-30" "2001-05-31"
 [6] "2010-08-31" "2010-09-30" "2010-10-31" "2010-11-30" "2010-12-31"

# get length, 12 * 10 = 120
length(finalDays)
[1] 120

# make sure there are no repeated values
length(unique(finalDays))
[1] 120
lmo
  • 37,904
  • 9
  • 56
  • 69
  • Haveing the `+28` in that location gives an error of adding a number to a factor. Removing it i only get the invalid trim argument. – Hercules Apergis Dec 17 '16 at 19:30
  • You need to format your date column (V2) using `as.Date`. Something like `df$V2 <- as.Date(df$V2, format="%d/%m/%Y")` will work. – lmo Dec 17 '16 at 19:39
  • 2
    It doesn't return the correct number of dates...there are quite a few who are skipped! – Hercules Apergis Dec 17 '16 at 19:51
  • 1
    It works on the example that you provided. Your comment is too vague to add anything further. Please take the time to read this post on [how to provide a great R example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. – lmo Dec 17 '16 at 20:04
  • 1
    i understand..but i have daily data from a 16 year period...i really don't know how can i help you with that... – Hercules Apergis Dec 17 '16 at 20:07
  • See my proof of concept, using daily dates for a ten year period. – lmo Dec 17 '16 at 20:19
  • 1
    that does work in your data...in mine however it doesn't. I should be getting 184 months. But instead I am getting 133. – Hercules Apergis Dec 17 '16 at 22:10
2

We can use dplyr

library(dplyr)
library(lubridate)
library(zoo)

If we need only the last day of month and not the last day that is found in the dataset

 df %>% 
      filter(dmy(V2) == as.Date(as.yearmon(dmy(V2)), frac=1))
 #    V1         V2
 #1 4.98 31/01/2001

But, if we need to filter the last day found in the dataset for each month

df %>%
    mutate(V3 = dmy(V2))%>%
    group_by(month = month(V3), year = year(V3)) %>%
    slice(which.max(day(V3))) %>%
    ungroup() %>%
    select(-month, -year, -V3)
#     V1         V2
#   <dbl>      <chr>
#1  4.98 31/01/2001
#2  4.59 29/12/2000

If it is only grouped by 'month', just remove the year = year(V3)) in the group_by and we will get

df %>%
    mutate(V3 = dmy(V2))%>%
    group_by(month = month(V3)) %>%
    slice(which.max(day(V3))) %>%
    ungroup() %>%
    select(-month,  -V3)

data

df <- structure(list(V1 = c(4.59, 4.59, 4.58, 4.52, 4.54, 4.58, 4.09, 
4.5, 4.18, 4.11, 3.54, 4.98), V2 = c("29/12/2000", "01/01/2001", 
"02/01/2001", "03/01/2001", "04/01/2001", "05/01/2001", "26/01/2001", 
"27/01/2001", "28/01/2001", "29/01/2001", "30/01/2001", "31/01/2001"
)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, 
-12L))
akrun
  • 874,273
  • 37
  • 540
  • 662
1
subset(df, as.POSIXlt(V2 + 1)$mday == 1)

## you don't have 31-Dec in your data
#     V1         V2
# 1 4.98 31/01/2001
Hong Ooi
  • 56,353
  • 13
  • 134
  • 187