I don't think this is a duplicate of "Range join data.frames - specific date column with date ranges/intervals in R", because I am not looking to merge data frames but to delete from one data frame all the records that fall outside the time frames specified in another data frame.

Specifically, I want to select records grouped by the variable `id_time` in a file like `hr` below, using the date periods in a second file, `dates`, which holds the time period of interest for each `id_time`.

The file `hr` originally has more than 1000 rows. Each row is an observation that belongs to an individual `id_time`, and the date it was recorded is in `Local_date`. There are many observations per individual, each with a different date. Roughly, this file looks like this:

hr<-read.table(text = 
                     "id_time   season  sex Local_date  Area(ha)
                   10C_MTHM Late_dry    M   2015/09/01  12231.49898
                   10C_MTHM Late_dry    M   2015/10/31  15883.57836
                   10C_MTHM Wet         M   2015/11/30  2725.42549
                   10C_MTHM Wet         M   2015/12/31  40743.25861
                   10C_MTHM Wet         M   2016/01/31  44685.19565
                   10C_MTHM Wet         M   2016/02/26  21313.59966
                   10C_MTHM Wet         M   2014/12/31  36782.41615
                   10C_MTHM Wet         M   2015/01/31  126159.3232
                   10C_MTHM Wet         M   2015/02/28  113034.0324
                   10C_MTHM Early_dry   M   2015/03/31  50179.50564
                   10C_MTHM Early_dry   M   2015/04/30  29744.83677
                   10C_MTHM Early_dry   M   2015/05/31  33990.54416
                   10C_MTHM Early_dry   M   2015/06/30  31081.3867
                   10E_CHIM2    Late_dry    M   2015/09/30  5467.727522
                   10E_CHIM2    Late_dry    M   2015/10/31  925.188892
                   10E_CHIM2    Wet         M   2015/11/30  4663.484598
                   10E_CHIM2    Wet         M   2015/12/31  18767.86083
                   10E_CHIM2    Wet         M   2016/01/31  25163.76076
                   10E_CHIM2    Wet         M   2016/02/26  40432.86667
                   10E_CHIM2    Late_dry    M   2014/09/30  12403.64243
                   10E_CHIM2    Late_dry    M   2014/10/31  15391.80744
                   11C_SDBM Late_dry    M   2015/07/31  292012.0909
                   11C_SDBM Late_dry    M   2015/08/31  149293.0196
                   11C_SDBM Late_dry    M   2015/09/30  88775.83245
                   11C_SDBM Late_dry    M   2015/10/31  20980.49625
                   11C_SDBM Wet         M   2015/11/30  44679.24235
                   11C_SDBM Wet         M   2015/12/31  85124.26871
                   11C_SDBM Wet         M   2016/01/31  4573.904479",
                   header = TRUE)

The second file, `dates`, has only a few rows, and the only information in it is a date range for each individual `id_time` that appears in `hr` (the column is named `date_id` here). I want R to use these date ranges to keep all the rows whose dates fall within the range of the respective individual, and to delete all the other rows, whose dates are outside that period:

dates<-read.table(text = 
                    "date_id    start_date  end_date
                    10C_MTHM    2015/02/24  2016/02/24
                    11C_SDBM    2015/01/01  2015/06/30
                    10E_CHIM2   2015/01/01  2016/01/01",
                                        header = TRUE)

My intended outcome is the `hr` file (just one file) with all the records for each individual `id_time` whose dates fall within the period of interest shown in the `dates` file. All other records for each individual, outside the period of interest, should be deleted from the output.

For example, for the first individual, 10C_MTHM, I would like the final database to include only the records of this individual that fall between 2015/02/24 and 2016/02/24, as specified in the `dates` file, and so forth for each individual, with all these records in one single database.
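To make the intended result concrete, here is a minimal base R sketch of one way it might be produced. It is only an illustration under stated assumptions: it uses a handful of rows copied from the samples above rather than the full files, it treats both ends of each date range as inclusive, and the names `fmt`, `merged`, `inside` and `hr_filtered` are made up for this example. Note that `read.table` renames the `Area(ha)` column to `Area.ha.`.

```r
# A few rows copied from the sample `hr` above (not the full file).
hr <- read.table(text = "
id_time    season    sex  Local_date  Area(ha)
10C_MTHM   Wet       M    2015/01/31  126159.3232
10C_MTHM   Wet       M    2015/02/28  113034.0324
10C_MTHM   Wet       M    2016/02/26  21313.59966
10E_CHIM2  Late_dry  M    2014/10/31  15391.80744
10E_CHIM2  Late_dry  M    2015/09/30  5467.727522
11C_SDBM   Late_dry  M    2015/07/31  292012.0909
", header = TRUE, stringsAsFactors = FALSE)

# The date ranges, as in the sample `dates` above.
dates <- read.table(text = "
date_id    start_date  end_date
10C_MTHM   2015/02/24  2016/02/24
11C_SDBM   2015/01/01  2015/06/30
10E_CHIM2  2015/01/01  2016/01/01
", header = TRUE, stringsAsFactors = FALSE)

# Convert the text dates to Date objects so they compare chronologically.
fmt <- "%Y/%m/%d"
hr$Local_date    <- as.Date(hr$Local_date, format = fmt)
dates$start_date <- as.Date(dates$start_date, format = fmt)
dates$end_date   <- as.Date(dates$end_date, format = fmt)

# Attach each individual's range to its rows (a temporary lookup only),
# then keep the rows whose date lies inside that range.
# Assumption: both range ends are inclusive.
merged <- merge(hr, dates, by.x = "id_time", by.y = "date_id")
inside <- merged$Local_date >= merged$start_date &
          merged$Local_date <= merged$end_date

# Keep only the original `hr` columns, so the result is a filtered `hr`.
hr_filtered <- merged[inside, names(hr)]
print(hr_filtered)
```

The merge here is only a temporary step to line up each row with its own range; the range columns are dropped again, so the result has the same columns as `hr`. Individuals that have no entry in `dates` would be dropped entirely, because `merge` keeps only matching ids by default.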

I found a similar question, "R: Subsetting a data frame using a list of dates as the filter", but the difference is that my date ranges are not in a column of the same file but in a second file.

How can I write code to select records based on information specified in a different file?

Many thanks!

AnnK
  • What is your intended outcome? Do you want separate data.frames for each date interval? Do you want to construct a grouping variable? Please include an example of your intended outcome in your question. – lmo Jun 03 '16 at 12:04
  • I believe there is a more modern solution to this: `setkey(hr,id_time); setkey(dates,date_id); hr[dates][between(Local_date, start_date, end_date)]` using the dev version of `data.table`. – mtoto Jun 03 '16 at 12:35
  • @lmo I have edited my question to make it clearer. I hope this clarification helps. – AnnK Jun 03 '16 at 13:05
  • [roll join with start/end window](http://stackoverflow.com/questions/24480031) – zx8754 Jun 03 '16 at 18:53

0 Answers