I don't think this is a duplicate of "Range join data.frames - specific date column with date ranges/intervals in R", because I am not looking to merge data frames. Instead, I want to delete from a data frame all the records that fall outside the time frames I have specified in another data frame.
Specifically, I want to select records, grouped by the variable `id_time`, from a file `hr` that looks like the one below, using the date periods in a second file, `dates`, which holds the time period of interest for each `id_time`.
The file `hr` originally has >1000 rows. Each row is an observation that belongs to an individual `id_time`, and each observation has the date it was recorded in `Local_date`. There are many observations per individual, each with a different date. Roughly, this file looks like this:
hr<-read.table(text =
"id_time season sex Local_date Area(ha)
10C_MTHM Late_dry M 2015/09/01 12231.49898
10C_MTHM Late_dry M 2015/10/31 15883.57836
10C_MTHM Wet M 2015/11/30 2725.42549
10C_MTHM Wet M 2015/12/31 40743.25861
10C_MTHM Wet M 2016/01/31 44685.19565
10C_MTHM Wet M 2016/02/26 21313.59966
10C_MTHM Wet M 2014/12/31 36782.41615
10C_MTHM Wet M 2015/01/31 126159.3232
10C_MTHM Wet M 2015/02/28 113034.0324
10C_MTHM Early_dry M 2015/03/31 50179.50564
10C_MTHM Early_dry M 2015/04/30 29744.83677
10C_MTHM Early_dry M 2015/05/31 33990.54416
10C_MTHM Early_dry M 2015/06/30 31081.3867
10E_CHIM2 Late_dry M 2015/09/30 5467.727522
10E_CHIM2 Late_dry M 2015/10/31 925.188892
10E_CHIM2 Wet M 2015/11/30 4663.484598
10E_CHIM2 Wet M 2015/12/31 18767.86083
10E_CHIM2 Wet M 2016/01/31 25163.76076
10E_CHIM2 Wet M 2016/02/26 40432.86667
10E_CHIM2 Late_dry M 2014/09/30 12403.64243
10E_CHIM2 Late_dry M 2014/10/31 15391.80744
11C_SDBM Late_dry M 2015/07/31 292012.0909
11C_SDBM Late_dry M 2015/08/31 149293.0196
11C_SDBM Late_dry M 2015/09/30 88775.83245
11C_SDBM Late_dry M 2015/10/31 20980.49625
11C_SDBM Wet M 2015/11/30 44679.24235
11C_SDBM Wet M 2015/12/31 85124.26871
11C_SDBM Wet M 2016/01/31 4573.904479",
header = TRUE)
The second file, `dates`, has only a few rows, and the only information in it is a range of dates for each individual `id_time` that appears in the `hr` file (the IDs are stored here in the column `date_id`). This `dates` file is the one I want R to use to select all the rows whose dates fall within the date range of the respective `id_time`, deleting all the other rows, whose dates are outside that period:
dates<-read.table(text =
"date_id start_date end_date
10C_MTHM 2015/02/24 2016/02/24
11C_SDBM 2015/01/01 2015/06/30
10E_CHIM2 2015/01/01 2016/01/01",
header = TRUE)
My intended outcome is the `hr` file (just one file) with all the records for each individual `id_time` whose dates fall within the period of interest shown in the `dates` file. All the other records for each individual, outside the time period of interest, should be deleted from the output.
For example, for the first individual, 10C_MTHM, I would like the final database to include only the records of this individual that fall between 2015/02/24 and 2016/02/24, as specified in the `dates` file, and so forth for each individual, with all these records in one single database.
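To make the intended result concrete, here is a minimal base R sketch of the filter, run on a handful of the `hr` rows shown above together with the full `dates` table. It assumes that both range endpoints are inclusive and that each `id_time` has at most one row in `dates`. The names `idx`, `keep`, and `hr_filtered` are illustrative only. This is one possible way to express the goal, not necessarily the best one.

```r
# A few rows copied from the hr example above (not the full file)
hr <- read.table(text =
"id_time season sex Local_date Area(ha)
10C_MTHM Wet M 2016/02/26 21313.59966
10C_MTHM Wet M 2015/01/31 126159.3232
10C_MTHM Wet M 2015/02/28 113034.0324
10E_CHIM2 Late_dry M 2015/09/30 5467.727522
10E_CHIM2 Late_dry M 2014/09/30 12403.64243
11C_SDBM Late_dry M 2015/07/31 292012.0909",
header = TRUE)

dates <- read.table(text =
"date_id start_date end_date
10C_MTHM 2015/02/24 2016/02/24
11C_SDBM 2015/01/01 2015/06/30
10E_CHIM2 2015/01/01 2016/01/01",
header = TRUE)

# Convert the text dates to Date objects so they can be compared
hr$Local_date    <- as.Date(hr$Local_date,    format = "%Y/%m/%d")
dates$start_date <- as.Date(dates$start_date, format = "%Y/%m/%d")
dates$end_date   <- as.Date(dates$end_date,   format = "%Y/%m/%d")

# For every hr row, find the row of dates that belongs to the same individual
# (assumption: at most one range per individual)
idx <- match(hr$id_time, dates$date_id)

# Keep a row only if its individual has a range and the date lies inside it
# (assumption: start_date and end_date are both inclusive)
keep <- !is.na(idx) &
  hr$Local_date >= dates$start_date[idx] &
  hr$Local_date <= dates$end_date[idx]

hr_filtered <- hr[keep, ]
print(hr_filtered)
```

On these six rows, only the 10C_MTHM record from 2015/02/28 and the 10E_CHIM2 record from 2015/09/30 remain. The 11C_SDBM record is dropped because it falls after that individual's end date.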
I found a similar question, "R: Subsetting a data frame using a list of dates as the filter", but the difference is that my date ranges are not in a column of the same file; they are in a second file.
How would I write code to select records based on the information specified in a different file?
Many thanks!