
I have a long-term sightings data set of identified individuals (~16,000 records from 1979-2019) and I would like to subset the same date range (YYYY-09-01 to YYYY(+1)-08-31) across years in R. I have successfully done so for each "year" (and obtained the unique IDs) using:

library(dplyr)
library(lubridate)

year79 <- data %>%
  select(ID, Sex, AgeClass, Age, Date, Month, Year) %>%
  filter(Date >= as.Date("1978-09-01") & Date <= as.Date("1979-08-31")) %>%
  filter(!duplicated(ID))

year80 <- data %>%
  select(ID, Sex, AgeClass, Age, Date, Month, Year) %>%
  filter(Date >= as.Date("1979-09-01") & Date <= as.Date("1980-08-31")) %>%
  filter(!duplicated(ID))

I would like to clean up the code and ideally not need to specify each range (just have it iterate through). I am new to R and stuck on how to do this. Any suggestions?

FYI "Month" and "Year" are included for producing a table via melt and cast later on.

example data:

    ID Year   Month Day  Date       AgeClass Age Sex
1 1034 1979     4  17 1979-04-17        U   3   F
2 1127 1979     5   3 1979-05-03        A  13   F
3 1222 1979     5   3 1979-05-03        U   0   F
4 1303 1979     6  16 1979-06-16        U   0   F
5 1153 1980     4  16 1980-04-16        C   0   F
6 1014 1980     4  16 1980-04-16        U   6   F
                  ID Year   Month Day  Date       AgeClass Age  Sex
16428           2503 2019     5   8 2019-05-08        U  NA    F
16429           3760 2019     5   8 2019-05-08        A  12    F
16430           4080 2019     5   9 2019-05-09        A   9    F
16431           4095 2019     5   9 2019-05-09        A   9    U
16432           1204 2019     5  11 2019-05-11        A  37    F
16433           1204 2019     5  11 2019-05-11        A  NA    F

#> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
BM0329
  • You should provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – M-- Dec 17 '19 at 21:45

1 Answer


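Every year has 122 days from Sept 1 to Dec 31 inclusive, so adding 122 days to a date pushes anything on or after Sept 1 into the next calendar year while leaving Aug 31 and earlier in the current one. You could use that to add a variable marking the "fiscal year" for each row: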

set.seed(42)
library(dplyr)
library(lubridate)  # provides year()
my_data <- tibble(ID = 1:6,
                  Date = as.Date("1978-09-01") + c(-1, 0, 1, 364, 365, 366))
my_data
# There are 122 days from each Aug 31 (last of the FY) to the end of the CY.
# lubridate::ymd(19781231) - lubridate::ymd(19780831)

my_data %>%
  mutate(FY = year(Date + 122))

# A tibble: 6 x 3
#     ID Date          FY
#  <int> <date>     <dbl>
#1     1 1978-08-31  1978
#2     2 1978-09-01  1979
#3     3 1978-09-02  1979
#4     4 1979-08-31  1979
#5     5 1979-09-01  1980
#6     6 1979-09-02  1980
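
As a sanity check on the 122-day shift, the same label can also be derived from the calendar month directly. This is only a sketch of an alternative, not part of the approach above; `check` and `FY_alt` are names introduced here for illustration.

```r
library(dplyr)
library(lubridate)

# Same toy dates as above: the days around two Sept 1 boundaries
my_data <- tibble(ID = 1:6,
                  Date = as.Date("1978-09-01") + c(-1, 0, 1, 364, 365, 366))

check <- my_data %>%
  mutate(FY     = year(Date + 122),
         # September or later is labelled with the following calendar year
         FY_alt = year(Date) + (month(Date) >= 9))

# Both rules should agree on every row
all(check$FY == check$FY_alt)
```

The month-based version avoids the magic number, at the cost of being slightly longer.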

You could keep the data in one table and do subsequent analysis using group_by(FY), or use %>% split(.$FY) to put each FY into its own element of a list. From my limited experience, I think it's generally an anti-pattern to create separate data frames for annual subsets of your data, as that makes your code harder to maintain, troubleshoot, and modify.
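
A minimal sketch of how that might look for the question's goal of unique IDs per Sept-to-Aug year, assuming the data has `ID` and `Date` columns as in the example. The `sightings` data frame and its values are made up here for illustration.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the real sightings data (hypothetical values)
sightings <- tibble(
  ID   = c(1034, 1034, 1127, 1034, 1222),
  Date = as.Date(c("1978-09-01", "1979-04-17", "1979-08-31",
                   "1979-09-01", "1980-04-16"))
)

unique_by_fy <- sightings %>%
  mutate(FY = year(Date + 122)) %>%    # Sept 1 onward rolls into the next year
  distinct(FY, ID, .keep_all = TRUE)   # keep the first row per ID within each FY

# Count of distinct individuals per FY, all in one table
unique_by_fy %>% count(FY)

# Only if separate pieces are really needed: one data frame per FY
fy_list <- split(unique_by_fy, unique_by_fy$FY)
```

Here `distinct(FY, ID, .keep_all = TRUE)` plays the role of `filter(!duplicated(ID))` in the question, applied within every year at once instead of one range at a time.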

Jon Spring