
New to Stack Overflow.

I have a dataset that looks like the following.

There are only ants in this dataset.

Df_Ant_Observations

Type                  Observation
Ant                       2022-05-22
Ant                       2021-04-23
Ant                       2022-06-22
Ant                       2014-07-22
Ant                       2018-02-25
Ant                       2021-05-22
Ant                       2018-05-22
Ant                       2021-05-23
Ant                       2022-06-24

500 + columns

I want to filter these observations so that only the first observation of each year is kept (one row per year). I plan to use the filtered data in a ggplot to show how the first observation differs from year to year.

I think filter() and the lubridate package could be useful here.

Does anyone have any ideas on how to filter Df_Ant_Observations as desired? :)

Henry Oufh
  • It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please do not post images of data or code. We cannot copy/paste such values into R for testing. – MrFlick Jul 15 '22 at 19:05
  • I tried to follow your feedback. Please let me know if I should add anything more to help the discussion. @MrFlick – Henry Oufh Jul 15 '22 at 19:12
  • You should share data in a more reproducible form like a `dput()` so we know how your values are coded. It's not clear if the Observation column is actually encoded as a proper Date object or if it's a character value or something else. Plus it's nice to give the desired output so we can test. Right now it seems like just one row will be returned, but in reality I assume you have more than one Type? What do you want to do with the other columns? That's why having clear examples of desired output for the given sample input is so useful. – MrFlick Jul 15 '22 at 19:15
  • Thanks for letting me know. That's useful, so I can learn how to ask questions properly as quickly as possible. I also updated the question above to say that there are only ants in this dataset. @MrFlick – Henry Oufh Jul 15 '22 at 19:20

1 Answer


base R

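# Observation is character here, so ave() returns the logical result coerced to
# character ("TRUE"/"FALSE"); hence the comparison with "TRUE".
dat[ave(dat$Observation, dat$Type, FUN = function(z) !duplicated(format(as.Date(z), format = "%Y"))) == "TRUE",]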
#   Type Observation
# 1  Ant  2022-05-22
# 2  Ant  2021-04-23
# 4  Ant  2014-07-22
# 5  Ant  2018-02-25
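
Note that `!duplicated()` keeps the first row of each year in the order the rows appear, which happens to be the earliest date per year in the sample data. If "first" should mean the earliest date regardless of row order, sorting before de-duplicating is one option. A minimal sketch, assuming Observation holds ISO-formatted (YYYY-MM-DD) strings; the names `shuffled`, `sorted`, `yr` and `earliest` are only illustrative, and the rows are deliberately shuffled to show the effect:

```r
# Same nine observations as the sample data, but in a shuffled row order
shuffled <- data.frame(
  Type = "Ant",
  Observation = c("2022-06-24", "2021-05-23", "2022-05-22", "2014-07-22",
                  "2018-05-22", "2021-04-23", "2018-02-25", "2021-05-22",
                  "2022-06-22"),
  stringsAsFactors = FALSE
)

# Sort by date so that "first row per year" means "earliest date per year"
sorted <- shuffled[order(as.Date(shuffled$Observation)), ]

# Extract the year, then keep the first row for each Type/year combination
yr <- format(as.Date(sorted$Observation), "%Y")
earliest <- sorted[!duplicated(paste(sorted$Type, yr)), ]
earliest
```

Pasting Type and year together keeps this correct if more than one Type ever shows up in the data.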

dplyr

library(dplyr)
dat %>%
  mutate(Observation = as.Date(Observation)) %>%
  group_by(Type, year = format(Observation, format = "%Y")) %>%
  slice(1) %>%
  ungroup()
# # A tibble: 4 x 3
#   Type  Observation year 
#   <chr> <date>      <chr>
# 1 Ant   2014-07-22  2014 
# 2 Ant   2018-02-25  2018 
# 3 Ant   2021-04-23  2021 
# 4 Ant   2022-05-22  2022 

Data

dat <- structure(list(Type = c("Ant", "Ant", "Ant", "Ant", "Ant", "Ant", "Ant", "Ant", "Ant"), Observation = c("2022-05-22", "2021-04-23", "2022-06-22", "2014-07-22", "2018-02-25", "2021-05-22", "2018-05-22", "2021-05-23", "2022-06-24")), class = "data.frame", row.names = c(NA, -9L))

While I determine the year "correctly" by parsing the strings as dates (which assumes they are valid-looking dates), we could also just take substring(Observation, 1, 4) as the year and go with that.
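
The substring shortcut could look something like this. It is only a sketch and assumes every Observation is a string starting with a four-digit year; the names `yr` and `first_per_year` are just for illustration:

```r
# Sample data, in the same row order as in the Data section
dat <- data.frame(
  Type = "Ant",
  Observation = c("2022-05-22", "2021-04-23", "2022-06-22", "2014-07-22",
                  "2018-02-25", "2021-05-22", "2018-05-22", "2021-05-23",
                  "2022-06-24"),
  stringsAsFactors = FALSE
)

# Take the first four characters as the year, without parsing the date at all
yr <- substring(dat$Observation, 1, 4)

# Keep the first row seen for each Type/year combination
first_per_year <- dat[!duplicated(paste(dat$Type, yr)), ]
first_per_year
```

This skips date validation entirely, so a malformed value would silently become its own "year".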

r2evans