I am trying to create a column that counts the number of unique visits to a site based on the grouping. However, my current operation counts the same date as both visit_3 and visit_4 because the data contain captures and recaptures. How do I count unique dates per site instead of simply counting the rows in each group?

With my current process, the last observation at "admin_pond" on "2022-05-19", with a capture_type of "recapture", should have "visit_3" but it shows "visit_4". I want the number of unique visits per site and date regardless of capture_type, so the last two observations should both show "visit_3" because they happened on the same date at the same site.

Data

data <- structure(list(site = c("admin_pond", "admin_pond", "admin_pond", 
"admin_pond"), date = structure(c(19123, 19130, 19131, 19131), class = "Date"), 
    n = c(9L, 15L, 11L, 9L), capture_type = c("new", "new", "new", 
    "recapture")), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"))

Method

library(dplyr)

bull_frog_visits <- data %>% 
  group_by(site) %>% 
  # 1:n() numbers every row within the site, so same-day rows get different numbers
  mutate(n_visit = 1:n(),
         n_visit = paste0("visit_", n_visit))

head(bull_frog_visits)
user438383
Eizy

2 Answers

This should work:

library(dplyr)

data %>% 
  group_by(site) %>% 
  # match() gives each date's position among the site's unique dates
  mutate(n_visit = match(date, unique(date)),
         n_visit = paste0("visit_", n_visit))
# # A tibble: 4 × 5
# # Groups:   site [1]
#   site       date           n capture_type n_visit
#   <chr>      <date>     <int> <chr>        <chr>  
# 1 admin_pond 2022-05-11     9 new          visit_1
# 2 admin_pond 2022-05-18    15 new          visit_2
# 3 admin_pond 2022-05-19    11 new          visit_3
# 4 admin_pond 2022-05-19     9 recapture    visit_3
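
Note that `match(date, unique(date))` numbers dates in order of first appearance, so it gives chronological visit numbers only when rows are sorted by date within each site. If that is not guaranteed, `dplyr::dense_rank()` is one possible alternative. A minimal sketch, assuming a hypothetical `visits` table with a made-up second site (`other_pond`) and deliberately shuffled rows:

```r
library(dplyr)

# Hypothetical data (not from the question): two sites, rows out of date order
visits <- tibble(
  site = c("admin_pond", "admin_pond", "admin_pond", "admin_pond",
           "other_pond", "other_pond"),
  date = as.Date(c("2022-05-19", "2022-05-11", "2022-05-19", "2022-05-18",
                   "2022-05-12", "2022-05-12")),
  capture_type = c("new", "new", "recapture", "new", "new", "recapture")
)

result <- visits %>%
  group_by(site) %>%
  # dense_rank() ranks dates within each site without gaps; ties share a rank
  mutate(n_visit = paste0("visit_", dense_rank(date))) %>%
  ungroup()

result
```

Because the rank depends only on the date values, both rows for 2022-05-19 at `admin_pond` get the same label wherever they sit in the table.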
Gregor Thomas

Another option is to use `consecutive_id()` from dplyr >= 1.1.0, which is the equivalent of data.table's `rleid()`:

data %>% 
  mutate(n_visit = paste0("visit_", consecutive_id(date)))

# A tibble: 4 × 5
  site       date           n capture_type n_visit
  <chr>      <date>     <int> <chr>        <chr>  
1 admin_pond 2022-05-11     9 new          visit_1
2 admin_pond 2022-05-18    15 new          visit_2
3 admin_pond 2022-05-19    11 new          visit_3
4 admin_pond 2022-05-19     9 recapture    visit_3
TarJae
  • this may work but in my full data set I have multiple sites – Eizy May 22 '23 at 20:03
  • this is no problem, we could do `data %>% mutate(n_visit = paste0("visit_", consecutive_id(date)), .by = site)`. Note: this needs `dplyr` >= 1.1.0 – TarJae May 22 '23 at 20:05
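
For reference, the per-site variant from the comment above can be sketched on a small multi-site example. The `visits` table and the `other_pond` site are hypothetical, made up for illustration. Since `consecutive_id()` starts a new id every time the value changes from one row to the next, the sketch sorts by site and date first so that repeated dates are adjacent:

```r
library(dplyr)  # consecutive_id() and .by need dplyr >= 1.1.0

# Hypothetical data (not from the question): two interleaved sites
visits <- tibble(
  site = c("admin_pond", "other_pond", "admin_pond", "other_pond", "admin_pond"),
  date = as.Date(c("2022-05-11", "2022-05-12", "2022-05-19",
                   "2022-05-12", "2022-05-19"))
)

result <- visits %>%
  # sort so that equal dates sit next to each other within a site
  arrange(site, date) %>%
  # .by = site restarts the run counter for every site
  mutate(n_visit = paste0("visit_", consecutive_id(date)), .by = site)

result
```

Without the `arrange()` step, a date that reappears after a different date would start a new run and be counted as a separate visit.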