0

I have a dataset ('DF1') that has count data. It looks like this:

Species Date Site n
AMCR 6/1/2021 SVC 14
AMCR 6/1/2021 BMA 1
AMCR 6/7/2021 SVA 2
AMCR 6/15/2021 SVA 9
AMCR 6/21/2021 SVA 18
AMCR 6/29/2021 SVA 18

However, my study actually has nine 'Sites' (SVC, BMA, SVA, BMC, TMA, TMC, SRA, SRC, and MCC) and each site has data collected on the same five dates (6/1/2021, 6/8/2021, 6/15/2021, 6/21/2021, and 6/29/2021). DF1 only shows rows for where there were counts in 'n', but if there were no counts, I want the dataframe to be populated with zero counts for each of those dates for each site so that it looks like this:

Species Date Site n
AMCR 6/1/2021 SVC 14
AMCR 6/7/2021 SVC 0
AMCR 6/15/2021 SVC 0
AMCR 6/21/2021 SVC 0
AMCR 6/29/2021 SVC 0
AMCR 6/1/2021 BMA 1
AMCR 6/7/2021 BMA 0
AMCR 6/15/2021 BMA 0
AMCR 6/21/2021 BMA 0
AMCR 6/29/2021 BMA 0
AMCR 6/1/2021 SVA 0
AMCR 6/7/2021 SVA 2
AMCR 6/15/2021 SVA 9
AMCR 6/21/2021 SVA 18
AMCR 6/29/2021 SVA 18
AMCR 6/1/2021 BMC 0
AMCR 6/7/2021 BMC 0
AMCR 6/15/2021 BMC 0
AMCR 6/21/2021 BMC 0
AMCR 6/29/2021 BMC 0
AMCR 6/1/2021 TMA 0
AMCR 6/7/2021 TMA 0
AMCR 6/15/2021 TMA 0
AMCR 6/21/2021 TMA 0
AMCR 6/29/2021 TMA 0
AMCR 6/1/2021 TMC 0
AMCR 6/7/2021 TMC 0
AMCR 6/15/2021 TMC 0
AMCR 6/21/2021 TMC 0
AMCR 6/29/2021 TMC 0
AMCR 6/1/2021 SRA 0
AMCR 6/7/2021 SRA 0
AMCR 6/15/2021 SRA 0
AMCR 6/21/2021 SRA 0
AMCR 6/29/2021 SRA 0
AMCR 6/1/2021 SRC 0
AMCR 6/7/2021 SRC 0
AMCR 6/15/2021 SRC 0
AMCR 6/21/2021 SRC 0
AMCR 6/29/2021 SRC 0
AMCR 6/1/2021 MCC 0
AMCR 6/7/2021 MCC 0
AMCR 6/15/2021 MCC 0
AMCR 6/21/2021 MCC 0
AMCR 6/29/2021 MCC 0

Is there a way to add rows with 0 counts by checking to see if those date and site combinations don't exist?

Thank you.

Jacob
  • 329
  • 2
  • 10
  • 1
    `tidyr::complete` should work. Check here https://stackoverflow.com/questions/43501670/complete-column-with-group-by-and-complete – TTS Oct 11 '21 at 22:53

1 Answers1

1

dplyr/tidyr

library(dplyr)
library(tidyr)
dat %>%
  complete(Species, Date, Site, fill = list(n = 0))
# # A tibble: 15 x 4
#    Species Date      Site      n
#    <chr>   <chr>     <chr> <dbl>
#  1 AMCR    6/1/2021  BMA       1
#  2 AMCR    6/1/2021  SVA       0
#  3 AMCR    6/1/2021  SVC      14
#  4 AMCR    6/15/2021 BMA       0
#  5 AMCR    6/15/2021 SVA       9
#  6 AMCR    6/15/2021 SVC       0
#  7 AMCR    6/21/2021 BMA       0
#  8 AMCR    6/21/2021 SVA      18
#  9 AMCR    6/21/2021 SVC       0
# 10 AMCR    6/29/2021 BMA       0
# 11 AMCR    6/29/2021 SVA      18
# 12 AMCR    6/29/2021 SVC       0
# 13 AMCR    6/7/2021  BMA       0
# 14 AMCR    6/7/2021  SVA       2
# 15 AMCR    6/7/2021  SVC       0

base R

dat2 <- merge(dat, do.call(expand.grid, lapply(dat[,1:3], unique)), by = names(dat)[1:3], all = TRUE)
dat2
#    Species      Date Site  n
# 1     AMCR  6/1/2021  BMA  1
# 2     AMCR  6/1/2021  SVA NA
# 3     AMCR  6/1/2021  SVC 14
# 4     AMCR 6/15/2021  BMA NA
# 5     AMCR 6/15/2021  SVA  9
# 6     AMCR 6/15/2021  SVC NA
# 7     AMCR 6/21/2021  BMA NA
# 8     AMCR 6/21/2021  SVA 18
# 9     AMCR 6/21/2021  SVC NA
# 10    AMCR 6/29/2021  BMA NA
# 11    AMCR 6/29/2021  SVA 18
# 12    AMCR 6/29/2021  SVC NA
# 13    AMCR  6/7/2021  BMA NA
# 14    AMCR  6/7/2021  SVA  2
# 15    AMCR  6/7/2021  SVC NA
dat2$n <- ifelse(is.na(dat2$n), 0, dat2$n)
dat2
#    Species      Date Site  n
# 1     AMCR  6/1/2021  BMA  1
# 2     AMCR  6/1/2021  SVA  0
# 3     AMCR  6/1/2021  SVC 14
# 4     AMCR 6/15/2021  BMA  0
# 5     AMCR 6/15/2021  SVA  9
# 6     AMCR 6/15/2021  SVC  0
# 7     AMCR 6/21/2021  BMA  0
# 8     AMCR 6/21/2021  SVA 18
# 9     AMCR 6/21/2021  SVC  0
# 10    AMCR 6/29/2021  BMA  0
# 11    AMCR 6/29/2021  SVA 18
# 12    AMCR 6/29/2021  SVC  0
# 13    AMCR  6/7/2021  BMA  0
# 14    AMCR  6/7/2021  SVA  2
# 15    AMCR  6/7/2021  SVC  0

Data

dat <- structure(list(Species = c("AMCR", "AMCR", "AMCR", "AMCR", "AMCR", "AMCR"), Date = c("6/1/2021", "6/1/2021", "6/7/2021", "6/15/2021", "6/21/2021", "6/29/2021"), Site = c("SVC", "BMA", "SVA", "SVA", "SVA", "SVA"), n = c(14L, 1L, 2L, 9L, 18L, 18L)), class = "data.frame", row.names = c(NA, -6L))
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • Great solutions. Thank you! I know this question was closed due to similarity to another question, so I apologize. I find it hard to know the necessary search terms to search for previous questions first. Is there a document of highly used R functions for data manipulation that isn't just individual package documentation? A compilation of several different packages? – Jacob Oct 12 '21 at 13:56
  • 1
    There are a million such docs and the vast majority of them are not that good ;-). Knowing how to search on Stack (and search-engines) is an acquired skill, most of that is knowing what appropriate buzzwords to include, and that comes with experience and time. The fact that this question was closed as a dupe is _kind of_ a knock on you for not finding the others, but in my mind it is not about that: the purpose of closing it is to keep the other question's answers percolating to the top of searches so that its answers can be refined and reused more. No worries, Jacob, and sorry, no easy answer – r2evans Oct 12 '21 at 14:57