Hello I am new to this forum and relatively new to coding in R. I am having problems building a frequency table that fills in counts of zeros when some of my variables are linked. My test data consists of vegetation counts of 3 vegetation genera, in 10 plots with 2 treatment types and two subsite types. Each of my plots can only be wet or dry and treatment or control not both.
Here is how my data is structured:
structure(list(SITE = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "NAKVAK", class = "factor"),
PLOT = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1A",
"1B", "2A", "2B", "3A", "3B", "4A", "4B", "5A", "5B"), class = "factor"),
PLOT2 = c(1L, 1L, 1L, 1L, 1L, 1L), SUBSITE = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("DRY", "WET"), class = "factor"),
TRTMT = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CTL",
"OTC"), class = "factor"), YEAR = c(2010L, 2010L, 2010L,
2010L, 2010L, 2021L), GENUS = structure(c(3L, 1L, 2L, 2L,
3L, 1L), .Label = c("Betula", "Carex", "Cladonia"), class = "factor"),
LIFEFORM = structure(c(2L, 3L, 1L, 1L, 2L, 3L), .Label = c("FORB",
"LICHEN", "SDECI"), class = "factor"), ABUND = c(1L, 1L,
1L, 1L, 1L, 1L)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
I am trying to build a table that includes Plot ID, Subsite (wet or dry), treatment (OTC or Control), year (2010 or 2021), Genus, and abundance counts. I also want to include zeros, so I need to know when none a particular genus is found. I have been able to build a frequency table that fills in counts of zeros, but the problem is that it also creates variable combinations that don't exist in my data set. For example, plot 1A is always in a dry subsite with an OTC treatment. The code however creates a row for plot 1A, wet subsite, OTC and plot 1A, dry subsite, control etc... I am looking for a way to link my plot, subsite, and treatment variables so that abundance counts of zero are only filled in for combinations of plot, subsite, and treatment that actually exist in my data set.
The functions that seem like they should be my best bet are tidyverse's expand and nesting functions but I am open to any and all suggestions!
**** Edit to include solution
vegdat <- as_tibble(read.csv("DummyDat100.csv"))
spec.sum <- vegdat %>%
group_by(PLOT, SUBSITE, TRTMT, GENUS, YEAR) %>%
summarise(count = sum(ABUND))
Spec.sum <- as.data.table(spec.sum)
#Create a new column that contains all site data
Spec.sum[, 'sampling' := paste(YEAR, PLOT, SUBSITE, TRTMT, sep = '_')]
#Now get rid of columns that were combined into sampling col
Spec.sum <- Spec.sum[, c('sampling', 'GENUS', 'count')]
#Now expand this smaller data.table to include zeroes for absent genera
Spec.sum <- as.data.table(complete(Spec.sum, sampling, GENUS, fill = list(count = 0)))
#Split that sampling col back into original columns
Spec.sum <- Spec.sum[, c('YEAR', 'PLOT', 'SUBSITE', 'TRTMT') := tstrsplit(sampling, '_', keep = c(1,2,3,4))]
#Remove the sampling column (if you want)
Spec.sum[, sampling := NULL]
#Reorder the data.table
Spec.sum.tidy <- Spec.sum[, c('PLOT', 'SUBSITE', 'TRTMT', 'GENUS', 'YEAR', 'count')]