Frequency table that includes zero counts but only for variable combinations that exist within my data frame

Question

Hello I am new to this forum and relatively new to coding in R. I am having problems building a frequency table that fills in counts of zeros when some of my variables are linked. My test data consists of vegetation counts of 3 vegetation genera, in 10 plots with 2 treatment types and two subsite types. Each of my plots can only be wet or dry and treatment or control not both.

Here is how my data is structured:

structure(list(SITE = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "NAKVAK", class = "factor"), 
PLOT = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1A", 
"1B", "2A", "2B", "3A", "3B", "4A", "4B", "5A", "5B"), class = "factor"), 
PLOT2 = c(1L, 1L, 1L, 1L, 1L, 1L), SUBSITE = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("DRY", "WET"), class = "factor"), 
TRTMT = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CTL", 
"OTC"), class = "factor"), YEAR = c(2010L, 2010L, 2010L, 
2010L, 2010L, 2021L), GENUS = structure(c(3L, 1L, 2L, 2L, 
3L, 1L), .Label = c("Betula", "Carex", "Cladonia"), class = "factor"), 
LIFEFORM = structure(c(2L, 3L, 1L, 1L, 2L, 3L), .Label = c("FORB", 
"LICHEN", "SDECI"), class = "factor"), ABUND = c(1L, 1L, 
1L, 1L, 1L, 1L)), row.names = c(NA, -6L), class = c("tbl_df",

"tbl", "data.frame"))

I am trying to build a table that includes Plot ID, Subsite (wet or dry), treatment (OTC or Control), year (2010 or 2021), Genus, and abundance counts. I also want to include zeros, so I need to know when none a particular genus is found. I have been able to build a frequency table that fills in counts of zeros, but the problem is that it also creates variable combinations that don't exist in my data set. For example, plot 1A is always in a dry subsite with an OTC treatment. The code however creates a row for plot 1A, wet subsite, OTC and plot 1A, dry subsite, control etc... I am looking for a way to link my plot, subsite, and treatment variables so that abundance counts of zero are only filled in for combinations of plot, subsite, and treatment that actually exist in my data set.

The functions that seem like they should be my best bet are tidyverse's expand and nesting functions but I am open to any and all suggestions!

**** Edit to include solution

vegdat <- as_tibble(read.csv("DummyDat100.csv"))
   
 spec.sum <- vegdat %>%
      group_by(PLOT, SUBSITE, TRTMT, GENUS, YEAR) %>% 
      summarise(count = sum(ABUND))

     Spec.sum <- as.data.table(spec.sum)
    
    #Create a new column that contains all site data 
    Spec.sum[, 'sampling' := paste(YEAR, PLOT, SUBSITE, TRTMT, sep = '_')]
    
    #Now get rid of columns that were combined into sampling col
    Spec.sum <- Spec.sum[, c('sampling', 'GENUS', 'count')]
    
    #Now expand this smaller data.table to include zeroes for absent genera
    Spec.sum <- as.data.table(complete(Spec.sum, sampling, GENUS,  fill = list(count = 0)))
    
    #Split that sampling col back into original columns
    Spec.sum <- Spec.sum[, c('YEAR', 'PLOT', 'SUBSITE', 'TRTMT') := tstrsplit(sampling, '_', keep = c(1,2,3,4))]
    
    #Remove the sampling column (if you want)
    Spec.sum[, sampling := NULL]

#Reorder the data.table
Spec.sum.tidy <- Spec.sum[, c('PLOT', 'SUBSITE', 'TRTMT', 'GENUS', 'YEAR', 'count')]

As mentioned, you need to [make a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Please do not use screen captures. Instead if you use `dput(head(vegdat, 20))` and then edit your post and include the result, that would be helpful. It also will be helpful if you could show what your final data.frame would look like after building a frequency table based on your example data. You probably can use `complete` to fill in missing desired combinations. — Ben, Dec 18 '21 at 16:22
Hi thanks for your responses! A friend of mine came up with a solution to my problem before I had a change to make a reprex. I have edited my question to include the solution incase anyone is having a similar issue. — ajohnson315, Jan 06 '22 at 21:18

Frequency table that includes zero counts but only for variable combinations that exist within my data frame

0 Answers0