I have a dataframe like so:
df <- data.frame(region = c("1","1","1","1","1","2","3","3","3"),
                 loc = c("104","104","104","105","105","106","107","108","109"),
                 interact = c("A_B","A_B","B_C","C_D","A_B","E_F","E_F","F_G","A_B"))
I would like to make a dataframe that:
1) counts the incidence frequency of each interaction among loc levels within each region subset. In the example above, region 1 has two loc levels (104 and 105) that both have the interact level A_B, so the incidence frequency of A_B for region 1 is 2. Duplicate interact levels within the same loc are not counted: although A_B occurs 3 times in region 1, it occurs in only two unique loc levels. The incidence frequency is the number of unique loc levels in which a given interact level occurs.
2) The new dataframe should include every interact level found in any region, and count its incidences in each region. As a consequence, 0's should be included for every interact level that did not occur in a given region.
3) The first row needs to be a count of unique loc levels in each region. In region 1 there were 2 loc levels (104, 105), in region 2 there was 1 loc level (106), and in region 3 there were 3 loc levels (107-109).
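The three requirements above can be sketched in base R as a quick sanity check. This is only a minimal illustration, not a finished solution; the names u, incidence and n_loc are hypothetical and introduced here for the example.

```r
df <- data.frame(region = c("1","1","1","1","1","2","3","3","3"),
                 loc = c("104","104","104","105","105","106","107","108","109"),
                 interact = c("A_B","A_B","B_C","C_D","A_B","E_F","E_F","F_G","A_B"))

# Drop duplicated (region, loc, interact) rows so each loc counts at most once
u <- unique(df[, c("region", "loc", "interact")])

# Cross-tabulate interact level by region; combinations that never occur get 0
incidence <- table(u$interact, u$region)

# Number of unique loc levels per region (the intended first row)
n_loc <- tapply(df$loc, df$region, function(x) length(unique(x)))
```

Here incidence["A_B", "1"] gives the incidence frequency of A_B in region 1, and n_loc holds the per-region loc counts.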
The final output will look like:
output <- data.frame(interact = c("","A_B","B_C","C_D","E_F","F_G"),
                     region1 = c("2","2","1","1","0","0"),
                     region2 = c("1","0","0","0","1","0"),
                     region3 = c("3","1","0","0","1","1"))
I do not know where to start with this. Here is what I have adapted from @akrun's answer to a similar question, Convert from long to wide format counting frequency of eliminated factor level (Prepping dataframe for input into iNEXT Online), but it gives errors:
library(tidyverse)
df %>%
  group_by(region = paste0('region', region)) %>%
  summarise(interact = "", V1 = n_distinct(loc)) %>%
  spread(region, V1),
df %>%
  group_by(region = paste0('region', region) & loc),
  interact = as.character(interact)) %>%
  summarise(V1 = length(unique((interact)) %>%
  spread(region, V1, fill = 0))
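One way the attempt might be repaired is sketched below. It rests on a few assumptions that are not in the original code: the two pipelines are combined with bind_rows(), pivot_wider() is used in place of spread(), and duplicate rows are dropped with distinct() before counting so that each loc contributes once per interact level. The names n_loc, incid and result are illustrative only.

```r
library(dplyr)
library(tidyr)

df <- data.frame(region = c("1","1","1","1","1","2","3","3","3"),
                 loc = c("104","104","104","105","105","106","107","108","109"),
                 interact = c("A_B","A_B","B_C","C_D","A_B","E_F","E_F","F_G","A_B"))

# First row: number of unique loc levels per region, with an empty interact label
n_loc <- df %>%
  group_by(region = paste0("region", region)) %>%
  summarise(interact = "", V1 = n_distinct(loc)) %>%
  pivot_wider(names_from = region, values_from = V1)

# Remaining rows: number of unique loc levels in which each interact level occurs
incid <- df %>%
  distinct(region, loc, interact) %>%
  count(region = paste0("region", region), interact, name = "V1") %>%
  pivot_wider(names_from = region, values_from = V1, values_fill = 0L)

# Stack the loc counts on top of the incidence counts
result <- bind_rows(n_loc, incid)
```

The counts in result are integers; if character columns are needed, as in the output data frame above, they can be converted afterwards, for example with mutate(across(starts_with("region"), as.character)).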