I have been unable to find a solution to my query on Stack Overflow. This post is similar, but my dataset is slightly - and importantly - different (in that I have multiple measures of 'time' within my grouping variable).
I have observations of organisms at various sites, over time. The sites are further aggregated into larger areas, so I want to eventually have a function I can call in ddply to summarize the dataset for each of the time periods within the geographical areas. However, I'm having trouble getting the function I need.
Question
How do I cycle through the time periods and compare each one with the previous time period, calculating the intersection (i.e. the number of 'sites' occurring in both time periods) and the sum of the numbers of sites occurring in each of the two periods?
Toy dataset:
time = c(1,1,1,1,2,2,2,3,3,3,3,3)
site = c("A","B","C","D","A","B","C","A","B","C","D","E")
# data.frame() keeps time numeric; cbind() would first coerce both columns to character
df <- data.frame(time = time, site = site, stringsAsFactors = FALSE)
My function
dist2 <- function(df){
  for(i in unique(df$time)){
    intersection <- length(which(df[df$time==i,"site"] %in% df[df$time==i-1,"site"]))
    both <- length(unique(df[df$time==i,"site"])) + length(unique(df[df$time==i-1,"site"]))
  }
  return(as.data.frame(cbind(time,intersection,both)))
}
dist2(df)
What I get:
dist2(df)
   time intersection both
1     1            3    8
2     1            3    8
3     1            3    8
4     1            3    8
5     2            3    8
6     2            3    8
7     2            3    8
8     3            3    8
9     3            3    8
10    3            3    8
11    3            3    8
12    3            3    8
What I expected (hoped!) to get:
  time intersection both
1    1           NA    4
2    2            3    7
3    3            3    8
Once I have a working function, I want to use it with ddply on the whole dataset to calculate these values for each area.
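To make the target concrete, here is a minimal sketch of the computation I am after. It is not a tested solution for the real data: the name `dist2_sketch` is made up for illustration, and it assumes that "previous time period" means the previous observed period in sorted order (which coincides with `time - 1` in the toy data). It stores one value per period instead of overwriting a single value inside the loop.

```r
# Toy data from above, rebuilt here so the sketch is self-contained.
df <- data.frame(
  time = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  site = c("A", "B", "C", "D", "A", "B", "C", "A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one row per time period.
dist2_sketch <- function(d) {
  periods <- sort(unique(d$time))
  # Unique sites observed in each period, in period order.
  sites <- lapply(periods, function(p) unique(d$site[d$time == p]))
  n <- vapply(sites, length, integer(1))

  # The first period has no predecessor: intersection stays NA,
  # and 'both' is just that period's own site count.
  intersection <- rep(NA_integer_, length(periods))
  both <- n

  # Assumption: compare each period with the previous *observed* period.
  for (k in seq_along(periods)[-1]) {
    intersection[k] <- length(intersect(sites[[k]], sites[[k - 1]]))
    both[k] <- n[k] + n[k - 1]
  }

  data.frame(time = periods, intersection = intersection, both = both)
}

result <- dist2_sketch(df)
print(result)
```

On the toy data this gives the three rows I listed as the hoped-for result. If the full dataset had a grouping column, say `area` (a hypothetical name), I would expect the per-area call to look something like `ddply(df, "area", dist2_sketch)` with plyr loaded.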
Many thanks for any pointers, tips, advice!
I am running:
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)