(Other questions & answers elsewhere on this forum don't seem to deal with the cross-border issue mentioned in this feed)
Suppose I have the following data:
df <- data.frame(id=c("Eric", "John", "Sarah", "Simon", "Abdul", "Charlotte", "Alex", "Susan"),
state=c("CA", "AK", "NY", "NY", "NJ", "GA", "CA", "CA"),
project=c(1, 2, 2, 2, 3, 4, 5, 5), stringsAsFactors = F)
> df
id state project
1 Eric CA 1
2 John AK 2
3 Sarah NY 2
4 Simon NY 2
5 Abdul NJ 3
6 Charlotte GA 4
7 Alex CA 5
8 Susan CA 5
I want to get the average number of project members per state, also counting cross-border members.
To get the average of only in-state members, I did the following:
dfx <- data.frame()
dfy <- data.frame()
for(j in unique(df$state)){
h <- subset(df, state==j)
counts <- plyr::count(h, 'project')
#uniques <- length(unique(sub$invje))
average_members <- mean(counts$freq)
dfx <- data.frame(state=j,
average_members=average_members)
dfy <- rbind(dfy, dfx)
}
> dfy
state average_members
1 CA 1.5
2 AK 1.0
3 NY 2.0
4 NJ 1.0
5 GA 1.0
The desired output I'm after, both AK and NY should score 3 because each ID works with two other IDs on a project (despite living in a different state).
> desired
state average_members
1 CA 1.5
2 AK 3.0
3 NY 3.0
4 NJ 1.0
5 GA 1.0
Does anyone know how to code that?