I am seeking to determine how many times specific people (ID
) collaborate within a predetermined time interval that precedes a given year. Ideally, the solution would be able to provide three things:
- The total number of collaborations per individual
- The total number of different individuals each person has collaborated with
- The total number of repeat collaborations in the existing team
A bunch of questions - 1, 2 ,3 ,4 - relate to this, but non seem to use two distinct criteria. I adapt the data example given here and solved by @trinker.
dat <- read.table(text="Group ID Time
Trx1 A 1980
Trx1 B 1980
Trx1 C 1980
Trx2 E 1980
Trx2 B 1980
Trx3 B 1981
Trx3 C 1981
Trx4 C 1983
Trx4 E 1983
Trx4 B 1983
Trx5 F 1984
Trx5 B 1984
Trx5 C 1984
Trx6 A 1986", header=T)
So in 1980, two projects occur Trx1
and Trx2
. In the first one, A, B, and C collaborate, in the second one, E and B do.
For every IDi
I want, the total number of IDj
(j <> i) with whom IDi
collaborates on a Trx
project within z (say 3) years before the focal project. Also, I want the number of different collaborators (e.g. 2 collaborations between A and B in the 3 year interval are only counted as 1). Finally, I need the repeat ties. For parsimony, all zero values can be excluded. My real data has over 40,000 ID, over 90,000 projects, and over 50 years.
The outcome should look like this:
ID year total diff repeat
B 1981 3 3 1
C 1981 2 2 1
B 1983 4 3 3 *
C 1983 3 2 2
E 1983 1 1 1
B 1984 3 2 2
C 1984 3 2 2 **
Explanation:
- Before 1983, B collaborates twice with C, once with A, and once with E (column total = 4). Only one collaboration with C matters for the fourth column (diff = 3). For the last column, I count the number of repeat collaborations with the current team members. B worked with C at Trx1 and Trx3 and with E at Trx2, thus repeat = 3
** C collaborated between 1981 and 1983 (3 year interval, exclude, 1980) with 3 people in total. 2 collaborations are with B (Trx3, Trx4), 1 is with E (Trx4), thus total = 3, diff = 2. Of the current team members (B and F), C had zero prior collaborations with F and 2 with B in the 3 year window (repeat = 2).
This question is quasi identical to another one I asked here, but I hope it is clearer here.