I'm looking for a fast and scalable solution to coerce a massive data.frame from a long format to an edgelist in R.
Consider the following data.frame:
df1 <- data.frame(ID = c("A1", "A1", "A1", "B1", "B1", "B1"),
                  score = c(3, 4, 5, 3, 6, 5))
> df1
  ID score
1 A1     3
2 A1     4
3 A1     5
4 B1     3
5 B1     6
6 B1     5
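The outcome should look like this. Note that the elements in score become nodes that are linked with ties if they are held by the same ID. Each tie is listed only once: 3 and 5 occur together under both A1 and B1 but appear as a single row.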
> el
  X Y
1 3 4
2 3 5
3 4 5
4 3 6
5 6 5
The original df1 has roughly 30 million observations from which an edgelist needs to be calculated frequently.
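To pin down the expected result, here is a minimal base-R sketch that reproduces el on the toy data. It assumes ties are undirected and that a tie shared by several IDs is kept once, which is my reading of the output above; pairs_within and key are names made up for this illustration. It enumerates every pair per ID with combn, so it is meant as a reference for correctness, not as the fast solution I'm after.

```r
# Toy data from the question
df1 <- data.frame(ID = c("A1", "A1", "A1", "B1", "B1", "B1"),
                  score = c(3, 4, 5, 3, 6, 5))

# Hypothetical helper: all pairs of scores within one ID, one pair per row
pairs_within <- function(x) {
  if (length(x) < 2) return(NULL)  # a single score yields no tie
  t(combn(x, 2))
}

# Build the pairs per ID, then stack them into one two-column matrix
pairs <- do.call(rbind, lapply(split(df1$score, df1$ID), pairs_within))

# Assumption: ties are undirected, so (5, 3) and (3, 5) count as the same tie
key <- paste(pmin(pairs[, 1], pairs[, 2]), pmax(pairs[, 1], pairs[, 2]))

# Keep the first occurrence of each tie, in its original orientation
el <- data.frame(X = pairs[, 1], Y = pairs[, 2])[!duplicated(key), ]
rownames(el) <- NULL
el
```

The number of pairs grows quadratically with the number of scores per ID, so I doubt this approach holds up on the full data.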