I have a specific R dataframe question, which i haven't found a solution for.
I have a data frame with the same information (duplicates or Id1/Id2 combinations) that looks as follows:
df_test <- data.frame(Id1= c("1", "1", "2", "3", "11", "5", "6", "7"),
Id2= c("4", "4", "6", "11", "3", "6", "2", "9" ))
df_test
Id1 Id2
1 1 4
2 1 4
3 2 6
4 3 11
5 11 3
6 5 6
7 6 2
8 7 9
What I would like to achieve is a data frame counting all the duplicate and same information rows in a separate column, as well as only keeping one row each. The challenge is to not 'only' recognize duplicates, but also the same Id1/Id2 combinations. Something like below:
df_sol
Id1 Id2 Sum
1 1 4 2
2 2 6 2
3 3 11 2
4 5 6 1
5 7 9 1
I have tried aggregate
and summarise
, but there i couldn't find a solution yet. Another thought was to use the igraph
package and treat it like an edgelist
then edge.attr.comb = list(weight= "sum")
?
Thanks in advance
#dataframe for solution:
df_sol <- data.frame(Id1= c("1", "2", "3", "5", "7"),
Id2= c("4", "6", "11", "6", "9" ),
Sum =c("2", "2", "2", "1", "1"))