3

I have a specific R dataframe question, which i haven't found a solution for.

I have a data frame with the same information (duplicates or Id1/Id2 combinations) that looks as follows:

df_test <- data.frame(Id1= c("1", "1", "2", "3", "11", "5", "6", "7"),
                        Id2= c("4", "4", "6", "11", "3", "6", "2", "9" ))
df_test

  Id1 Id2
1   1   4
2   1   4
3   2   6
4   3  11
5  11   3
6   5   6
7   6   2
8   7   9

What I would like to achieve is a data frame counting all the duplicate and same information rows in a separate column, as well as only keeping one row each. The challenge is to not 'only' recognize duplicates, but also the same Id1/Id2 combinations. Something like below:

df_sol
      Id1 Id2 Sum
    1   1   4   2
    2   2   6   2
    3   3  11   2
    4   5   6   1
    5   7   9   1

I have tried aggregate and summarise, but there i couldn't find a solution yet. Another thought was to use the igraph package and treat it like an edgelist then edge.attr.comb = list(weight= "sum") ?

Thanks in advance

   #dataframe for solution:  
   df_sol <- data.frame(Id1= c("1", "2", "3", "5", "7"),
                          Id2= c("4", "6", "11", "6", "9" ),
                          Sum =c("2", "2", "2", "1", "1"))
pogibas
  • 27,303
  • 19
  • 84
  • 117
julia_3010
  • 255
  • 1
  • 2
  • 11
  • Related: (1) [Sorting rows alphabetically](https://stackoverflow.com/questions/31752475/r-sorting-each-row-of-a-data-frame?noredirect=1&lq=1), or [Sorting each row of a data frame](https://stackoverflow.com/questions/31752475/r-sorting-each-row-of-a-data-frame?noredirect=1&lq=1). (2) [Count number of rows within each group](https://stackoverflow.com/questions/9809166/count-number-of-rows-within-each-group). – Henrik Aug 22 '17 at 17:08

1 Answers1

5

You can using this , Sort line 1st, the order is different from your output , cause , in you df_test data type are all character.(change it to numeric you will get the result you want )

A=data.frame(t(apply(df_test, 1, sort)))
A=dplyr::count(A,X1,X2)
names(A)=c('Id1','Id2','Sum')

# A tibble: 5 x 3
     Id1    Id2   Sum
  <fctr> <fctr> <int>
1      1      4     2
2     11      3     2
3      2      6     2
4      5      6     1
5      7      9     1
BENY
  • 317,841
  • 20
  • 164
  • 234