Hi, I am new to R. I want to build a network of users (uID) and a network of articles (faID) from a data frame called w2, which looks like this:

faID      uID
 1        1256
 1        54789
 1        547821
 2        3258
 2        4521
 2        4528
 3        98745
 3        1256
 3        3258
 3        2145

This is just an example; I have over 2000 articles. I want to build relationships between users based on the articles they share, in data frame format. For example, for article 1:

1256  54789
1256  547821
54789 547821

Similarly, for article 2:

3258  4521
3258  4528
4528  4521

Some further information about the data:

dput(head(w2))
structure(list(faID = c(1L, 1L, 1L, 1L, 1L, 1L), uID = c(20909L, 6661L, 1591L, 28065L, 42783L, 3113L)), .Names = c("faID", "uID"), row.names = c(7L, 9L, 10L, 12L, 14L, 16L), class = "data.frame")

dim(w2) 
[1] 364323 2

I am using the code below, suggested by one of the volunteers, but it fails with this error:

Error in UseMethod("regroup") :
  no applicable method for 'regroup' applied to an object of class "c('integer', 'numeric')"

library(dplyr)
edges <- tbl_df(w2) %>%
  group_by(w2$faID) %>%
  do({
    tmp <- combn(sort(.$user), m = 2)
    data.frame(a = tmp[1, ], b = tmp[2, ], stringsAsFactors = FALSE)
  }) %>%
  ungroup

Any suggestions would be highly appreciated.

akrun
Naveed Khan Wazir

1 Answer

Judging from the question "Assigning names to the list output of dplyr do operation", I guess this is not yet implemented in dplyr.

You may do:

library(gsubfn)
library(dplyr)
w2 %>%
  group_by(faID) %>%
  fn$do2(~combn(.$uID, m = 2)) # `do2` from the link

# $`1`
#       [,1]   [,2]   [,3]
# [1,]  1256   1256  54789
# [2,] 54789 547821 547821

# $`2`
#      [,1] [,2] [,3]
# [1,] 3258 3258 4521
# [2,] 4521 4528 4528

# $`3`
#       [,1]  [,2]  [,3] [,4] [,5] [,6]
# [1,] 98745 98745 98745 1256 1256 3258
# [2,]  1256  3258  2145 3258 2145 2145
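
Each matrix above holds one user pair per column. As a minimal sketch that needs neither `do2` nor dplyr, the same pairs can be built in base R and stacked into a two-column edge list. The column names `from` and `to` and the helper objects `users` and `edge_list` are illustrative choices, not part of the original code, and the guard for articles with a single user is an added assumption about the full data set.

```r
# Sample data copied from this answer
w2 <- data.frame(
  faID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  uID  = c(1256L, 54789L, 547821L, 3258L, 4521L, 4528L,
           98745L, 1256L, 3258L, 2145L)
)

# Users per article; drop articles with fewer than two users, because
# combn() would treat a single positive integer n as seq_len(n)
users <- Filter(function(u) length(u) >= 2, split(w2$uID, w2$faID))

# For each article, build all user pairs (one pair per column),
# then turn the two matrix rows into two data frame columns
edge_list <- lapply(names(users), function(id) {
  m <- combn(sort(users[[id]]), m = 2)
  data.frame(faID = as.integer(id), from = m[1, ], to = m[2, ])
})

# Stack the per-article pieces into one edge list
edges <- do.call(rbind, edge_list)
edges
```

Because the users are sorted within each article, the smaller ID always lands in `from`, which makes it easy to spot the same pair occurring under different articles.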

data

w2 <- structure(list(faID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L
), uID = c(1256L, 54789L, 547821L, 3258L, 4521L, 4528L, 98745L, 
1256L, 3258L, 2145L)), .Names = c("faID", "uID"), class = "data.frame", row.names = c(NA, 
-10L))

Update

It would be possible to do:

res <- w2 %>%
  group_by(faID) %>%
  do({
    data.frame(
      combN = paste(apply(combn(sort(.$uID), m = 2), 2, paste, collapse = " "),
                    collapse = ", "),
      stringsAsFactors = FALSE)
  })

res
#   faID                                                               combN
# 1    1                               1256 54789, 1256 547821, 54789 547821
# 2    2                                     3258 4521, 3258 4528, 4521 4528
# 3    3 1256 2145, 1256 3258, 1256 98745, 2145 3258, 2145 98745, 3258 98745

To put the two IDs of each pair in separate columns, use cSplit from https://gist.github.com/mrdwab/11380733 (load data.table first):

library(data.table)
cSplit(cSplit(res, "combN", ", ", "long"), "combN", " ")
#     faID combN_1 combN_2
#  1:    1    1256   54789
#  2:    1    1256  547821
#  3:    1   54789  547821
#  4:    2    3258    4521
#  5:    2    3258    4528
#  6:    2    4521    4528
#  7:    3    1256    2145
#  8:    3    1256    3258
#  9:    3    1256   98745
# 10:    3    2145    3258
# 11:    3    2145   98745
# 12:    3    3258   98745
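
Since the stated goal in the comments below is to compute centralities on the user network, here is a minimal base R sketch of degree centrality computed from an edge list shaped like the output above. The data frame is retyped by hand from that output, and the object names `edges`, `pairs`, and `degree` are illustrative. Other centrality measures would normally need a graph package, which is not shown here.

```r
# Edge list retyped from the cSplit output above (sample data only)
edges <- data.frame(
  faID    = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  combN_1 = c(1256, 1256, 54789, 3258, 3258, 4521,
              1256, 1256, 1256, 2145, 2145, 3258),
  combN_2 = c(54789, 547821, 547821, 4521, 4528, 4528,
              2145, 3258, 98745, 3258, 98745, 98745)
)

# Drop the article column and keep each user pair once; the IDs are
# already sorted within a pair, so unique() is enough to de-duplicate
pairs <- unique(edges[, c("combN_1", "combN_2")])

# Degree of a user = number of distinct users sharing an article with them
degree <- table(c(pairs$combN_1, pairs$combN_2))
sort(degree, decreasing = TRUE)
```

De-duplicating first means that two users who share several articles still count as one connection; skipping that step would give a weighted degree instead.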
Community
akrun
  • I copied and pasted the code from the link and then ran the code above, but it gives me this error: "error in eval(expr,envir,enclos) : object fn not found" – Naveed Khan Wazir Jul 24 '14 at 08:16
  • @user3841811. If I don't load the `library(gsubfn)`, I will get the error you mentioned. – akrun Jul 24 '14 at 08:59
  • @user3841811. With the same dataset, I don't have any errors. If you can show the data using `dput` on a smaller dataset (10-20 rows) that shows the error, I will try. – akrun Jul 24 '14 at 12:53
  • Actually, the data above is just a subset of the whole data set. Some further information: dput(head(w2)) structure(list(faID = c(1L, 1L, 1L, 1L, 1L, 1L), uID = c(20909L, 6661L, 1591L, 28065L, 42783L, 3113L)), .Names = c("faID", "uID" ), row.names = c(7L, 9L, 10L, 12L, 14L, 16L), class = "data.frame") – Naveed Khan Wazir Jul 24 '14 at 13:16
  • @user3841811. Using my second approach. I am getting the result. res$combN [1] "1591 3113, 1591 6661, 1591 20909, 1591 28065, 1591 42783, 3113 6661, 3113 20909, 3113 28065, 3113 42783, 6661 20909, 6661 28065, 6661 42783, 20909 28065, 20909 42783, 28065 42783". Using gsubfn, also I get the results. `1` [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [1,] 20909 20909 20909 20909 20909 6661 6661 6661 6661 1591 1591 1591 [2,] 6661 1591 28065 42783 3113 1591 28065 42783 3113 28065 42.... – akrun Jul 24 '14 at 13:20
  • Yes, the 2nd approach works, but each pair of networked users ends up in a single cell, whereas I want the two IDs in separate cells so that I can compute the centralities. – Naveed Khan Wazir Jul 24 '14 at 15:16
  • `user3841811`. I got both the methods working. I am using R version 3.1.0 (2014-04-10) Platform: x86_64-unknown-linux-gnu (64-bit);package versions: gsubfn_0.6-5: dplyr_0.2 – akrun Jul 24 '14 at 15:19
  • In your 2nd method the results are in one cell, e.g. for faID = 1, 1256 and 54789 are in one cell, and similarly 1256 and 547821 are in one cell. Any recommendations on how to separate them into two different cells? – Naveed Khan Wazir Jul 24 '14 at 15:32
  • `user3841811`. Yes, as I mentioned before, and as found in the comments from the link, there are some limitations at present (maybe I am wrong). – akrun Jul 24 '14 at 15:42
  • The approach is close to what I need; the only thing I want is to separate the user IDs into two cells. – Naveed Khan Wazir Jul 24 '14 at 16:22
  • @user3841811. I just updated the code. Now, the user ids' are separated into two columns. – akrun Jul 24 '14 at 16:44