Hi, I am new to R. I want to build a network of users (uID) and a network of articles (faID) from a data frame called w2, which looks like this:

 faID      uID
  1        1256
  1        54789
  1        547821
  2        3258
  2        4521
  2        4528
  3        98745
  3        1256
  3        3258
  3        2145

This is just an example; I have over 20000 articles. I want to build the relationships between users based on the articles they share, in data frame format, e.g.

**##for article one##**


1256  54789
1256  547821
54789 547821

**##similarly for article 2##**

3258  4521
3258  4528
4528  4521

I was using a sparse matrix, but R runs out of memory before I can build the network and compute the centrality scores of users and articles. Any help would be highly appreciated. Some further information:

dput(head(w2,))
structure(list(faID = c(1L, 1L, 1L, 1L, 1L, 1L), uID = c(20909L, 6661L, 1591L, 28065L, 42783L, 3113L)), .Names = c("faID", "uID"), row.names = c(7L, 9L, 10L, 12L, 14L, 16L), class = "data.frame")

dim(w2)
[1] 364323 2

Naveed Khan Wazir
  • How is your network defined? What are the nodes? User? How do you define edges? Sharing articles? Are the edges weighted, e.g. by the number of shared articles? You could have a look at the `igraph` package, but you have to be more precise in advance on what you want to map. – Beasterfield Jul 23 '14 at 08:23
  • The nodes are users in one case and articles in the other. For the user network, an edge means two users share an article; for the article network, an edge means two articles share a user. And yes, I need both weighted and unweighted edges. – Naveed Khan Wazir Jul 23 '14 at 08:41
  • Maybe `igraph` package may come in handy? – David Arenburg Jul 23 '14 at 08:49
  • @user3841811 so you are trying to build a bipartite graph? or are you trying to build two different networks: one from the article and one from the user perspective? – Beasterfield Jul 23 '14 at 09:06
  • @Beasterfield yes one for user and one for article – Naveed Khan Wazir Jul 23 '14 at 12:33

1 Answer

Here is one of many possible ways to construct a data.frame of the adjacencies

user -- (article) -- user 

using dplyr:

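library( dplyr )
# `tab` is the example data: one row per (article, user) pair
edges <- as_tibble( tab ) %>% 
  group_by( article ) %>%
  do( {
    # all unordered user pairs within one article; sorting first puts
    # every pair in the same (a, b) order across articles
    # (assumes every article has at least two users)
    tmp <- combn( sort(.$user),  m = 2 )
    data.frame( a = tmp[1,], b = tmp[2,], stringsAsFactors = FALSE )
  } ) %>%
  ungroup()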

which gives

Source: local data frame [12 x 3]

   article  a  b
1        1 u1 u2
2        1 u1 u3
3        1 u2 u3
4        2 u2 u4
...
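
To apply the same idea to the `w2` data from the question (columns `faID` and `uID`), a base R sketch might look like the following. `pair_edges` is a hypothetical helper name, not part of any package, and the rows are the sample rows from the question. It skips groups with fewer than two members, because `combn(x, 2)` treats a single positive number `x` as `seq_len(x)`. Swapping the two column arguments should give the article network instead of the user network.

```r
# Sample rows copied from the question
w2 <- data.frame(
  faID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  uID  = c(1256, 54789, 547821, 3258, 4521, 4528, 98745, 1256, 3258, 2145)
)

# Hypothetical helper: all unordered pairs of `member` values
# that share the same `group` value
pair_edges <- function(df, group, member) {
  members_by_group <- split(df[[member]], df[[group]])
  pairs <- lapply(names(members_by_group), function(g) {
    m <- sort(unique(members_by_group[[g]]))
    if (length(m) < 2) return(NULL)  # nothing to pair up in this group
    tmp <- combn(m, 2)
    data.frame(group = g, a = tmp[1, ], b = tmp[2, ], stringsAsFactors = FALSE)
  })
  # drop the skipped groups before binding the pieces together
  do.call(rbind, Filter(Negate(is.null), pairs))
}

user_edges    <- pair_edges(w2, group = "faID", member = "uID")   # user -- (article) -- user
article_edges <- pair_edges(w2, group = "uID",  member = "faID")  # article -- (user) -- article
```

On the full data (364323 rows) this list-and-rbind approach may be slow; it is meant to show the logic, not to be tuned for size.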

If you want to summarise how many articles two users have in common you can do this by:

# count, for every (a, b) pair, the articles the two users share
edges <- edges %>%
  group_by( a, b ) %>%
  summarise( article_in_common = n() ) %>%
  ungroup()

Source: local data frame [6 x 3]

   a  b article_in_common
1 u1 u2                 1
2 u1 u3                 1
3 u1 u4                 1
4 u1 u6                 1
...

Note that this only works because the users were sorted before the call to combn, so each pair always appears in the same (a, b) order and ends up in the same group.
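
The role of the ordering can also be seen in a base R sketch that builds the pairs with a self-merge instead of `combn`. Here the filter `uID.x < uID.y` plays the part of the sort: it keeps each unordered pair exactly once and always in the same order. The data frame `toy` is made up for illustration (it is not the question's data), chosen so that one pair shares two articles.

```r
# Made-up example data: users 10 and 20 share articles 1 and 2
toy <- data.frame(
  faID = c(1, 1, 1, 2, 2),
  uID  = c(10, 20, 30, 10, 20)
)

# Every (user, user) combination within the same article
pairs <- merge(toy, toy, by = "faID")

# Keep each unordered pair once, in a fixed (smaller, larger) order
pairs <- pairs[pairs$uID.x < pairs$uID.y, ]

# Count the articles per pair
weights <- aggregate(faID ~ uID.x + uID.y, data = pairs, FUN = length)
names(weights) <- c("a", "b", "article_in_common")
```

The merge materialises all user-by-user combinations per article before filtering, so it is not a way around the memory problem for very large articles.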

From this data you can easily construct an igraph object:

library(igraph)
# the first two columns become the vertex pairs; `weight` becomes an edge attribute
g <- graph_from_data_frame( select(edges, a, b, weight = article_in_common), directed = FALSE )

plot(g)


On this graph you can compute any of the available centrality or community measures. See for instance ?centralize.scores.
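
As a rough illustration, a few per-user centrality scores could be computed like this. The edge list is made up and only mimics the shape of `edges` above; `degree`, `strength` and `betweenness` are igraph functions. One thing to keep in mind is that igraph's path-based measures interpret `weight` as a distance, so a high number of shared articles would count as "far apart"; the sketch therefore ignores the weights for betweenness.

```r
library(igraph)

# Made-up weighted edge list in the same shape as `edges` above
toy_edges <- data.frame(
  a      = c("u1", "u1", "u2", "u2"),
  b      = c("u2", "u3", "u3", "u4"),
  weight = c(2, 1, 1, 1),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(toy_edges, directed = FALSE)

deg  <- degree(g)                     # number of distinct neighbours per user
wdeg <- strength(g)                   # sum of edge weights per user
btw  <- betweenness(g, weights = NA)  # shortest-path betweenness, weights ignored

# One row per user
scores <- data.frame(user = names(deg), degree = deg, strength = wdeg, betweenness = btw)
```

Sorting `scores` by any of the columns gives a simple ranking of users under that measure.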

Beasterfield
  • I am applying this code: edges<-tbl_df(w2)%>% ##w2 is my dataframe## + group_by(w2$faID)%>% ##faId is the article ## + do({ + tmp<-combn(sort(w2$uID),m=2) ## uID is the user id (variable name) + data.frame(a=tmp[1,],b=tmp[2,],stringasFactors=F) + })%>% + ungroup – Naveed Khan Wazir Jul 23 '14 at 13:46
  • @user3841811 Then you have to provide your data, so that we can see where the mistake comes from. The code as I provided it works with the data in your example above. You should start by copying and pasting my code exactly as I wrote it; the code in your comment contains a couple of mistakes. – Beasterfield Jul 23 '14 at 14:11
  • How can I provide my data here? – Naveed Khan Wazir Jul 23 '14 at 14:24
  • It should be a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), i.e. the smallest subset of your data that lets us reproduce your problem. A subset that small should still be of a size that lets you paste its `dput` output here. More likely, though, you will find the mistake yourself while stripping the data down to this minimal example. – Beasterfield Jul 23 '14 at 14:28
  • dput(head(w2,)) structure(list(faID = c(1L, 1L, 1L, 1L, 1L, 1L), uID = c(20909L, 6661L, 1591L, 28065L, 42783L, 3113L)), .Names = c("faID", "uID" ), row.names = c(7L, 9L, 10L, 12L, 14L, 16L), class = "data.frame") – Naveed Khan Wazir Jul 23 '14 at 15:04
  • I need your help finding the centrality of users on a per-article basis. I posted the question here but did not get an answer: http://stackoverflow.com/questions/27918923/using-while-loop-for-finding-the-degree-centrality-score-of-a-user-in-a-project – Naveed Khan Wazir Jan 14 '15 at 13:55