Convert a term-document matrix to node/edge list in R

Question

I have a sparse term-document matrix built with the tm package in R.

I can convert to a term-term matrix using this snippet of code:

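library("tm")
data(crude)
couple.of.words <- c("embargo", "energy", "oil", "environment", "estimate")
# Restrict the term-document matrix to the words of interest
tdm <- TermDocumentMatrix(crude, control = list(dictionary = couple.of.words))
tdm.matrix <- as.matrix(tdm)
# Binarize: 1 if the term appears in the document at all
tdm.matrix[tdm.matrix>=1] <- 1
# Term-term matrix: entry [i, j] counts the documents containing both terms
tdm.matrix <- tdm.matrix %*% t(tdm.matrix)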

But this isn't quite what I need: I have to build a data frame suitable for loading into a network analysis tool like Gephi. This data frame should ideally have three columns:

{term1, term2, number of docs where term1 and term2 co-occur}

For example (not from the real data in the example above), if the words "embargo" and "energy" co-occur in three documents (this can be seen in the tdm matrix, where each document is a column), I would have a row like this:

+-----------+-------------+------+
| term1     | term2       | Freq |
+-----------+-------------+------+
| embargo   | energy      |  3   |
+-----------+-------------+------+

How can I build this node/edge data frame from the term-document or the term-term matrix?

Please supply a minimal [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so we can see the classes and structures of the objects involved. If you give sample data, also give the desired output so we can test various strategies. – MrFlick, Sep 11 '14 at 13:03
Added some example code and put some emphasis on the desired output. – Gabriele B, Sep 11 '14 at 13:52

score 3 · Accepted Answer · answered Sep 11 '14 at 14:05

It sounds like you can get what you need by adding one more line of code:

desired <- as.data.frame(as.table(tdm.matrix))
head(desired)

#         Terms Terms.1 Freq
# 1     embargo embargo    8
# 2      energy embargo    6
# 3 environment embargo    2
# 4    estimate embargo    4
# 5         oil embargo   44
# 6     embargo  energy    6

The as.table() call really only changes the class. It just so happens that there is an existing as.data.frame.table() method that flattens tables into frequency listings like the one you want.
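To see what the flattening step does in isolation, here is a minimal sketch on a hand-made matrix, so it runs without the tm package. The toy data and the names `tdm.toy`, `cooc`, and `edges` are assumptions made for illustration; they are not part of the original example. It also shows that the column names of the result come from the names of the matrix dimnames, and that the count column can be renamed through the `responseName` argument of as.data.frame.table().

```r
# Toy binary term-document matrix (hypothetical data, not taken from `crude`)
tdm.toy <- matrix(c(1, 1, 0,
                    1, 0, 1,
                    0, 1, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(Terms = c("oil", "energy", "embargo"),
                                  Docs  = c("d1", "d2", "d3")))

# Term-term co-occurrence counts, as in the question
cooc <- tdm.toy %*% t(tdm.toy)

# The names of the dimnames become the column names of the data frame
names(dimnames(cooc)) <- c("term1", "term2")

# as.table() changes the class; as.data.frame() then dispatches to the
# table method, which emits one row per cell of the matrix
edges <- as.data.frame(as.table(cooc),
                       responseName = "weight",
                       stringsAsFactors = FALSE)
print(edges)
```

The result has one row for each of the nine cells, including the diagonal (a term paired with itself) and both orderings of every pair.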

MrFlick


It works perfectly. I'm just wondering if there is an easy way to get rid of permutations, i.e. the second and the sixth rows in the above example: they describe the same relation, just reversed. I think this would help, but I'm not sure: http://stackoverflow.com/questions/14078507/remove-duplicated-2-columns-permutations – Gabriele B Sep 12 '14 at 09:41
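One possible way to drop the reversed duplicates, offered as a sketch rather than as the answerer's method: because the term-term matrix is symmetric, reading only its strict upper triangle gives each unordered pair exactly once and also skips the diagonal. The toy matrix and the names `tdm.toy`, `cooc`, `idx`, and `edges` below are assumptions for illustration only.

```r
# Toy binary term-document matrix (hypothetical data, not taken from `crude`)
tdm.toy <- matrix(c(1, 1, 1,
                    1, 1, 0,
                    0, 0, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(Terms = c("oil", "energy", "embargo"),
                                  Docs  = c("d1", "d2", "d3")))
cooc <- tdm.toy %*% t(tdm.toy)

# Row/column positions of the strict upper triangle: one entry per
# unordered pair of distinct terms
idx <- which(upper.tri(cooc), arr.ind = TRUE)

edges <- data.frame(term1 = rownames(cooc)[idx[, 1]],
                    term2 = colnames(cooc)[idx[, 2]],
                    Freq  = cooc[idx],
                    stringsAsFactors = FALSE)

# Optionally drop pairs that never co-occur, so they do not become edges
edges <- edges[edges$Freq > 0, ]
print(edges)
```

With the real data, `cooc` would simply be the `tdm.matrix` computed in the question. If the target is Gephi, renaming the columns to Source, Target, and Weight before writing the file with write.csv(edges, ..., row.names = FALSE) may help, on the assumption that the importer looks for those column names.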
