I have a large corpus and I would like to create a correlation matrix for all the terms in the entire corpus. I can find correlations for any given word in the corpus using the following code:

      findAssocs(corp_dtm, terms = "searchword", corlimit = 0.01)

But I would like to plot this data, using the correlations as weights, so I need a matrix with all the correlations. Is there an easy way to do this?

            hello   world   my     name   is     liam
    hello   1       .3      .04    .21    .88    .00
    world   .3      1
    my      .04             1
    name    .21                    1
    is      .88                           1
    liam    .00                                  1

Like this, but all filled in.

Thanks!

lwe
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Nov 25 '19 at 17:07

1 Answer

As far as I know, there are no correlation functions for sparse matrices, so you need to convert the sparse matrix into a regular (dense) matrix first, as in the line of code below.

But I advise against this: it creates a dense matrix, and you have a good chance of running into memory issues if your document-term matrix is even slightly large, which you indicated is the case.

      cor_matrix <- cor(as.matrix(corp_dtm))
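
If the dense copy of the document-term matrix is what exhausts memory, one possible workaround is to compute the correlations from sparse cross-products. The sketch below is only an illustration under some assumptions: it uses the `Matrix` package rather than `tm`, the helper name `sparse_cor` and the toy counts are made up, and it assumes no term has zero variance. The result is still a dense terms-by-terms matrix, so this does not help when the vocabulary itself is too large.

```r
library(Matrix)

# Hypothetical helper: Pearson correlation between the columns of a sparse
# matrix, without converting the input itself to a dense matrix.
sparse_cor <- function(x) {
  n <- nrow(x)
  col_means <- Matrix::colMeans(x)
  # Covariance via the identity cov = (X'X - n * mean mean') / (n - 1).
  # The terms-by-terms cross-product is made dense here.
  cov_mat <- (as.matrix(Matrix::crossprod(x)) - n * tcrossprod(col_means)) / (n - 1)
  col_sd <- sqrt(diag(cov_mat))
  # Scale the covariances by the product of the standard deviations.
  cov_mat / tcrossprod(col_sd)
}

# Toy stand-in for a document-term matrix: 4 documents, 3 terms (made-up counts).
# Assumption: a tm DocumentTermMatrix stores triplets in $i, $j and $v, so
# something like sparseMatrix(i = dtm$i, j = dtm$j, x = dtm$v, dims = dim(dtm),
# dimnames = dimnames(dtm)) should build the equivalent object.
toy <- sparseMatrix(
  i = c(1, 2, 2, 3, 4, 4),
  j = c(1, 1, 2, 3, 2, 3),
  x = c(2, 1, 3, 1, 1, 2),
  dims = c(4, 3),
  dimnames = list(NULL, c("hello", "world", "liam"))
)

cor_sparse <- sparse_cor(toy)

# Largest absolute difference from the dense computation on the toy data.
max(abs(cor_sparse - cor(as.matrix(toy))))
```

With a large vocabulary it may still be necessary to reduce the number of terms first, for example with `removeSparseTerms()` from `tm`, before building the full matrix.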
phiver
  • Okay. Thank you. I did actually manage to come up with code like this, but it just kept running and R would stop working, so I thought I had done something wrong, but it's just too big of a matrix. Thanks. – lwe Nov 25 '19 at 18:09