5

I am trying to find the words associated with a particular word in a term-document matrix using the tm package.

I am using findAssocs to do this. Arguments for findAssocs are:

  • x: A term-document matrix.
  • term: A character holding a term.
  • corlimit: A numeric for the lower correlation bound limit.

I am consistently getting numeric(0) as my result.

Example:

findAssocs(test.dtm, "investment", 0.90)
>numeric(0)

Does anyone have familiarity with findAssocs and know what I am doing wrong? Or does anyone know more broadly what the numeric(0) result could mean?

Thank you very much in advance for any help.

costebk08
  • I'm sure if you'd provide a reproducible example you'd figure it out yourself. – David Arenburg Aug 08 '15 at 22:06
  • I'm not sure why this question is receiving a bounty; there is already a perfectly good answer: the threshold is too high, so no word is associated. – scoa Aug 10 '15 at 19:14

4 Answers

2

This result indicates that no terms have a correlation of at least 0.90 with the term "investment". Try a lower threshold like 0.05 and work your way up to a threshold that yields fewer terms.
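
To illustrate the suggestion above, here is a minimal sketch that runs findAssocs at a low and a high threshold on a toy corpus. The four texts and the variable names (docs, corpus, tdm, assoc_low, assoc_high) are made up for illustration and are not from the original question. Depending on the tm version, the result is either a named numeric vector or a list with one element per term.

```r
library(tm)

# Hypothetical toy corpus; the texts are invented for illustration only.
docs <- c("investment bank capital growth",
          "investment capital risk",
          "weather rain cloud",
          "investment growth market")
corpus <- VCorpus(VectorSource(docs))

# Terms are rows and documents are columns.
tdm <- TermDocumentMatrix(corpus)

# Start with a low correlation bound, then raise it to narrow the list.
assoc_low  <- findAssocs(tdm, "investment", 0.05)
assoc_high <- findAssocs(tdm, "investment", 0.90)

print(assoc_low)
print(assoc_high)
```

If the low threshold returns terms and the high one returns numeric(0), the function is working and the bound was simply too strict for the data.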

Timothy P. Jurka
2

I'm getting the same numeric(0). I think it's because there is only one document in my Corpus, so the term-document matrix has only one column. You may want to test TermDocumentMatrix() and see whether you have a multi-column matrix. That said, how do I find associations within one document?
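
A quick way to check for the single-document case is to look at the dimensions of the matrix. This is only a sketch: the one-document corpus below is invented for illustration, and the explanation in the message assumes that findAssocs correlates term counts across documents, which cannot be done with a single column.

```r
library(tm)

# Hypothetical single-document corpus, made up for illustration.
corpus <- VCorpus(VectorSource("investment capital growth investment risk"))
tdm <- TermDocumentMatrix(corpus)

# In a term-document matrix, rows are terms and columns are documents.
print(dim(tdm))

# nDocs() reports the number of documents (columns).
n_docs <- nDocs(tdm)

if (n_docs < 2) {
  message("Only one document: correlations across documents are undefined.")
}
```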

costebk08
neghez
  • This isn't actually an answer. – Dason Oct 07 '12 at 01:15
  • @Dason, I agree, but it's a helpful clue. Faced with the same error message, I tested the claim and found that `findAssocs` doesn't work when there's only one doc in the tdm but works fine when there's more than one doc. – Ben Dec 09 '12 at 10:32
0

It does appear that this functionality only works when analyzing multiple text documents. The only workable solution I have come up with is to create a duplicate of the text document and then run the analysis. However, I am not sure whether this changes the results in any way. Any additional feedback would be appreciated.

costebk08
0

I think it also has to do with your data file. A text file should work, but if it is a .csv with only one column, you will get numeric(0).

Lake