I only have one document (a survey compilation), and I want to find word associations within that single document using findAssocs. All the examples I have seen so far use a combination of several documents.

inspect(myDtm)
A term-document matrix (864 terms, 1 documents)

Non-/sparse entries: 864/0 (what is this for?)
Sparsity           : 0% (what is this for? what does it mean if it's 0%?)
Maximal term length: 20 
Weighting          : term frequency (tf)

My data looks like this:

unwanted               1
upgrade                3
valid                  1

This is my code, and the result I end up with is numeric(0):

findAssocs(myDtm, "salary", 0.5)
numeric(0)

Please help.

Ben
user2873496

3 Answers

Sparsity is the percentage of cells in the matrix that are equal to zero. "Non-/sparse entries: 864/0" means that 864 cells are non-zero and 0 cells are zero: you only have one document in your example, so every term in the matrix must occur in that document, and sparsity is therefore 0%. When sparsity is high, you have a lot of terms that occur in only one or a few documents. Very generally speaking, a lower degree of sparsity is more useful for investigating document similarity (if that's what you're doing; it's not clear from your question).
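
To make the definition concrete, here is a minimal base R sketch that computes sparsity by hand. It does not use tm; the matrices and their counts are made up for illustration, with the second one borrowing the three terms shown in the question.

```r
# Toy document-term matrix: rows are documents, columns are terms.
# All counts are hypothetical.
m <- matrix(c(1, 0, 2,
              0, 0, 3),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("doc1", "doc2"),
                            c("salary", "upgrade", "valid")))

# Sparsity is the share of cells that are zero.
sparsity <- sum(m == 0) / length(m)
sparsity  # 0.5, because 3 of the 6 cells are zero

# A matrix built from a single document only has columns for terms
# that occur in that document, so no cell can be zero.
one_doc <- matrix(c(1, 3, 1), nrow = 1,
                  dimnames = list("doc1", c("unwanted", "upgrade", "valid")))
sparsity_one <- sum(one_doc == 0) / length(one_doc)
sparsity_one  # 0
```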

The short answer is that your question has already been asked and answered: you need to have more than one doc in your dtm to calculate term associations using findAssocs.
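
As a rough illustration of why one document is not enough: findAssocs works from correlations between term counts across documents, and a correlation cannot be computed from a single observation. The sketch below uses only base R's cor() on a made-up count matrix; it is not the tm implementation, just the same idea in miniature.

```r
# Hypothetical counts: rows are documents, columns are terms.
counts <- matrix(c(3, 1, 0,
                   1, 0, 2,
                   4, 2, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:3),
                                 c("salary", "bonus", "office")))

# Three documents: each term's count varies across documents,
# so its correlation with "salary" is defined.
multi <- cor(counts)[, "salary"]

# One document: a single observation per term has no variance,
# so every correlation is missing.
single <- suppressWarnings(cor(counts[1, , drop = FALSE]))[, "salary"]

multi
single
```

With nothing but missing correlations, no term can pass the correlation limit, which is consistent with the empty numeric(0) result in the question.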

You'll have to include a reproducible example if you want any more specific help with findAssocs. Try using the 'crude' dataset that comes with the tm package and experiment with findAssocs to see what happens when you alter the parameters. Check out the tm [documentation](http://cran.r-project.org/web/packages/tm/vignettes/tm.pdf) to see more about how to use the built-in data.

Here's an example using the built-in data. Try it for yourself:

library(tm)
data(crude)  # 20 Reuters news articles shipped with tm
dtm <- DocumentTermMatrix(crude)

# one document in the dtm: no associations are returned
dtm1 <- dtm[1, ]
findAssocs(dtm1, "oil", 0.01)

# ten documents: associations are returned
dtm10 <- dtm[1:10, ]
findAssocs(dtm10, "oil", 0.01)
Ben
You can use findAssocs by loading your data in the following manner:

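# txt is assumed to be a character vector with one element per document
# recent tm versions expect the columns to be named doc_id and text
data <- data.frame(doc_id = as.character(seq_along(txt)), text = txt, stringsAsFactors = FALSE)

tdm <- TermDocumentMatrix(Corpus(DataframeSource(data)))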

Basically, import your data into a "Source", turn your "Source" into a "Corpus", and then make a TDM out of your "Corpus".

Nikolay Kostov
Saurabh Yadav
I'm a couple of years late, but I ran into the same problem recently. It happens because your term-document matrix (TDM) consists of only one document; it should consist of multiple documents. If you use paste() to retrieve text from a data frame, do not use paste(data$text, collapse = " "), which merges everything into a single string, but paste(data$text), before turning it into a TDM.
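
The difference between the two paste() calls can be checked in base R alone. This is a small sketch with made-up survey responses; the data frame and its text column are hypothetical stand-ins for your own data.

```r
# Hypothetical survey responses, one per row.
data <- data.frame(text = c("salary is too low",
                            "salary and bonus are fine",
                            "need an office upgrade"),
                   stringsAsFactors = FALSE)

# collapse = " " glues every response into one long string: one document.
collapsed <- paste(data$text, collapse = " ")

# Without collapse, each response stays a separate string: three documents.
separate <- paste(data$text)

length(collapsed)  # 1
length(separate)   # 3
```

Passing the uncollapsed vector on to the corpus step gives one document per response, which is what findAssocs needs.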

If you provide a reproducible example, maybe we can help further.

FilipW