I've got a DocumentTermMatrix that looks as follows:
         artikel  naam  product  personeel  loon  verlof
doc 1          1     1        2          1     0       0
doc 2          1     1        1          0     0       0
doc 3          0     0        1          1     2       1
doc 4          0     0        0          1     1       1
In the package tm, it's possible to calculate the Hamming distance between 2 documents. But now I want to cluster all the documents that have a Hamming distance smaller than 3.
So here I would like cluster 1 to contain documents 1 and 2, and cluster 2 to contain documents 3 and 4. Is there a way to do that?
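
To make the goal concrete, here is a minimal sketch in base R of one possible approach. It rests on several assumptions: the matrix is typed in by hand (with a real DocumentTermMatrix, `as.matrix(dtm)` should give an equivalent dense matrix), the Hamming distance is taken as the number of terms whose counts differ between two documents, and single-linkage hierarchical clustering cut at height 2.5 stands in for "distance smaller than 3". The names `m`, `hamming`, `hc` and `clusters` are illustrative only.

```r
# Example matrix typed in by hand (assumption: mirrors the table above).
m <- matrix(c(1, 1, 2, 1, 0, 0,
              1, 1, 1, 0, 0, 0,
              0, 0, 1, 1, 2, 1,
              0, 0, 0, 1, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste("doc", 1:4),
                            c("artikel", "naam", "product",
                              "personeel", "loon", "verlof")))

# Pairwise Hamming distance: number of terms whose counts differ.
n <- nrow(m)
hamming <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    hamming[i, j] <- sum(m[i, ] != m[j, ])
  }
}

# Single linkage merges two groups as soon as any pair of documents is close.
# The distances are integers, so cutting at 2.5 keeps merges with distance < 3.
hc <- hclust(as.dist(hamming), method = "single")
clusters <- cutree(hc, h = 2.5)

print(clusters)
# doc 1 doc 2 doc 3 doc 4
#     1     1     2     2
```

With single linkage, a document joins a cluster when it is within the threshold of at least one member, so clusters can chain. If every pair inside a cluster should be closer than 3, `method = "complete"` with the same cut height may be the better fit.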