I have two document-term matrices (DTMs):
- DTM1 has, say, 1000 vectors (1000 docs), and
- DTM2 has 20 vectors (20 docs).
I want to compare each document in DTM1 against every document in DTM2 and see which DTM1 docs are closest to which DTM2 docs, using cosine similarity. Any pointers would help!
I have created a cosine matrix using the "slam" package.
Docs    –glyma  –ie    –initi  –stafford  ‘bureaucratic’  ‘empti  ‘holi  ‘incontrovert
1     0.000000    0  0.000000   0.000000        0.000000       0      0              0
2     0.000000    0  0.000000   0.000000        0.000000       0      0              0
3     0.000000    0  0.000000   0.000000        0.000000       0      0              0
4     0.000000    0  0.000000   0.000000        0.000000       0      0              0
5     0.000000    0  0.000000   0.000000        0.000000       0      0              0
6     0.000000    0  0.000000   0.000000        4.906891       0      0              0
7     0.000000    0  0.000000   4.906891        0.000000       0      0              0
8     0.000000    0  0.000000   0.000000        0.000000       0      0              0
9     0.000000    0  4.906891   0.000000        0.000000       0      0              0
10    4.906891    0  0.000000   0.000000        0.000000       0      0              0
The cosine function results are:
However, this matrix compares the docs of DTM1 with one another. Instead, I want to compare the DTM1 vectors against the DTM2 vectors, and then find the closest DTM2 document for each DTM1 document.
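
To make the goal concrete, here is a minimal base-R sketch of the cross-comparison I have in mind. It is an illustration under assumptions, not a tested solution for my data: `m1` and `m2` are toy stand-ins for `as.matrix(DTM1)` and `as.matrix(DTM2)`, `cross_cosine` is a hypothetical helper name, and it assumes both matrices can be restricted to the same term columns in the same order.

```r
# Toy stand-ins for as.matrix(DTM1) and as.matrix(DTM2); the names and
# values are illustrative only. Rows are documents, columns are terms.
m1 <- matrix(c(1, 0, 2,
               0, 3, 0,
               1, 1, 0,
               0, 0, 4),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("a", 1:4), c("t1", "t2", "t3")))
m2 <- matrix(c(2, 0, 4,
               0, 1, 0),
             nrow = 2, byrow = TRUE,
             dimnames = list(paste0("b", 1:2), c("t1", "t2", "t3")))

# Keep only the shared vocabulary, in the same column order, so that
# column j means the same term in both matrices.
common <- intersect(colnames(m1), colnames(m2))
m1 <- m1[, common, drop = FALSE]
m2 <- m2[, common, drop = FALSE]

# Hypothetical helper: cosine similarity between every row of `a` and
# every row of `b` (dot products divided by the product of row norms).
# A document with no terms has norm 0 and produces NaN here.
cross_cosine <- function(a, b) {
  tcrossprod(a, b) / outer(sqrt(rowSums(a^2)), sqrt(rowSums(b^2)))
}

sim <- cross_cosine(m1, m2)   # rows: DTM1 docs, columns: DTM2 docs

# For each DTM1 document, the name of the most similar DTM2 document.
closest <- colnames(sim)[max.col(sim, ties.method = "first")]
names(closest) <- rownames(sim)
```

With the real data, `sim` would be a 1000 x 20 matrix, and `closest` would hold one DTM2 document per DTM1 document. All-zero documents would need to be dropped first, since they have no defined cosine.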