I have been working in Java on finding the similarity between two documents. I would prefer to measure semantic similarity, but I have not attempted that yet. I am currently using the following approach:
- Extract terms/tokens (I am using JAWS with WordNet to collapse synonyms, which improves the similarity scores)
- Build a term-document matrix
- Apply LSA (latent semantic analysis)
- Compute cosine similarity
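
For concreteness, here is a minimal sketch of the term-counting and cosine steps in plain Java. It is not my actual code: the class name `SimilaritySketch` and its methods are made up for illustration, the tokenizer is a naive split on non-letters, and it leaves out the WordNet synonym step and LSA entirely.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class SimilaritySketch {

    // Lowercase the text, split on anything that is not a-z, and count each term.
    // This is one column of a term-document matrix, stored sparsely.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Euclidean length of a sparse term-count vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: dot product divided by the product of the lengths.
    // Returns 0.0 if either document has no terms.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += (double) e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double na = norm(a);
        double nb = norm(b);
        if (na == 0.0 || nb == 0.0) {
            return 0.0;
        }
        return dot / (na * nb);
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = termCounts("The cat sat on the mat");
        Map<String, Integer> d2 = termCounts("The cat lay on the rug");
        System.out.println(cosine(d1, d2));
    }
}
```

Because this compares raw term counts, two documents that use different words for the same idea score low, which is why I added the synonym and LSA steps.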
While looking through a few Stack Overflow pages, I came across quite a few links to Python implementations.
I would like to know whether Python is a better language for computing text similarity, and whether I can find the semantic similarity between two documents in Python.