I am trying to use a Java class that measures the cosine similarity between two documents of different lengths. The class that performs the calculation is as follows:
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Set;

    public class CosineSimilarityy {

        public Double calculateCosineSimilarity(HashMap<String, Double> firstFeatures, HashMap<String, Double> secondFeatures) {
            Double similarity = 0.0;
            Double sum = 0.0;   // the numerator of the cosine similarity
            Double fnorm = 0.0; // the first part of the denominator of the cosine similarity
            Double snorm = 0.0; // the second part of the denominator of the cosine similarity
            Set<String> fkeys = firstFeatures.keySet();
            Iterator<String> fit = fkeys.iterator();
            while (fit.hasNext()) {
                String featurename = fit.next();
                // only features present in both maps contribute to the numerator
                boolean containKey = secondFeatures.containsKey(featurename);
                if (containKey) {
                    sum = sum + firstFeatures.get(featurename) * secondFeatures.get(featurename);
                }
            }
            fnorm = calculateNorm(firstFeatures);
            snorm = calculateNorm(secondFeatures);
            similarity = sum / (fnorm * snorm);
            return similarity;
        }

        /**
         * Calculate the norm of one feature vector.
         *
         * @param feature the feature vector of one cluster
         * @return the Euclidean (L2) norm of the vector
         */
        public Double calculateNorm(HashMap<String, Double> feature) {
            Double norm = 0.0;
            Set<String> keys = feature.keySet();
            Iterator<String> it = keys.iterator();
            while (it.hasNext()) {
                String featurename = it.next();
                norm = norm + Math.pow(feature.get(featurename), 2);
            }
            return Math.sqrt(norm);
        }
    }
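For reference, as far as I can tell from reading the code, this is the usual cosine similarity between the two feature maps, with the numerator summed only over keys that appear in both. The symbols are my own notation, not from the original source: $A$ and $B$ are the two maps, and $K_A$ and $K_B$ are their key sets.

$$
\operatorname{sim}(A, B) = \frac{\sum_{k \in K_A \cap K_B} A_k \, B_k}{\sqrt{\sum_{k \in K_A} A_k^2} \; \sqrt{\sum_{k \in K_B} B_k^2}}
$$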
Then I construct an instance of this class, create two HashMaps, and assign one document to each. When I run the calculation, the result is 1.0 if the documents are identical, which is correct, but if there is any difference between them, however slight, the result is always 0.0. What am I missing?
    public static void main(String[] args) {
        // TODO code application logic here
        CosineSimilarityy test = new CosineSimilarityy();
        HashMap<String, Double> hash = new HashMap<>();
        HashMap<String, Double> hash2 = new HashMap<>();
        hash.put("i am a book", 1.0);
        hash2.put("you are a book", 2.0);
        double result;
        result = test.calculateCosineSimilarity(hash, hash2);
        System.out.println(" this is the result: " + result);
    }
The original code is taken from here.
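One thing worth noting about the test: each map holds a single key, the whole sentence, so unless the two sentences are identical strings the maps share no key and the numerator stays at 0. A minimal sketch of the same comparison with one key per word follows. It assumes whitespace tokenization and raw word counts as feature values. The class `WordLevelCosine` and the helpers `termFrequencies`, `cosine`, and `norm` are hypothetical names, not part of the original class, and returning 0.0 for an empty vector is my own choice (the original would produce NaN there).

```java
import java.util.HashMap;
import java.util.Map;

public class WordLevelCosine {

    // Hypothetical helper: lower-case the text, split on whitespace,
    // and count how often each word occurs.
    public static Map<String, Double> termFrequencies(String text) {
        Map<String, Double> counts = new HashMap<>();
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1.0, Double::sum);
            }
        }
        return counts;
    }

    // Euclidean (L2) norm of a feature vector.
    public static double norm(Map<String, Double> vector) {
        double sumOfSquares = 0.0;
        for (double value : vector.values()) {
            sumOfSquares += value * value;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Dot product over shared keys divided by the product of the norms.
    // Assumption: an empty vector yields 0.0 instead of NaN.
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> entry : a.entrySet()) {
            Double other = b.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other;
            }
        }
        double normProduct = norm(a) * norm(b);
        return normProduct == 0.0 ? 0.0 : dot / normProduct;
    }

    public static void main(String[] args) {
        Map<String, Double> first = termFrequencies("i am a book");
        Map<String, Double> second = termFrequencies("you are a book");
        System.out.println(cosine(first, second)); // prints 0.5
    }
}
```

With these two sentences the words "a" and "book" are shared, so the dot product is 2, each norm is 2, and the result is 0.5 rather than 0.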