For a homework assignment, we're supposed to turn the basicCompare method into something that compares two text documents and decides whether they're about similar topics. The program strips out every word shorter than five characters, which leaves us with two lists of words. We're supposed to compare the lists so that, if enough words are shared between the two documents (let's say 80% similarity), the method returns true and reports that they "match."
However, I got stuck right about where all the comments are at the bottom of the method. I can't think of or find a way to compare the two lists and work out what percentage of the words appear in both. Maybe I'm thinking about it wrong, and I need to filter out the words that aren't in both lists and then just count how many words are left. The criteria for deciding whether the input documents match are left entirely up to us, so I can set them however I want. If you kind ladies and gentlemen could just point me in the right direction, even to a Javadoc page for a particular method, I'm sure I can get the rest of the way. I just need to know where to start.
import java.util.Collections;
import java.util.List;

public class MyComparator implements DocumentComparator {

    public static void main(String[] args) {
        MyComparator mc = new MyComparator();
        if (mc.basicCompare("C:\\Users\\Quinncuatro\\Desktop\\MatchLabJava\\LabCode\\match1.txt", "C:\\Users\\Quinncuatro\\Desktop\\MatchLabJava\\LabCode\\match2.txt")) {
            System.out.println("match1.txt and match2.txt are similar!");
        } else {
            System.out.println("match1.txt and match2.txt are NOT similar!");
        }
    }

    //In the basicCompare method, since the bottom returns false, it results in the else statement in the calling above, saying they're not similar
    //Need to implement a thing that if so many of the words are shared, it returns as true
    public boolean basicCompare(String f1, String f2) {
        List<String> wordsFromFirstArticle = LabUtils.getWordsFromFile(f1);
        List<String> wordsFromSecondArticle = LabUtils.getWordsFromFile(f2);
        Collections.sort(wordsFromFirstArticle);
        Collections.sort(wordsFromSecondArticle);//sort list alphabetically
        for (String word : wordsFromFirstArticle) {
            System.out.println(word);
        }
        for (String word2 : wordsFromSecondArticle) {
            System.out.println(word2);
        }
        //Find a way to use common_words to strip out the "noise" in the two lists, so you're ONLY left with unique words
        //Get rid of words not in both lists, if above a certain number, return true
        //If word1 = word2 more than 80%, return true
        //Then just write more whatever.basicCompare modules to compare 2 to 3, 1 to 3, 1 to no, 2 to no, and 3 to no
        //Once you get it working, you don't need to print the words, just say whether or not they "match"
        return false;
    }

    public boolean mapCompare(String f1, String f2) {
        return false;
    }
}
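
One possible starting point for the "filter out the words that aren't in both lists, then count what's left" idea is Collection.retainAll, which keeps only the elements that are also in another collection. The sketch below is only an illustration under some assumptions: the class name OverlapSketch and the methods overlapRatio and isSimilar are hypothetical, the word lists are hard-coded instead of coming from LabUtils.getWordsFromFile, the words are assumed to be already normalized (for example, all lowercase), and "similarity" is defined as shared distinct words divided by the distinct-word count of the smaller document. Since the assignment leaves the criteria open, other definitions would be equally valid.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the lab code.
public class OverlapSketch {

    // Fraction of the smaller document's distinct words that also occur
    // in the other document (an assumed definition of "similarity").
    static double overlapRatio(List<String> first, List<String> second) {
        // Sets drop duplicates, so a word repeated many times counts once.
        Set<String> a = new HashSet<>(first);
        Set<String> b = new HashSet<>(second);
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0; // nothing to compare, and avoids dividing by zero
        }
        // Copy one set, then keep only the words that are also in the other.
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        return (double) shared.size() / Math.min(a.size(), b.size());
    }

    // True when the overlap reaches the chosen threshold, e.g. 0.8 for 80%.
    static boolean isSimilar(List<String> first, List<String> second, double threshold) {
        return overlapRatio(first, second) >= threshold;
    }

    public static void main(String[] args) {
        // Made-up sample words standing in for the two documents.
        List<String> doc1 = Arrays.asList("planet", "orbit", "telescope", "galaxy", "orbit");
        List<String> doc2 = Arrays.asList("galaxy", "planet", "rocket", "orbit", "launch");

        System.out.println(overlapRatio(doc1, doc2));   // prints 0.75 (3 shared / 4 distinct)
        System.out.println(isSimilar(doc1, doc2, 0.8)); // prints false
    }
}
```

Inside basicCompare, the same calculation could be applied to wordsFromFirstArticle and wordsFromSecondArticle in place of the sample lists. Copying into a new HashSet before calling retainAll matters, because retainAll modifies the collection it is called on.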