0

I have two text files. The first contains a paragraph of text entered by the user. The second is a dictionary of terms extracted from an OWL file, like so:

Inferior salivatory nucleus
Retrosplenial area
lateral agranular part

I have written the code that creates these files. I am stuck on how to compare them so that any whole phrase that appears in both the dictionary and the paragraph of text is printed to the command line in Java.

spongebob

3 Answers

1

Try the following code. Set the correct file path in fileName and put your search condition inside the while loop:

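import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class JavaReadFile {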
    public static void main(String[] args) throws IOException {
        String fileName = "filePath.txt";

        //read using BufferedReader, to read line by line
        readUsingBufferedReader(fileName);
    }

    private static void readUsingBufferedReader(String fileName) throws IOException {
        File file = new File(fileName);
        FileReader fr = new FileReader(file);
        BufferedReader br = new BufferedReader(fr);
        String line;
        while((line = br.readLine()) != null){
            //process the line
            System.out.println(line);
        }
        //close resources
        br.close();
        fr.close();
    }
}
Tom
anand kulkarni
  • We've had Java 7 long enough to take advantage of its try-with-resources (i.e. `try (FileReader fr = new FileReader(file); BufferedReader br = new BufferedReader(fr)) {/*...*/}`). This eliminates the need for those ugly calls to `close()`. – bcsb1001 Apr 03 '15 at 10:44
  • @bcsb1001 Correct, and the current use of `close()` is unsafe because it might not be called if an exception is thrown. – Tom Apr 03 '15 at 10:46
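For what it's worth, the try-with-resources form suggested in these comments might look roughly like the sketch below. The class name `ReadLinesSketch` and the `readLines` helper are made up for illustration, and the method takes a `Reader` (an assumption, so it can be tried without a file on disk) rather than a file name as the answer does.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this illustration.
class ReadLinesSketch {

    // Reads every line from the given reader. The BufferedReader, and the
    // reader it wraps, are closed automatically even if readLine() throws.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // With no argument, fall back to an in-memory sample instead of a file.
        Reader source = args.length > 0
                ? new FileReader(args[0])
                : new StringReader("Inferior salivatory nucleus\nRetrosplenial area\n");
        for (String line : readLines(source)) {
            System.out.println(line);
        }
    }
}
```

No explicit `close()` call is needed here: the `try` block closes `br` on both the normal and the exceptional path.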
0

You could read the file into a string, iterate over the terms in your dictionary, and check whether each one is present in the paragraph with contains. This probably isn't a particularly efficient solution, but it should work.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class Test {

    public static void main(String[] args) throws IOException {
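        // Read the whole file to search into a single string
        String fileString = new String(Files.readAllBytes(Paths.get("dictionary.txt")),StandardCharsets.UTF_8);

        // Terms to look for; add each dictionary term to the set individually
        Set<String> set = new HashSet<String>();
        set.add("ZYMURGIES");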

        for (String term : set) {
            if(fileString.contains(term)) {
                System.out.println(term);
            }
        }
    }
}
Community
Sesame
  • Would this work on a larger scale if the input is whole documents of up to 20,000 words? – James Hunt Apr 03 '15 at 10:15
  • I put some code in, try it out on your data and see if it works. – Sesame Apr 03 '15 at 10:44
  • Thanks for your help so far. The only problem is that a term is only returned if the input is just the term from the dictionary. I want the dictionary term to be returned even when it's in the middle of a big paragraph. Any idea how to fix this? Thanks – James Hunt Apr 03 '15 at 10:57
  • I don't understand. The set has the terms from the dictionary right? What do you want to output? I thought just the term. This should output the terms if they appear in the fileString. Does order matter? – Sesame Apr 03 '15 at 11:01
  • Yes, the set has the dictionary terms, and I want to output the dictionary terms. Order doesn't matter. The problem at the moment is that if the term is in the middle of other text it is not returned. E.g. if the term is "Retrosplenial area" and the input file is "The Retrosplenial area is found in the mouth.", the term Retrosplenial area is not returned at the moment. Hope this clears the problem up. Thanks for your help – James Hunt Apr 03 '15 at 11:10
  • The listed code doesn't have this problem. Try testing `System.out.println("The Retrosplenial area is found in the mouth.".contains("Retrosplenial area"));`. – Sesame Apr 03 '15 at 11:18
  • Yeah and the code I posted prints the term if that statement evaluates to true. You need to add your list of Strings to set (note how I added the word "ZYMURGIES"). Oh, is the problem that you want to see the statement several times if it appears in the paragraph several times? – Sesame Apr 03 '15 at 11:28
  • I edited the code above to be the code I am currently using. This code returns the whole dictionary, not just the terms that match. AB.txt is the dictionary and annotate.txt is the input file. Thanks for the help – James Hunt Apr 03 '15 at 12:32
  • Your code is a really big mess! I think you should use the template I have there and make sure that you iterate over the dictionary file and add each string to the set individually. You can't just put your whole file into the set as one entry; you have to add each individual term. – Sesame Apr 03 '15 at 19:58
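The point from these comments, that each dictionary line has to go into the set as its own entry, could be sketched as follows. The class and method names are hypothetical, and the dictionary is passed in as a list of lines (an assumption made so the example runs without files; with real files the lines could come from something like `Files.readAllLines`).

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name, used only for this illustration.
class DictionaryMatchSketch {

    // Returns the dictionary terms that occur anywhere in the paragraph.
    static Set<String> findTerms(List<String> dictionaryLines, String paragraph) {
        // One set entry per dictionary line, never the whole file as one string.
        Set<String> terms = new LinkedHashSet<>();
        for (String line : dictionaryLines) {
            String term = line.trim();
            if (!term.isEmpty()) { // contains("") is always true, so skip blank lines
                terms.add(term);
            }
        }
        Set<String> found = new LinkedHashSet<>();
        for (String term : terms) {
            if (paragraph.contains(term)) {
                found.add(term);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> dictionaryLines = Arrays.asList(
                "Inferior salivatory nucleus",
                "Retrosplenial area",
                "lateral agranular part");
        String paragraph = "The Retrosplenial area is found in the mouth.";
        System.out.println(findTerms(dictionaryLines, paragraph)); // prints [Retrosplenial area]
    }
}
```

If the whole dictionary file were added as a single string instead, `contains` would be comparing the paragraph against one huge term, which is not the check described in the answer.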
0

Here's a Java 8 version of the contains check.

package insert.name.here;

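import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class InsertNameHere {
    public static void main(String[] args) throws IOException {
        // Decode as UTF-8, the charset Files.lines uses by default
        String paragraph = new String(Files.readAllBytes(Paths.get("<paragraph file path>")), StandardCharsets.UTF_8);
        // Close the stream returned by Files.lines so the dictionary file is released
        try (Stream<String> phrases = Files.lines(Paths.get("<dictionary file path>"))) {
            phrases.filter(paragraph::contains)
                    .forEach(phrase -> System.out.printf("Paragraph contains %s%n", phrase));
        }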
    }
}
bcsb1001
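Since the question asks for whole phrases, note that a plain `contains` also matches a term that is only part of a longer word (for example "part" inside "parts"). A possible variant, sketched here with made-up class and method names, wraps each quoted phrase in regex lookarounds so it only matches when it is not glued to another word character. Ignoring case is an assumption on my part, not something the question states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class WholePhraseSketch {

    // True if the phrase occurs in the text and is not directly preceded or
    // followed by a word character (letter, digit or underscore).
    static boolean containsWholePhrase(String text, String phrase) {
        // Pattern.quote makes the phrase match literally, whatever it contains.
        Pattern pattern = Pattern.compile(
                "(?<!\\w)" + Pattern.quote(phrase) + "(?!\\w)",
                Pattern.CASE_INSENSITIVE); // assumption: matching should ignore case
        return pattern.matcher(text).find();
    }

    // Returns the phrases that appear as whole phrases in the text.
    static List<String> matchingPhrases(String text, List<String> phrases) {
        List<String> matches = new ArrayList<>();
        for (String phrase : phrases) {
            if (!phrase.trim().isEmpty() && containsWholePhrase(text, phrase.trim())) {
                matches.add(phrase.trim());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String paragraph = "The retrosplenial area differs from the lateral agranular parts.";
        List<String> dictionary = Arrays.asList(
                "Inferior salivatory nucleus",
                "Retrosplenial area",
                "lateral agranular part");
        for (String phrase : matchingPhrases(paragraph, dictionary)) {
            System.out.println(phrase); // prints only: Retrosplenial area
        }
    }
}
```

Here "lateral agranular part" is not reported because the paragraph only contains "lateral agranular parts". Whether that is the desired behaviour depends on how strictly "whole phrase" is meant.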