
I have a dictionary of terms in dictonery/AB.txt and a large text file, dictonery/annotate.txt.

I want to know which dictionary terms in AB.txt are in the annotate.txt file.

Here is my code so far:

 String fileString = new String(Files.readAllBytes(Paths.get("dictonery/AB.txt")), StandardCharsets.UTF_8);

 Map<String, String> map = new HashMap<String, String>();

 String entireFileText = new Scanner(new File("dictonery/annotate.txt")).useDelimiter("\\A").next();

 map.put(fileString, "m");

 for (String key : map.keySet()) {
     if(fileString.contains(key)) {
         System.out.print(key);
     }
 }

At the moment the whole dictionary is printed. How can I print only the specific terms that appear in the annotate.txt file?

OPK
  • The `Scanner.next` method accepts regular expressions; have you tried that? – Apr 07 '15 at 18:08
  • You're just adding the whole file to the map. You need to break it up and add the pieces individually. Why a map, though? A map of what to what? Are you just trying to get a list of all unique words in the file, or are you trying to map words to their descriptions? – Ashley Frieze Apr 07 '15 at 18:11

2 Answers


There are a few things that might help:

  • Since you don't need the values in your Map, I would use a Set (specifically a HashSet).
  • Use Scanner.next() to read individual words instead of the entire file at once.
  • Your check for fileString.contains(key) is pretty inefficient, since it scans the whole text again for every key, and it also returns true for partial matches (if your dictionary has the word "do", it will also match "dog"). It will also print matching words multiple times.
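To illustrate the partial-match point in the last bullet, here is a minimal sketch comparing a substring search with a whole-word lookup in a Set. The class name `PartialMatchDemo`, the helper `words`, and the sample sentence are made up for the example, and it assumes words are separated by whitespace only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PartialMatchDemo {
    // Splits text on runs of whitespace and collects the distinct words.
    static Set<String> words(String text) {
        return new HashSet<>(Arrays.asList(text.trim().split("\\s+")));
    }

    public static void main(String[] args) {
        String text = "the dog barked at the dog";
        String term = "do";

        // Substring search: prints true, because "do" occurs inside "dog".
        System.out.println(text.contains(term));

        // Whole-word lookup: prints false, because "do" is not one of the words.
        System.out.println(words(text).contains(term));
    }
}
```

The Set lookup is also cheaper per term than rescanning the full text, which matters when the dictionary is large.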

Personally, I would create two sets, read both files the same way, and then calculate their intersection. If you want sorted output (probably not a requirement, but generally nice), you could make the Set that you iterate over a TreeSet.
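The two-set approach could look roughly like the sketch below. It assumes both files are UTF-8, that every term is a single whitespace-separated token, and that matching is case-sensitive with punctuation left attached to words; the class and method names (`DictionaryIntersection`, `readWords`, `intersection`) are invented for the example.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

public class DictionaryIntersection {
    // Reads every whitespace-separated token from a Scanner into a sorted set.
    static Set<String> readWords(Scanner scanner) {
        Set<String> words = new TreeSet<>();
        while (scanner.hasNext()) {
            words.add(scanner.next());
        }
        return words;
    }

    // Returns the terms present in both sets, leaving the inputs unmodified.
    static Set<String> intersection(Set<String> a, Set<String> b) {
        Set<String> common = new TreeSet<>(a);
        common.retainAll(b);
        return common;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Both files are read the same way, one token at a time.
        try (Scanner dict = new Scanner(new File("dictonery/AB.txt"), "UTF-8");
             Scanner text = new Scanner(new File("dictonery/annotate.txt"), "UTF-8")) {
            for (String term : intersection(readWords(dict), readWords(text))) {
                System.out.println(term);
            }
        }
    }
}
```

Because the result is a TreeSet, each matching term is printed once, in sorted order. If the dictionary contains multi-word terms, this token-based version will not find them.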

Brendan Long

You don't really need a map.

  1. Read in your annotate.txt as your fileString
  2. Read in your AB.txt file using a loop like this:

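    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;

    // "data.txt" is a placeholder; point this at your AB.txt file
    File file = new File("data.txt");

    // try-with-resources closes the Scanner once the loop is done
    try (Scanner scanner = new Scanner(file)) {
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            // do something like fileString.contains(line) here
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }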
    
  3. In the while loop, check if the fileString contains the line (which should contain the token it just read from your file).

This assumes that you have a single token per line.
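Put together, the three steps might look like the sketch below. It assumes UTF-8 files with one term per line and skips blank lines; the class name `LineByLineLookup` and the helper `termsFoundIn` are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LineByLineLookup {
    // Returns each non-blank dictionary line that occurs somewhere in text.
    static List<String> termsFoundIn(String text, Scanner dictionary) {
        List<String> found = new ArrayList<>();
        while (dictionary.hasNextLine()) {
            String term = dictionary.nextLine().trim();
            // Substring test, so "do" would also be reported for "dog".
            if (!term.isEmpty() && text.contains(term)) {
                found.add(term);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Step 1: read annotate.txt into a single string.
        String fileString = new String(
                Files.readAllBytes(Paths.get("dictonery/annotate.txt")), StandardCharsets.UTF_8);
        // Steps 2 and 3: loop over AB.txt and test each line against the text.
        try (Scanner dictionary = new Scanner(Paths.get("dictonery/AB.txt"), "UTF-8")) {
            for (String term : termsFoundIn(fileString, dictionary)) {
                System.out.println(term);
            }
        }
    }
}
```

Unlike a word-by-word comparison, this version can find multi-word terms, at the cost of also matching terms that appear only as part of a longer word.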

Troy
  • Why use `nextLine()` (and assume a particular format of the input) instead of `next()`, which works as long as each word is whitespace-separated? Also, why bother catching the exception if you're just going to print the stack trace? – Brendan Long Apr 07 '15 at 18:24
  • @Brendan - This is just meant to be a general example of using a Scanner. The OP did not provide a sample of their AB.txt or annotate.txt file, so I am trying to lead them in the right direction based on the information they provided. Depending on their file format, they absolutely could code it the way you describe. I was originally going to provide a link to a tutorial on Scanner but figured I would provide a basic example instead. – Troy Apr 07 '15 at 18:27