I'm working on a Java program that reads a given text file and records, for each word, the number of times it appears and every spot in which it appears (in the format "lineNumber, wordNumber").

Though my methods for using the information are solid, I'm having trouble coming up with an algorithm that properly counts both the lines and the placements (beyond the words in the first line).

For example, if the text is

hello there
who are you hello

The word objects would be given the information

 hello appearances: 2 [1-1] [2-4]
 there appearances: 1 [1-2] 
 who appearances: 1 [2-1]
 are appearances: 1 [2-2]
 you appearances: 1 [2-3]    

Here's a basic version of what I have:

   lineNumber = 0;
   wordNumber = 0;

   while (inputFile.hasNextLine())
   {
      lineNumber++;
      while (inputFile.hasNext())
      {
         wordNumber++;
         word = inputFile.next();
         //an algorithm to remove cases that aren't letters goes here

         Word w = new Word(word);
         w.setAppearance(lineNumber, wordNumber);
      }
   }

But of course the problem with this approach is that hasNext() conflicts with hasNextLine(): hasNext() apparently moves on to the next line of the text file automatically, so lineNumber never gets a chance to increment and every word after line 1 is recorded with the wrong position.

How could I fix this? If this is complex enough that I'd need another import, what should I use?

Rez
  • What is a `Word`? Why not just use `String`? – Bohemian Aug 10 '15 at 05:29
  • I'm using Word objects because my assignment requires the string, the number of appearances, and a reference to a linked list for a hash table; there's a bunch more stuff in this lab, but this is the small thing I still need to work on. – Rez Aug 10 '15 at 05:35
  • Well, [my answer](http://stackoverflow.com/a/31912496/256196) doesn't need it, and it gets the job done. It's actually an anti-pattern to use a `Word` class - the word shouldn't know anything about its use; that's not within its scope of responsibility. It's the job of another class to store data about *how* the word was used. Consider: how would a `Word` class handle 20 different ways of analysing its use? It wouldn't scale. – Bohemian Aug 10 '15 at 06:04

2 Answers

You don't need two while loops. Read the entire line with nextLine(), then use String.split to get the words from that line (split on the space character). Also, this might help for reading line by line.
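
A minimal sketch of that idea, not a definitive implementation: the class name `LineWordScanner` and the helper `positions` are made up for illustration, and the sketch assumes words are separated by whitespace (it splits on `\\s+` rather than a single space so that repeated spaces don't produce empty words).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LineWordScanner {
    // Hypothetical helper: returns one "word [line-word]" entry per word, in reading order.
    static List<String> positions(Scanner inputFile) {
        List<String> result = new ArrayList<>();
        int lineNumber = 0;
        while (inputFile.hasNextLine()) {
            lineNumber++;
            int wordNumber = 0; // restart the word count on every line
            String line = inputFile.nextLine().trim();
            if (line.isEmpty()) {
                continue; // a blank line still counts as a line, but holds no words
            }
            for (String word : line.split("\\s+")) {
                wordNumber++;
                result.add(word + " [" + lineNumber + "-" + wordNumber + "]");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The sample text from the question, supplied as a String for brevity
        Scanner inputFile = new Scanner("hello there\nwho are you hello");
        positions(inputFile).forEach(System.out::println);
    }
}
```

Because only nextLine() advances the scanner, the line counter and the scanner can't drift apart, which is what goes wrong when hasNext() and hasNextLine() are mixed.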

Ivan

Firstly, no need for the outer while - delete it. Secondly, no need for the Word class - delete it.

Next, you need a structure that can store multiple values for each word. A suitable structure would be a Map<String, List<Map.Entry<Integer, Integer>>>.

This code does the whole job in a few lines:

// Needs java.util.* and java.util.stream.Collectors
Map<String, List<Map.Entry<Integer, Integer>>> map = new HashMap<>();

// Read one whole line per iteration so lineNumber tracks real lines
for (int lineNumber = 1; inputFile.hasNextLine(); lineNumber++) {
    int wordNumber = 0;
    for (String word : inputFile.nextLine().split(" "))
        map.merge(word, new LinkedList<>(Arrays.asList(
            new AbstractMap.SimpleEntry<>(lineNumber, ++wordNumber))),
            (a, b) -> {a.addAll(b); return a;});
}

map.entrySet().stream().map(e -> String.format("%s appearances: %d %s",
    e.getKey(), e.getValue().size(), e.getValue().stream()
    .map(d -> String.format("[%d-%d]", d.getKey(),d.getValue())).collect(Collectors.joining(" "))))
    .forEach(System.out::println);

Here's some test code:

// Also needs java.io.ByteArrayInputStream
Scanner inputFile = new Scanner(new ByteArrayInputStream("foo bar baz foo foo\nbar foo bar\nfoo foo".getBytes()));
Map<String, List<Map.Entry<Integer, Integer>>> map = new HashMap<>();
for (int lineNumber = 1; inputFile.hasNextLine(); lineNumber++) {
    int wordNumber = 0;
    for (String word : inputFile.nextLine().split(" "))
        map.merge(word, new LinkedList<>(Arrays.asList(
            new AbstractMap.SimpleEntry<>(lineNumber, ++wordNumber))),
            (a, b) -> {a.addAll(b); return a;});
}

map.entrySet().stream().map(e -> String.format("%s appearances: %d %s",
    e.getKey(), e.getValue().size(), e.getValue().stream()
    .map(d -> String.format("[%d-%d]", d.getKey(),d.getValue())).collect(Collectors.joining(" "))))
    .forEach(System.out::println);

Output:

bar appearances: 3 [1-2] [2-1] [2-3]
foo appearances: 6 [1-1] [1-4] [1-5] [2-2] [3-1] [3-2]
baz appearances: 1 [1-3]
Bohemian
  • I actually do need the Word class. I'm storing the Words into an object binary tree. – Rez Aug 10 '15 at 06:04
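
If the `Word` class has to stay, the same line-by-line loop still applies. The following is only a sketch under assumptions: this `Word` is a hypothetical stand-in for the asker's class (only the constructor and `setAppearance` come from the question), and a `LinkedHashMap` stands in for whatever tree or hash table the assignment actually requires. The point it illustrates is that each distinct word gets one `Word` object that is looked up and reused, instead of a new object for every occurrence.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

public class WordIndex {
    // Hypothetical stand-in for the asker's Word class
    static class Word {
        private final String text;
        private final List<int[]> appearances = new ArrayList<>();

        Word(String text) { this.text = text; }

        // Records one occurrence as a {lineNumber, wordNumber} pair
        void setAppearance(int lineNumber, int wordNumber) {
            appearances.add(new int[] {lineNumber, wordNumber});
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder(text)
                .append(" appearances: ").append(appearances.size());
            for (int[] a : appearances) {
                sb.append(" [").append(a[0]).append('-').append(a[1]).append(']');
            }
            return sb.toString();
        }
    }

    // The map is an assumption; swap in the container the assignment requires
    static Map<String, Word> index(Scanner inputFile) {
        Map<String, Word> words = new LinkedHashMap<>();
        int lineNumber = 0;
        while (inputFile.hasNextLine()) {
            lineNumber++;
            int wordNumber = 0; // restart the word count on every line
            for (String token : inputFile.nextLine().trim().split("\\s+")) {
                if (token.isEmpty()) continue; // a blank line yields one empty token
                wordNumber++;
                // Reuse the existing Word for this token, or create it on first sight
                words.computeIfAbsent(token, Word::new).setAppearance(lineNumber, wordNumber);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        index(new Scanner("hello there\nwho are you hello"))
            .values().forEach(System.out::println);
    }
}
```

Run on the two-line sample from the question, this prints the five lines listed there, starting with `hello appearances: 2 [1-1] [2-4]`.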