How do I keep track of input words for both line and placement?

Question

In Java, I'm working on a program that reads a given text file and records, for each word, the number of times it appears and every position at which it appears (in the format "lineNumber, wordNumber").

Though my methods for using the information are solid, I'm having trouble coming up with an algorithm that properly counts both the lines and the placements (beyond the words in the first line).

For example, if the text is

hello there
who are you hello

The word objects would be given the information

 hello appearances: 2 [1-1] [2-4]
 there appearances: 1 [1-2] 
 who appearances: 1 [2-1]
 are appearances: 1 [2-2]
 you appearances: 1 [2-3]

Here's a basic version of what I have:

   lineNumber = 0;
   wordNumber = 0;

   while (inputFile.hasNextLine())
   {
      lineNumber++;
      while (inputFile.hasNext())
      {
         wordNumber++;
         word = inputFile.next();
         //an algorithm to remove cases that aren't letters goes here

         Word w = new Word(word);
         w.setAppearance(lineNumber, wordNumber);
      }
   }

But of course the problem with this approach is that hasNext() conflicts with hasNextLine(): hasNext() apparently moves on to the next line of the text file automatically, so lineNumber never gets a chance to increment, and every word after line 1 is recorded incorrectly.

How could I fix this? If this is complex enough that I'd need another import, what should I use?

I'm using Word objects because my assignment requires the string, the number of appearances, and a reference to a linked list for a hash table; there's a bunch more stuff in this lab, but this is the small thing I still need to work on. — Rez, Aug 10 '15 at 05:35
Well, [my answer](http://stackoverflow.com/a/31912496/256196) doesn't need it, and it gets the job done. It's actually an anti-pattern to use a `Word` class - the word shouldn't know anything about its use; that's not within its scope of responsibility. It's the job of another class to store data about *how* the word was used. Consider: how would a `Word` class handle 20 different ways of analysing its use? It wouldn't scale. — Bohemian, Aug 10 '15 at 06:04

score 0 · Answer 1 · edited May 23 '17 at 12:14

You don't need two while statements. Read the entire line, then use the String.split function to get the words from that line (split it on the space character). Also, this might help for reading line by line.
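As a rough sketch of that idea (not the asker's actual program): read each line with nextLine(), split it, and reset the word counter per line. The class name LineSplitSketch, the method scan, and the "word line-word" record format are made up for illustration, and the letter-only filter is just one assumed way of dropping punctuation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class LineSplitSketch {

    // Returns one "word line-word" record per word, in reading order.
    static List<String> scan(Scanner in) {
        List<String> records = new ArrayList<>();
        int lineNumber = 0;
        while (in.hasNextLine()) {
            lineNumber++;
            int wordNumber = 0; // restart the word count on every line
            // split("\\s+") copes with any number of words per line
            for (String token : in.nextLine().trim().split("\\s+")) {
                // Assumption: a "word" is whatever letters remain in the token
                String word = token.replaceAll("[^A-Za-z]", "");
                if (word.isEmpty()) {
                    continue; // blank line or punctuation-only token
                }
                wordNumber++;
                records.add(word + " " + lineNumber + "-" + wordNumber);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("hello there\nwho are you, hello?");
        scan(in).forEach(System.out::println);
    }
}
```

Because the split happens on a single line that has already been read, the line counter can only advance once per line, and the for-each loop handles however many words the line contains.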

answered Aug 10 '15 at 05:36

Ivan


I think the punctuation would get in the way. – Rez Aug 10 '15 at 05:43
Wait, I found a way to get rid of the punctuation. I'll keep going. – Rez Aug 10 '15 at 05:49
So how would I get every word in a line if the number of words in the line is variable? – Rez Aug 10 '15 at 05:57

Bohemian · Answer 2 · 2015-08-10T15:04:08.067

Firstly, no need for the outer while - delete it. Secondly, no need for the Word class - delete it.

Next, you need a structure that can store multiple values for each word. A suitable structure would be a Map<String, List<Map.Entry<Integer, Integer>>>.

This code does the whole job in a few lines:

Map<String, List<Map.Entry<Integer, Integer>>> map = new HashMap<>();

for (int lineNumber = 1; inputFile.hasNextLine(); lineNumber++) {
    int wordNumber = 0;
    // nextLine() returns one whole line; next() would return only a single word
    for (String word : inputFile.nextLine().split(" "))
        map.merge(word, new LinkedList<>(Arrays.asList(
            new AbstractMap.SimpleEntry<>(lineNumber, ++wordNumber))),
            (a, b) -> {a.addAll(b); return a;});
}

map.entrySet().stream().map(e -> String.format("%s appearances: %d %s",
    e.getKey(), e.getValue().size(), e.getValue().stream()
    .map(d -> String.format("[%d-%d]", d.getKey(),d.getValue())).collect(Collectors.joining(" "))))
    .forEach(System.out::println);

Here's some test code:

Scanner inputFile = new Scanner(new ByteArrayInputStream("foo bar baz foo foo\nbar foo bar\nfoo foo".getBytes()));
Map<String, List<Map.Entry<Integer, Integer>>> map = new HashMap<>();
for (int lineNumber = 1; inputFile.hasNextLine(); lineNumber++) {
    int wordNumber = 0;
    // nextLine() returns one whole line; next() would return only a single word
    for (String word : inputFile.nextLine().split(" "))
        map.merge(word, new LinkedList<>(Arrays.asList(
            new AbstractMap.SimpleEntry<>(lineNumber, ++wordNumber))),
            (a, b) -> {a.addAll(b); return a;});
}

map.entrySet().stream().map(e -> String.format("%s appearances: %d %s",
    e.getKey(), e.getValue().size(), e.getValue().stream()
    .map(d -> String.format("[%d-%d]", d.getKey(),d.getValue())).collect(Collectors.joining(" "))))
    .forEach(System.out::println);

Output:

bar appearances: 3 [1-2] [2-1] [2-3]
foo appearances: 6 [1-1] [1-4] [1-5] [2-2] [3-1] [3-2]
baz appearances: 1 [1-3]
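
For readers who find the nested generics hard to follow, here is a more spelled-out sketch of the same map-of-positions idea. It is only an illustration under some assumptions: the Position class, the PositionIndexSketch class and its index/report methods are hypothetical names, a LinkedHashMap is swapped in so words print in order of first appearance, and lines are read with nextLine() so each input line is counted exactly once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.stream.Collectors;

final class PositionIndexSketch {

    // Hypothetical value type: one (line, word) occurrence.
    static final class Position {
        final int line;
        final int word;

        Position(int line, int word) {
            this.line = line;
            this.word = word;
        }

        @Override
        public String toString() {
            return "[" + line + "-" + word + "]";
        }
    }

    // Maps each word to every position where it occurs.
    static Map<String, List<Position>> index(Scanner in) {
        Map<String, List<Position>> map = new LinkedHashMap<>();
        for (int lineNumber = 1; in.hasNextLine(); lineNumber++) {
            int wordNumber = 0;
            for (String word : in.nextLine().split(" ")) {
                if (word.isEmpty()) {
                    continue; // blank line or repeated spaces
                }
                map.computeIfAbsent(word, k -> new ArrayList<>())
                   .add(new Position(lineNumber, ++wordNumber));
            }
        }
        return map;
    }

    // One report line per word: "<word> appearances: <count> [l-w] [l-w] ..."
    static List<String> report(Map<String, List<Position>> map) {
        return map.entrySet().stream()
            .map(e -> e.getKey() + " appearances: " + e.getValue().size() + " "
                + e.getValue().stream().map(Position::toString)
                      .collect(Collectors.joining(" ")))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("hello there\nwho are you hello");
        report(index(in)).forEach(System.out::println);
    }
}
```

With the two-line example from the question, the first report line is "hello appearances: 2 [1-1] [2-4]". The usage data lives in the map rather than in the word itself, which is the separation of responsibility argued for in the comments.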

I actually do need the Word class. I'm storing the Words into an object binary tree. — Rez, Aug 10 '15 at 06:04
