
I am trying to write code that reads a text file and counts how many times each whitespace-separated word occurs. My code works and counts all the words accurately, but I'm stuck on one problem: I need to remove commas and other special characters, because they create separate entries for the same word. For example, language and language, show up as two different words, each with its own counter.

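package wordcounter;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class WordCounter {

    public static void main(String[] args) throws FileNotFoundException, IOException {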

        Map map = new HashMap();

        try (BufferedReader br = new BufferedReader(new FileReader("languages.txt"))) {
            StringBuilder sb = new StringBuilder();
            StringBuffer content = new StringBuffer();
            String line = br.readLine();

            while (line != null) { 
                
                
                String[] words = line.split(" ");
                
                for (String word : words){
                    if(word == null || word.trim().equals("")){
                        continue;
                        
                    }                
                }
                
               
                for (int i = 0; i < words.length; i++) {
                    
                    if (map.get(words[i]) == null) {
                        map.put(words[i], 1);
                    } else {
                        int newValue = Integer.valueOf(String.valueOf(map.get(words[i])));
                        newValue++;
                        map.put(words[i], newValue);
                    }
                }
                sb.append(System.lineSeparator());
                line = br.readLine();
            }
        }
        Map<String, Integer> sorted = new TreeMap<String, Integer>(map);
        for (Object key : sorted.keySet()) {
            System.out.println(key + "\tCounts: " + map.get(key));
        }
    }
    
}

I know I need to use replace/replaceAll, I just can't figure out where or how. Any help would be appreciated.

input: there are languages, within other languages.
output: languages.  Counts: 5; languages,   Counts: 1
Hulk
Cyph3r
  • Does this answer your question? [How to remove special characters from a string?](https://stackoverflow.com/questions/7552253/how-to-remove-special-characters-from-a-string) – peterulb Jun 28 '21 at 15:28

3 Answers


You can trim programmatically (by explicitly looking to see if the last character is punctuation) or by using a regular expression with a pattern matcher (looking it up is homework).

You can then apply the result either programmatically or with replace.

As to where, think of it this way: you have to prepare the words and groom them before collecting them. It would go into the for loop that looks at each word in words.

A simple bit (you can do more/better) would be:

if (word.endsWith(".") || word.endsWith(",") || ... ) {
    word = word.substring(0, word.length() - 1);
}
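To show where the grooming step could sit relative to the counting, here is a minimal sketch that uses replaceAll instead of a chain of endsWith checks. The class name `PunctuationStrippingCounter`, the helper `clean`, and the choice of the `\p{Punct}` character class are assumptions made for illustration, not part of the original code; it also takes a list of lines rather than reading languages.txt so that it can run on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PunctuationStrippingCounter {

    // Hypothetical helper: removes punctuation at the start and end of a token,
    // leaving punctuation inside the token (e.g. an apostrophe) untouched.
    static String clean(String token) {
        return token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
    }

    // Counts cleaned words; a TreeMap keeps the keys in sorted order.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String token : line.split("\\s+")) {
                String word = clean(token);
                if (word.isEmpty()) {
                    continue; // skip blanks and tokens that were only punctuation
                }
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("there are languages, within other languages.");
        countWords(lines).forEach(
                (word, count) -> System.out.println(word + "\tCounts: " + count));
    }
}
```

The point of the sketch is the ordering: each token is cleaned first and only then used as the map key, so "languages," and "languages." end up in the same counter.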
Alan

You need to check whether the word ends with a comma; if it does, remove the last character using substring.

Note: the end index should be word.length() - 1, not word.length() - 2, because the start index of substring is inclusive whereas the end index is exclusive.

if (word.endsWith(",")) {
    word = word.substring(0, word.length() - 1);
}
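The same substring idea can be extended to cover any trailing punctuation, including several characters in a row. The following is only a sketch: the class `TrailingPunctuation` and the method `stripTrailing` are hypothetical names, and treating every character that is not a letter or digit as punctuation is an assumption that may be too broad for some inputs.

```java
class TrailingPunctuation {

    // Walks back from the end of the word while the last character is
    // neither a letter nor a digit, then cuts the word at that position.
    static String stripTrailing(String word) {
        int end = word.length();
        while (end > 0 && !Character.isLetterOrDigit(word.charAt(end - 1))) {
            end--;
        }
        return word.substring(0, end); // end index is exclusive
    }

    public static void main(String[] args) {
        System.out.println(stripTrailing("languages,"));
        System.out.println(stripTrailing("languages.\""));
    }
}
```

A word made up only of punctuation comes back as an empty string, so the caller would still need to skip empty results before counting.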
sanjeevRm
    I didn't check the exact details of the end index, so thanks for that and I edited my answer. – Alan Jun 28 '21 at 15:56

There is an easier way to do this. Just split each line on your selected delimiters and ignore the blank strings.

String line = "this, this: this test, test, Test, Test test;";

The first way uses streams (which you could also easily turn into a map):

String[] words = Arrays.stream(line.split("[,:;\\s]+"))
        .filter(str -> !str.isBlank()).toArray(String[]::new);

for (String word : words) {
    System.out.print(word.toLowerCase() + " " );
}
System.out.println();

The second way is forgoing streams


for (String word : line.split("[,:;\\s]+")) {
    if (!word.isBlank()) {
        System.out.print(word.toLowerCase() + " " );
    }
}

Both of the above print


this this this test test test test test 

Also notice I used toLowerCase since I would think that test and Test count as the same word.

Note, isBlank() was introduced in Java 11 (isEmpty() has been available since Java 6). If you aren't running that level you can write your own isBlank(String str) method.

public static boolean isBlank(String s) {
    return s.trim().length() == 0;
}

And just in case you're interested, here is the map solution I mentioned above.

Map<String, Long> freq = Arrays.stream(line.split("[,:;\\s]+"))
        .filter(str -> !str.isBlank())
        .collect(Collectors.groupingBy(String::toLowerCase,
                Collectors.counting()));

freq.entrySet().forEach(System.out::println);

Prints

test=5
this=3
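For input that spans several lines, and for output sorted by word as in the question, the map solution could be adapted roughly as follows. This is a sketch under a few assumptions: the class `SortedFrequency` and the method `frequencies` are made-up names, a period has been added to the delimiter set, and isEmpty() is used in place of isBlank() so the code does not depend on Java 11.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SortedFrequency {

    // Splits every line on the delimiter set, lower-cases the words and
    // counts them into a TreeMap so the keys come out in sorted order.
    static Map<String, Long> frequencies(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("[,:;.\\s]+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(String::toLowerCase,
                        TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "this, this: this test, test,",
                "Test, Test test;");
        frequencies(lines).entrySet().forEach(System.out::println);
    }
}
```

With a real file, the list of lines could come from something like Files.readAllLines, at the cost of holding the whole file in memory.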
WJS