How to find the total count of words, vowels, and special characters in a text file using Java 8

Question

I have a text file and I want to find
- the total word count in the file
- the total vowel count in the file
- the total special character count in the file

using Java 8 Streams.

I want the output as a Map, in a single iteration if possible, i.e.

{"totalWordCount":10,"totalVowelCount":10,"totalSpecialCharacter":10}

I tried the code below:

    Long wordCount = Files.lines(child).parallel()
            .flatMap(line -> Arrays.stream(line.trim().split(" ")))
            .map(word -> word.replaceAll("[^a-zA-Z]", "").toLowerCase().trim())
            .filter(word -> !word.isEmpty())
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
            .values().stream().reduce(0L, Long::sum);

But it only gives me the total word count. Is it possible to return a single map that contains all the counts, as in the output above?

Requirements-only questions are not usually received very well on this site. SO is for specific programming questions, on code which already exists. Please include your current solution if you have one. — Tim Biegeleisen, Mar 25 '19 at 10:24
And you have tried *what* and are stuck *where* because *something* does not work? — luk2302, Mar 25 '19 at 10:24
As someone who has already collected over 300 reputation on StackOverflow, I'm surprised you don't realize that a list of requests is not a question. Please include your solution and the problem you encountered in that solution. — RealSkeptic, Mar 25 '19 at 10:25
`.collect(Collectors.groupingBy(Function.identity(), Collectors.counting())) .values().stream().reduce(0L, Long::sum)` is quite a long-winded way to get `.distinct() .count()`… — Holger, Mar 25 '19 at 11:25
Have you coded it without streams yet? I doubt this can be done properly with streams. — luk2302, Mar 25 '19 at 11:55
No, but it's too lengthy if I do it without streams. Can we have 3 separate solutions, one for each problem, using streams? — Nikhi K. Bansal, Mar 25 '19 at 12:05

score 2 · Accepted Answer · answered Mar 25 '19 at 14:43

If we only had to count special characters and vowels, we could use something like this:

    Map<String,Long> result;
    try(Stream<String> lines = Files.lines(path)) {
        result = lines
            // lines -> words
            .flatMap(Pattern.compile("\\s+")::splitAsStream)
            // words -> characters
            .flatMapToInt(String::chars)
            // keep only vowels and non-alphabetic ("special") characters
            .filter(c -> !Character.isAlphabetic(c) || "aeiou".indexOf(c) >= 0)
            .mapToObj(c -> "aeiou".indexOf(c)>=0? "totalVowelCount": "totalSpecialCharacter")
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

First we flatten the stream of lines to a stream of words, then to a stream of characters, and group the characters by their type. This works smoothly because “special character” and “vowel” are mutually exclusive. In principle, the flattening to words could have been omitted by extending the filter to skip white-space characters, but here it helps in getting to a solution that also counts words.
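The variant mentioned above, which skips the word-flattening step and filters out white-space instead, might look like the following sketch. The class name `CharCategoryCount` and the in-memory sample lines are only illustrative assumptions; a real program would pass `Files.lines(path)` instead.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CharCategoryCount {
    // Counts vowels and special characters directly on the characters of each line.
    // White-space is skipped in the filter instead of splitting into words first.
    static Map<String, Long> count(Stream<String> lines) {
        return lines
            .flatMapToInt(String::chars)
            .filter(c -> !Character.isWhitespace(c)
                      && (!Character.isAlphabetic(c) || "aeiou".indexOf(c) >= 0))
            .mapToObj(c -> "aeiou".indexOf(c) >= 0 ? "totalVowelCount" : "totalSpecialCharacter")
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // Sample input standing in for the lines of a file (assumption for illustration)
        System.out.println(count(Stream.of("hello, world!", "  java 8  ")));
    }
}
```

As in the code above, only lower-case vowels are recognized, and a category that never occurs is simply absent from the map.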

Since words are a different kind of entity than characters, counting them in the same operation is not that straightforward. One solution is to inject a pseudo character for each word and count it just like the other characters at the end. Since chars() never yields a negative value, we can use -1 for that:

    Map<String,Long> result;
    try(Stream<String> lines = Files.lines(path)) {
        result = lines.flatMap(Pattern.compile("\\s+")::splitAsStream)
            // skip empty tokens (e.g. from leading white-space) so they are not counted as words
            .filter(w -> !w.isEmpty())
            // prepend the pseudo character -1 to the characters of each word
            .flatMapToInt(w -> IntStream.concat(IntStream.of(-1), w.chars()))
            .mapToObj(c -> c==-1? "totalWordCount": "aeiou".indexOf(c)>=0? "totalVowelCount":
                    Character.isAlphabetic(c)? "totalAlphabetic": "totalSpecialCharacter")
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

This adds a "totalAlphabetic" category in addition to the others into the result map. If you do not want that, you can insert a .filter(cat -> !cat.equals("totalAlphabetic")) step between the mapToObj and collect steps. Or use a filter like in the first solution before the mapToObj step.
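For illustration, here is a self-contained sketch of the variant with the extra filter step, run on in-memory lines rather than a file. The class name `ThreeCounts` and the sample input are assumptions, and so is the `isEmpty` filter, which is added here so that empty tokens produced by leading white-space are not counted as words.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class ThreeCounts {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static Map<String, Long> count(Stream<String> lines) {
        return lines
            .flatMap(WHITESPACE::splitAsStream)
            // assumption: drop empty tokens so indented lines do not add extra words
            .filter(w -> !w.isEmpty())
            // -1 marks the start of a word, followed by the word's characters
            .flatMapToInt(w -> IntStream.concat(IntStream.of(-1), w.chars()))
            .mapToObj(c -> c == -1 ? "totalWordCount"
                         : "aeiou".indexOf(c) >= 0 ? "totalVowelCount"
                         : Character.isAlphabetic(c) ? "totalAlphabetic"
                         : "totalSpecialCharacter")
            // remove the category that was not asked for
            .filter(cat -> !cat.equals("totalAlphabetic"))
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count(Stream.of("hello, world!", "  java 8  ")));
    }
}
```

With the sample input, the map holds 4 words, 5 vowels, and 3 special characters, and no "totalAlphabetic" entry.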

As an additional note, this solution does more work than necessary: it splits the input into lines first, although line breaks can be treated just like other white-space, i.e. as a word boundary. Starting with Java 9, we can use Scanner for the job:

    Map<String,Long> result;
    try(Scanner scanner = new Scanner(path)) {
        // each match of \S+ is one word, regardless of line breaks
        result = scanner.findAll("\\S+")
            // prepend the pseudo character -1 to the characters of each word
            .flatMapToInt(w -> IntStream.concat(IntStream.of(-1), w.group().chars()))
            .mapToObj(c -> c==-1? "totalWordCount": "aeiou".indexOf(c)>=0? "totalVowelCount":
                    Character.isAlphabetic(c)? "totalAlphabetic": "totalSpecialCharacter")
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

This will split the input into words in the first place without treating line breaks specially. This answer contains a Java 8 compatible implementation of Scanner.findAll.
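If a lazy Scanner-based implementation is not needed, a much simpler Java 8 substitute can be sketched with `Matcher.find()`. This is not the implementation from the referenced answer: the helper `findAll` below is hypothetical, works eagerly on text that is already in memory, and returns the matched strings rather than `MatchResult` objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class FindAllSketch {
    // Hypothetical helper: collects every match of the pattern in the given text.
    static Stream<String> findAll(CharSequence text, Pattern pattern) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches.stream();
    }

    public static void main(String[] args) {
        // Line breaks are treated like any other white-space here
        findAll("hello, world!\n  java 8", Pattern.compile("\\S+"))
            .forEach(System.out::println);
    }
}
```

For a file, the text could be obtained with `new String(Files.readAllBytes(path), StandardCharsets.UTF_8)`, at the cost of holding the whole content in memory.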

The solutions above consider every character which is neither white-space nor alphabetic as “special character”. If your definition of “special character” is different, it should not be too hard to adapt the solutions.
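As one possible adaptation, the definition of “special character” can be passed in as a predicate. The sketch below is an assumption about how that might look, not part of the solutions above; the class name `CustomSpecial` and the example predicate (neither letter, digit, nor white-space) are chosen only for illustration.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CustomSpecial {
    // Counts vowels and whatever the caller defines as a special character.
    // A character that is a vowel is always counted as a vowel.
    static Map<String, Long> count(Stream<String> lines, IntPredicate isSpecial) {
        return lines
            .flatMapToInt(String::chars)
            .filter(c -> "aeiou".indexOf(c) >= 0 || isSpecial.test(c))
            .mapToObj(c -> "aeiou".indexOf(c) >= 0 ? "totalVowelCount" : "totalSpecialCharacter")
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // Example definition (assumption): digits do not count as special characters
        IntPredicate punctuationOnly =
            c -> !Character.isLetterOrDigit(c) && !Character.isWhitespace(c);
        System.out.println(count(Stream.of("hello, world!", "  java 8  "), punctuationOnly));
    }
}
```

With this predicate, the digit in the sample input is no longer counted, so only the comma and the exclamation mark are reported as special characters.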
