Count occurrences of each tag using functional programming

Question

I've been trying to make a function that returns a Map<String, Int> with the key being a certain tag and the value being the number of occurrences.

The object (simplified) from which I need to extract the info:

class Note {
   List<String> tags
}

The function so far:

private fun extractTags(notes: List<Note>): Map<String, Int> {
    return notes.map { note -> note.tags }
                .groupBy { it }
                .mapValues { it.value.count() }    
}

Right now the compiler gives me a return type mismatch of Map<(Mutable)Set<String!>!, Int> and I'm not certain I'm getting the desired result (as I still can't test this properly).

I'm expecting a result along the lines of:

(tag1, 1)
(tag2, 4)
(tag3, 14)
...

holi-java · Accepted Answer · 2017-07-24T14:36:21.143

10

You can use Iterable#asSequence in Kotlin, much like the Java 8 Stream API: use Sequence#flatMap to merge all tags into a single Sequence, then Sequence#groupingBy together with Grouping#eachCount to count each tag. For example:

private fun extractTags(notes: List<Note>): Map<String, Int> {
    return notes.asSequence()
                .flatMap { it.tags.asSequence() }
                .groupingBy { it }.eachCount()
}

Note: Sequence#flatMap is an intermediate operation and Sequence#groupingBy only wraps the sequence in a lazy Grouping, which means that none of the operations on the Sequence run until the terminal operation Grouping#eachCount is called.
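The laziness described in the note can be observed with a small experiment. This is only a sketch: the `Note` class here is a hypothetical stand-in for the one in the question, and `demoLaziness` and the `visited` counter are names made up for illustration.

```kotlin
// Hypothetical stand-in for the question's Note class.
class Note(val tags: List<String>)

// Returns (notes visited before eachCount, notes visited after eachCount, counts).
fun demoLaziness(notes: List<Note>): Triple<Int, Int, Map<String, Int>> {
    var visited = 0 // incremented each time flatMap looks at a note

    // Building the pipeline does not iterate over the notes yet.
    val grouping = notes.asSequence()
        .flatMap { visited++; it.tags.asSequence() }
        .groupingBy { it }
    val before = visited

    // eachCount() is the terminal step that actually walks the sequence.
    val counts = grouping.eachCount()
    return Triple(before, visited, counts)
}

fun main() {
    val notes = listOf(Note(listOf("a", "b")), Note(listOf("a")))
    val (before, after, counts) = demoLaziness(notes)
    println(before) // 0
    println(after)  // 2
    println(counts) // {a=2, b=1}
}
```

If `eachCount()` were never called, `visited` would stay at 0, because nothing pulls elements through the sequence.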

edited Jul 24 '17 at 14:36

answered Jul 24 '17 at 14:20


Note: with a sequence the data source is traversed only once rather than once per operation, and it uses less memory than `Iterable#flatMap`, which collects the data into a new collection at every step. If you have a lot of data and many eager operations in the chain, you may get an `OutOfMemoryError`, e.g. when a `Sequence` reads the lines of a huge file via `BufferedReader.lines`. For more details see [@hotkey's answer](https://stackoverflow.com/a/35630670/4465208). – holi-java Jul 24 '17 at 17:10

score 6 · Answer 2 · answered Jul 24 '17 at 16:01

While the already accepted answer unarguably solves your problem, I feel like there's a bit of an "everything looks like a nail when you have a hammer" thing going on here.

The essence of that answer is that flatMap, groupingBy, and eachCount are the methods you need to solve your problem; however, using sequences here seems completely unnecessary.

Here's the code that just operates on/with regular collections:

private fun extractTags(notes: List<Note>): Map<String, Int> {
    return notes.flatMap { it.tags }
            .groupingBy { it }
            .eachCount()
}

I'd like to argue that this is a better solution than the one using sequences, because:

It produces the same results, since it uses the same operators.
The code is just simpler and easier to read without them.
The transformations here are simple and few; sequences become useful when you have long chains.
We are probably operating on relatively small data sets here. In my own quick measurements, the solution using sequences was about 10% faster with a million notes, but 17% slower with only ten thousand. I'd wager your lists are closer to the latter in size. Sequences have overhead.
We aren't making use of the laziness provided by sequences at all, since we want to evaluate and return the results immediately.

You can see an excellent comparison of the two ways with pros and cons here as well for more details.
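To make the "same results" point concrete, here is a minimal sketch that runs both variants on the same input. The `Note` class is a hypothetical stand-in for the one in the question, and the function names `countWithSequence` and `countWithCollections` are made up for this example.

```kotlin
// Hypothetical stand-in for the question's Note class.
class Note(val tags: List<String>)

// Variant from the accepted answer: lazy sequence pipeline.
fun countWithSequence(notes: List<Note>): Map<String, Int> =
    notes.asSequence()
        .flatMap { it.tags.asSequence() }
        .groupingBy { it }
        .eachCount()

// Variant from this answer: plain collections.
fun countWithCollections(notes: List<Note>): Map<String, Int> =
    notes.flatMap { it.tags }
        .groupingBy { it }
        .eachCount()

fun main() {
    val notes = listOf(
        Note(listOf("kotlin", "jvm")),
        Note(listOf("kotlin")),
        Note(emptyList())
    )
    println(countWithCollections(notes)) // {kotlin=2, jvm=1}
    println(countWithCollections(notes) == countWithSequence(notes)) // true
}
```

The only difference is that the collection variant materializes one intermediate `List<String>` of all tags before counting.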

Les · Answer 3 · 2017-07-24T19:18:37.593

Here is your code modified to work. I changed map to flatMap. I also provided a version implemented as an extension function. Yours was failing because map was producing a List<List<String>> where you were expecting a List<String> (hence, flatMap).

fun extractTags(notes: List<Note>): Map<String, Int> {
    return notes.flatMap { it.tags } // results in List<String>
            .groupBy { it } // results in Map<String, List<String>>
            .mapValues { it.value.count() } // results in Map<String, Int>
}

fun Iterable<Note>.extractTags(): Map<String, Int> {
    return this.flatMap { it.tags } // results in List<String>
            .groupBy { it } // results in Map<String, List<String>>
            .mapValues { it.value.count() } // results in Map<String, Int>
}

And here is some code to test it with

fun main(vararg args: String) {
    val notes = ArrayList<Note>()
    notes.add(Note(List(1) { "tag1" }))   // one "tag1"
    notes.add(Note(List(4) { "tag4" }))   // four "tag4"
    notes.add(Note(List(14) { "tag14" })) // fourteen "tag14"

    for ((tag, count) in extractTags(notes))
        println("$tag: $count")
    for ((tag, count) in notes.extractTags())
        println("$tag: $count")
}

class Note(var tags: List<String>)

`groupingBy { it }.eachCount()` is more preferable in terms of memory consumption. — Ilya, Jul 24 '17 at 15:59
@Ilya - yes, you're right. I chose to use `groupBy` and `mapValues` because the OP used these and my main intent was to point to the specific problem with the original example. The accepted answer uses `groupingBy { it }.eachCount()`, I felt that would suffice. — Les, Jul 24 '17 at 19:30
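The memory point from the comments can be illustrated with a rough sketch. `countViaGroupBy` mirrors the `groupBy` + `mapValues` approach, while `countViaCounters` is a hand-written approximation of what counting with one integer per key looks like; it is not the standard library's actual implementation of `eachCount`, and both function names are made up for this example.

```kotlin
// groupBy + mapValues: builds a full List<String> per tag, then measures it.
fun countViaGroupBy(tags: List<String>): Map<String, Int> =
    tags.groupBy { it }.mapValues { it.value.size }

// Illustration only: keeps a single counter per tag instead of a list,
// which is the idea behind groupingBy { it }.eachCount().
fun countViaCounters(tags: List<String>): Map<String, Int> {
    val counts = mutableMapOf<String, Int>()
    for (tag in tags) counts[tag] = (counts[tag] ?: 0) + 1
    return counts
}

fun main() {
    val tags = listOf("tag1", "tag2", "tag2", "tag3", "tag2")
    println(countViaGroupBy(tags)) // {tag1=1, tag2=3, tag3=1}
    println(countViaCounters(tags) == tags.groupingBy { it }.eachCount()) // true
}
```

All three approaches return the same map; they differ only in how much intermediate data they hold while counting.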

score 0 · Answer 4 · answered Jul 02 '20 at 22:56

Excuse the late answer, but I think this is the best one: when you are using Kotlin, the standard library gives you a syntax that is shorter and cleaner than Java 8 streams.

private fun extractTags(notes: List<Note>): Map<String, Int> = notes.flatMap { it.tags } // List<String>
        .groupBy { it } // Map<String, List<String>>
        .map {
            Pair(it.key, it.value.size)
        } // List<Pair<String, Int>> of (tag, count)
        .toMap() // create a map from the list of pairs
