2

The Google Play app store recently rolled out a new feature for the Android app, which groups common phrases that reviewers left. I was thinking about which algorithm might accomplish this with as little space and time as possible, but haven't yet come up with a satisfying answer.

The feature looks something like this:

This app has 10.000.000 reviews

"Addictive" : 1234 reviews

  • "This is addictive!"
  • "Can't put this addictive app down."
  • ...

"Bad framerate": 12345 reviews

  • "This app has a bad framerate on my device"
  • "The main menu has a bad framerate"
  • ...

The top 5 phrases are displayed (k=5). A phrase can have 5 words max (l_max=5). A popular app can have 10 million comments with 100 words each (n=10E9 words).

Trivial approaches

Hashmap

A simple algorithm for l_max=1 would be:

  1. Drop all words into a hashmap, counting their frequency (O(n) time, O(n) space)
  2. Select k top phrases O(nlogk) via heap, O(k) space

If l_max > 1, step 1 would create a hashmap that needs *O(n/l_max) space, which is too much.

Nussinov

The Nussinov algorithm (see Algorithm to find the most common substrings in a string) would be a great match, but it needs O(n**2) space, so we probably shouldn't use that one here.

Community
  • 1
  • 1
Johannes
  • 23
  • 3

0 Answers0