0

What is the time complexity for the following?

word = "abbaa"
sortedWord = sorted(sorted(word), key = word.count)
Samp
  • 11
  • 2
    It's O(n^2), due to calling the key function `word.count` for each letter in `word`. Counting the frequency of a letter is O(n) and you do it n times. If you used `collections.Counter` first to construct a frequency map, then used `key=frequencies.get` in the sort, the complexity would be O(n log n) because the actual sorting would dominate. – kaya3 Feb 01 '20 at 02:53
  • 1
    @kaya3: `collections.Counter(sorted(word)).most_common()` would be the most straight-forward way, since it handles the frequency sorting for you (no need to muck about with `key` functions yourself. It would sort from highest count to lowest, but you could always call `.reverse()` on the result to undo that as a `O(1)` step (or iterate over the result without reordering after wrapping in `reversed` for an essentially free operation). – ShadowRanger Feb 01 '20 at 03:04
  • @ShadowRanger That doesn't do quite the same thing - it sorts by frequency in descending order, and it returns `(letter, frequency)` pairs for distinct letters, which is not the same format as what `sorted` returns. You could use a list comprehension to transform it into the expected format, but then it's not so straightforward. – kaya3 Feb 01 '20 at 03:10
  • @kaya3: True. I suspect most uses would in fact care about exact counts, not just relative counts, but yes, not always what you want. – ShadowRanger Feb 01 '20 at 03:13
  • Regardless of the answer, you do not the inner `sorted`. – DYZ Feb 01 '20 at 03:20
  • 3
    @DYZ The inner `sorted` is to sort alphabetically before sorting by frequency. Python's `sorted` function is a stable sort, so it does make a difference; letters with the same frequency will remain in their original relative order from the first `sorted` call. – kaya3 Feb 01 '20 at 03:21
  • @kaya3 do you know if python caches the results of the key function? If not it seems possible to perform word.count more than n times, since sorting itself is O(n logn) -- not that that's a huge difference. – Mark Feb 01 '20 at 03:22
  • 1
    @MarkMeyer Yes, the key function is called once per list element. The docs don't specify that, but the docs also don't specify that the time complexity of the actual sorting is O(n log n). – kaya3 Feb 01 '20 at 03:25
  • @kaya3I was under the impression python still used timsort -- although maybe that's not officially documented.. – Mark Feb 01 '20 at 03:29
  • 2
    @MarkMeyer It does use Timsort, but the docs don't specify that it uses Timsort. Re: caching the keys, the `list.sort` method actually [is specified as doing this](https://docs.python.org/3/library/stdtypes.html#list.sort), and "[the intention of the manual is to guarantee](https://stackoverflow.com/a/1915418/12299000)" that `list.sort` and `sorted` use the same algorithm; so in this case it's just that the information is missing from the docs, rather than deliberately unspecified. – kaya3 Feb 01 '20 at 03:33

0 Answers0