# find all possible sorted substrings of s
substr = ["".join(sorted(s[i: j])) 
                for i in range(len(s)) 
                    for j in range(i + 1, len(s) + 1)]

I know sorted() is O(nlog(n)), and finding all possible substrings is O(n^2). However, the .join() is what is throwing me off. sorted() returns a list, and .join() iterates over each element in the list to append it to a string, so it should be a linear operation.

Would my substring sorter therefore be O(n^2) for the nested loop * O(nlogn) for sorting each result * O(n) for joining? Ergo O(n^4logn)??

If so, would breaking up the operations make this more efficient? I had another implementation where I moved the sorting of the substrings into a second list comprehension:

substr = [s[i: j] for i in range(len(s)) 
                for j in range(i + 1, len(s) + 1)]

sortedSubstr = ["".join(sorted(s)) for s in substr]

This would make it O(n^2) for the first list comprehension, plus O(n) * O(nlogn) for the second list comprehension,

making the overall program O(n^2logn).

Please correct me if I am wrong. Thanks

danielschnoll
  • You only join once, and the cost of the join is the sum of the lengths of its input. So why would you *multiply* by O(n)? The cost of the join is the sum of the lengths of all substrings, plus the sum of the costs of creating each string to be joined. – rici Dec 29 '21 at 17:49
  • Because each .join() is happening for each substring in the list. Its not doing it just once, it’s for each iteration of the comprehension – danielschnoll Dec 29 '21 at 17:57
  • That doesn't change the sum of the lengths of the arguments, though. The sum is the same whether the join is done all at once or divided into a bunch of calls. – rici Dec 29 '21 at 20:02
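rici's point above — that the total cost of all the joins is the sum of the lengths of all substrings — can be sanity-checked numerically. A quick sketch (mine, not from the thread; the closed form n(n+1)(n+2)/6 follows from summing lengths 1..n-i for each start i):

```python
# Total characters handed to join() across all substrings of s:
# for each start i there are substrings of length 1 .. n-i, and
# summing those over all i gives n*(n+1)*(n+2)/6, i.e. Theta(n^3).
s = "abcdef"
n = len(s)
total_chars = sum(j - i for i in range(n) for j in range(i + 1, n + 1))
print(total_chars)                 # 56
print(n * (n + 1) * (n + 2) // 6)  # 56
```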

3 Answers


For the first algorithm the time complexity is O(n^3*log(n)), because inside the two loops you are not doing a join for each atomic step of sorted — you sort and then join, one after the other. So O(n) + O(n*log(n)) = O(n*log(n)) per substring, which multiplied by O(n^2) (the nested loops) gives us O(n^3*log(n)).

About the second algorithm.

  • Calculation of substr gives us O(n^3): the same O(n^2) for the nested loops, multiplied by O(n) for slicing s.
  • Note that len(substr) is O(n^2) — one substring for each (i, j) pair from the nested loops.
  • Calculation of sortedSubstr gives us O(n^3*log(n)): we call sorted for each element of substr (of which there are O(n^2)). Each element's length is O(n), so sorted(s) costs O(n*log(n)). As before, O(n^2) * O(n*log(n)) = O(n^3*log(n)).
  • Calculation of substr (O(n^3)) plus calculation of sortedSubstr (O(n^3*log(n))) yields O(n^3*log(n)).

So the same O(n^3*log(n)) for the second.
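The pivotal count here — len(substr) being O(n^2) — is easy to check empirically. A small sketch (not part of the original answer):

```python
# len(substr) equals n*(n+1)/2: one entry per (i, j) pair, so O(n^2) entries,
# each of length O(n) -- which is why sorting all of them costs O(n^3 * log n).
s = "abcde"
n = len(s)
substr = [s[i:j] for i in range(n) for j in range(i + 1, n + 1)]
print(len(substr))       # 15
print(n * (n + 1) // 2)  # 15
```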

Yevgeniy Kosmak

In this expression:

"".join(sorted(s[i: j])) 

the O(n) join is negligible, because you're doing it once, after you've done the O(nlogn) sort.

O(nlogn + n) = O(nlogn)

Doing that sort n^2 times gets you to O(n^3 logn), regardless of whether/when you do the join.

Samwise
  • Yes, but the .join() occurs for each element in the list, just as the O(nlogn) sort occurs for each iteration of the n^2 loop. – danielschnoll Dec 29 '21 at 17:59
  • 1
    Yes; I was just addressing the `join` specifically as not being relevant since that was the specific thing OP said they were hung up on. I.e. `join(sorted(...))` will always have the same time complexity as `sorted(...)`, because it's just adding an O(n) operation to an O(nlogn) operation. Doing that in an n^2 loop means overall it's `n^2 * nlogn`, or `n^3 logn`. – Samwise Dec 29 '21 at 17:59
  • What about the second solution I provided? Would that not overall just be O(nlogn) for the sort times O(n) for iterating over the second list? That O(n^2logn) dominates the O(n^2) from the first nested list comprehension. – danielschnoll Dec 29 '21 at 18:02
  • 1
    No, because the second list isn't `n` items, it's `n^2` items. (Or technically it's `n^2/2` items or however the combinatorics works out, but when we're talking about big-O it's the same thing.) – Samwise Dec 29 '21 at 18:02
  • Ohhhh true!! I was thinking in terms of it being linear and not taking into account the fact that the list is now n^2 bigger – danielschnoll Dec 29 '21 at 18:03

For the sake of analysis, let's replace the comprehensions with their equivalent loops as described in the docs.

Your first approach becomes

substr = []
for i in range(len(s)): # O(n)
    for j in range(i + 1, len(s) + 1): # O(n)
        substr.append("".join(sorted(s[i: j]))) # O(n*logn)

# Overall O(n^3 * logn)

Note that substr.append("".join(sorted(s[i: j]))) is not multiplicative; rather, it is a sequential combination of the following operations (the temporaries below are only for illustration — no actual assignments happen):

temp = s[i:j] # O(j-i), which is O(n) in the worst case
sorted_temp = sorted(temp) # O(n*logn)
joined_temp = "".join(sorted_temp) # O(n)
substr.append(joined_temp) # amortized O(1)

# Overall O(n*logn)

Now in the second approach the code becomes

substr = []
for i in range(len(s)): # O(n)
    for j in range(i + 1, len(s) + 1): # O(n)
        substr.append(s[i:j]) # O(n) for the slice
# first loop: O(n^3)

sortedSubstr = []
for s in substr: # O(n^2), we have appended n^2 times in the first loop
    sortedSubstr.append("".join(sorted(s))) # O(n*logn)
# second loop: O(n^3 * logn)

# Overall O(n^3 * logn)

So, as you can see, they have the same time complexity. I almost missed substr's length being n^2 rather than n, so that is a potential pitfall.
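To underline that the two approaches do exactly the same work, only grouped differently, here is a side-by-side check (the helper names are mine, not the OP's):

```python
def sorted_substrings_one_pass(s):
    # first approach: sort and join inside the nested comprehension
    return ["".join(sorted(s[i:j]))
            for i in range(len(s))
            for j in range(i + 1, len(s) + 1)]

def sorted_substrings_two_pass(s):
    # second approach: collect raw substrings first, then sort each one
    substr = [s[i:j] for i in range(len(s))
              for j in range(i + 1, len(s) + 1)]
    return ["".join(sorted(t)) for t in substr]

print(sorted_substrings_one_pass("banana") == sorted_substrings_two_pass("banana"))  # True
```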

References for time complexity of various operations:

  1. Time complexity of string slice
  2. Why is the time complexity of python's list.append() method O(1)?
shriakhilc