
Today, while practicing problem solving on HackerRank (The Full Counting Sort problem), I found that flattening a 2D array, unpacking it, and printing it as below:

print(*sum(_2DArray, []))

was causing a "Time Limit Exceeded" error on submission, but using regular nested loops was fine:

print(' '.join(j for i in _2DArray for j in i))

Why is this flattening and unpacking slower than the O(n^2) nested loops?

Thanks in advance

Edit: full solution for the problem

def countSort(arr):
    result = [[] for x in range(len(arr))]
    for i in range(len(arr)):
        result[int(arr[i][0])].append('-' if i < len(arr)//2 else arr[i][1])

    print(' '.join(j for i in result for j in i))
    # print(*sum(result, []))
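
The two printing approaches can be compared on a minimal example (the sample array here is illustrative, not from the HackerRank test cases):

```python
# A tiny 2D list of strings, like the buckets built in countSort.
_2DArray = [["1", "2"], ["3"], ["4", "5"]]

# Approach 1: flatten with sum, then unpack into print.
# sum(..., []) repeatedly concatenates lists, re-copying earlier
# elements on every addition.
print(*sum(_2DArray, []))                         # 1 2 3 4 5

# Approach 2: lazily traverse with a generator and join.
# Each element is visited exactly once.
print(' '.join(j for i in _2DArray for j in i))   # 1 2 3 4 5
```

Both lines produce identical output; the difference is only in how much copying happens along the way.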
  • What are the array type and dimensions? – Kelly Bundy Jan 30 '22 at 08:49
  • Please read why `sum` is slow when used for joining pieces: https://stackoverflow.com/questions/39435401/can-i-join-lists-with-sum – VPfB Jan 30 '22 at 08:58
  • @KellyBundy I was using a 2D array of String type for the question linked above. Dimensions were dependent on the HackerRank internal test cases. My flatten and unpack method was failing 3/7 test cases due to timeout while nested loops passed all 7 test cases – Victor Sankar Ghosh Jan 30 '22 at 09:06
  • @VPfB It *can* be slow. Here the length of the outer list is probably relatively small, so I'm really not convinced that that's it. – Kelly Bundy Jan 30 '22 at 09:06
  • What happens if you do `print (''.join(sum (_2DArray, [])))`? – Kelly Bundy Jan 30 '22 at 09:07
  • And what is the maximum length of your outer list? (Showing your solution would help, btw...) – Kelly Bundy Jan 30 '22 at 09:10
  • And conversely, what happens with `print(*(j for i in _2DArray for j in i))`? (I really wish you had shown your solution so I could simply try these things myself...) – Kelly Bundy Jan 30 '22 at 09:14
  • @KellyBundy Same as before... Failing 3/7 test cases Unfortunately I cannot find out the what would be the maximum outer list length in the failing internal test cases. If it helps, I can edit my post to include my full solution. – Victor Sankar Ghosh Jan 30 '22 at 09:18
  • @KellyBundy `print(*(j for i in result for j in i))` passes all test cases – Victor Sankar Ghosh Jan 30 '22 at 09:23
  • They state limits, and since you know what your code does, you *can* tell how large that length can be. You made it `len(arr)` big. Unclear why you did that. What happens if you make it `100` big instead? – Kelly Bundy Jan 30 '22 at 09:26
  • @KellyBundy By making it `100`, `print (*sum (result, []))` passes all test cases. Thank you for pointing out the error in my code. The length should be as large as the largest number in `arr` and not the length of arr – Victor Sankar Ghosh Jan 30 '22 at 09:39

3 Answers


If n denotes the number of atoms in your array, printing them with a loop is O(n), whereas flattening them with sum is O(n²). This is because adding two Python lists takes time linear in the sizes of both lists.

Besides that, when you flatten the list, you actually store the intermediate results in RAM, whereas the loop way is lazily evaluated, meaning you only have O(1) extra memory consumption. Writing to memory is a slow operation, which might also contribute to the poor performance.
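
The quadratic blow-up can be made concrete by counting element copies; `copies_during_sum` below is a hypothetical helper (not from the answer) that simulates the cost under the assumption that each `l = l + item` copies both operands:

```python
def copies_during_sum(n):
    """Simulated element copies when sum-flattening n singleton lists."""
    total = 0
    acc_len = 0
    for _ in range(n):
        # l = l + item copies the accumulator plus the 1-element item
        total += acc_len + 1
        acc_len += 1
    return total

print(copies_during_sum(10))    # 55   (~ n^2 / 2)
print(copies_during_sum(100))   # 5050
```

Ten times the input yields roughly a hundred times the copies, which is exactly the quadratic growth described above.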

EDIT: Some details (that are lacking in my original answer).

  1. When you print something, that something has to be stored somewhere (at least in a buffer). So when I talk about memory complexity, I do not take into account that aspect. To put it another way, the complexity I am talking about is the extra space required.
  2. By atom, I mean the bottom-level elements. [["a", "b"], ["c", "d"]] has four atoms. Flattening a list of lists using sum has a complexity that is quadratic in the number of atoms (in the case of a list of singletons, that upper bound is reached). However, simply traversing the array has a complexity that is linear in the number of atoms.
  3. Don't let the join fool you into thinking you're doing something more than if you left it out. Your last line is equivalent to
print(*(j for i in _2DArray for j in i))

Note that this last line really shows that the difference between the two ways to print the elements of the array is in the non-lazy flattening vs lazy traversal.
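
That equivalence can be checked directly by capturing stdout (a small sketch with an illustrative array):

```python
import io
import contextlib

_2DArray = [["a", "b"], ["c", "d"]]

buf1, buf2 = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(buf1):
    print(' '.join(j for i in _2DArray for j in i))   # join over a generator
with contextlib.redirect_stdout(buf2):
    print(*(j for i in _2DArray for j in i))          # unpack the generator

# Both approaches emit exactly the same text.
assert buf1.getvalue() == buf2.getvalue() == "a b c d\n"
```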

A note about merging Python lists.

The theoretical problem

When you add two lists in Python, you actually move (at least) one of the two lists, which simply means copying all of its elements. This operation requires writing to memory. The main problem of flattening this way is that you keep copying some elements again and again. There is a way to make it linear, which is the following:

result = []
for line in _2DArray:
  result.extend(line)

However, since Python gives you very little control over how memory is managed, you can't be sure (unless you dig into the specs, which you usually want to avoid) that this is how it is done. Another way it could be done would be

result = []
for line in _2DArray:
  temp = result
  result = line[:]
  result.extend(temp)

This is clearly much worse, and when you simply sum a list of lists, you can't really tell which of the two will happen.
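
For completeness, the linear `extend` loop above matches what the standard library's `itertools.chain.from_iterable` provides (a sketch with an illustrative array):

```python
import itertools

_2DArray = [["a"], ["b", "c"], ["d"]]

# Linear-time flattening: each element is moved exactly once.
result = []
for line in _2DArray:
    result.extend(line)

# The stdlib one-liner gives the same result with the same
# linear behaviour, without relying on how + is implemented.
assert result == list(itertools.chain.from_iterable(_2DArray))
print(result)   # ['a', 'b', 'c', 'd']
```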

The practical problem

In any case, even if what actually happens is the linear-time version, you still copy arrays a lot, which means more writes to memory than if you simply used generators.

jthulhu
  • Does `join` not build a list, taking O(n) memory? What Python implementation is used there? – Kelly Bundy Jan 30 '22 at 08:45
  • In any case, the joined string certainly takes O(n) memory. – Kelly Bundy Jan 30 '22 at 08:48
  • You mentioned "If you denote n the number of atoms of your array, printing them with a loop is O(n)", but if you see in my nested loop example above, I am accessing all elements in a 2D array, adding it to a 'join' string and finally printing joined string. Surely that would be O(n²) too since it is accessing i*j times of the 2D array. Also "(j for i in _2DArray for j in i)" is also creating an iterable object containing all elements in the 2D array. Even that too has to be stored in RAM before the join () is called. Wouldn't the size of objects in both cases be comparable ? – Victor Sankar Ghosh Jan 30 '22 at 08:54
  • Also, I think their first way has one `write` per element whereas their second way has only one in total. That could be the real reason instead. Better show some benchmarks/profiling to see what actually takes most of the time. – Kelly Bundy Jan 30 '22 at 08:54
  • I edited my post to explain the details, which were evidently lacking... – jthulhu Jan 30 '22 at 09:55

sum(_2DArray, []) is essentially equivalent to

l = []
for item in _2DArray:
    l = l + item

The computational cost of concatenating two lists of sizes n1 and n2 is O(n1 + n2)

Let us assume that our 2D array is a square matrix of size n * n. We can denote the size of l at each iteration to be n1 and the size of item to be n2. The number of steps of each loop can be determined as follows:

  • t = 1, n1 = 0, n2 = n ; The number of steps required to perform the concatenation is n
  • t = 2, n1 = n, n2 = n; The number of steps is n + n = 2n
  • t = 3, n1 = 2n, n2 = n; The number of steps is 2n + n = 3n

...

  • t = n, n1 = (n - 1)*n, n2 = n; The number of steps is n(n - 1) + n = n^2

The effective number of steps required to perform concatenation using sum is n + 2n + 3n + ... + n^2 = n * n * (n + 1) / 2 which yields a time complexity of O(n^3).
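
The step count and its closed form can be sanity-checked with a short sketch (`concat_steps` is a hypothetical helper that simulates the copy counts for an n-by-n matrix; it is not benchmark code):

```python
def concat_steps(n):
    """Total element copies when sum-flattening an n-by-n matrix."""
    steps = 0
    for t in range(1, n + 1):
        # at step t we concatenate lists of sizes (t-1)*n and n
        steps += (t - 1) * n + n
    return steps

# The simulated totals match n * n * (n + 1) / 2 exactly.
for n in (1, 5, 10, 50):
    assert concat_steps(n) == n * n * (n + 1) // 2

print(concat_steps(10))   # 550
```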

The concatenation approach is effectively slower than quadratic time. We also do not factor in the overhead for creation of n new lists which take at least O(n) and at most O(n^2) space. There's also the overhead of the print method having to treat each element of the array as an argument and iterate to print it.

Using ' '.join(j for i in _2DArray for j in i) essentially creates the resulting string in O(n^2) time by using an iterator over the list elements and also yields a single argument to pass to print.

  • It's not square, have a look at their solution code and the last few comments under the question if you want to know more. – Kelly Bundy Jan 30 '22 at 09:49
  • As explained in my answer, your `O(n^3)` bound is optimistic, in the general case. With your notations, the worst case could be `O(n^4)`. – jthulhu Jan 30 '22 at 09:54
  • Furthermore, my point is that using `sum` to perform the flattening requires a time complexity that is proportional to at least the number of elements in the 2D array, and can have worse time complexity depending on how the array is laid out. I also describe other problems with the described approach - list creation, print etc. – Rishi Rajasekaran Jan 30 '22 at 10:10
  • @BlackBeans could you explain how one can get `O(n^4)`? For a 2D array of `n^2` elements, I cannot come up with a time complexity worse than `O(n^3)`. – Rishi Rajasekaran Jan 30 '22 at 10:27
  • @RishiRajasekaran The problem with your way of counting items is that it (implicitly) implies that your array is `n` lists, each with `n` items. However, you can also have `n²` lists, each with 1 element (and still have the same number of total elements). If you denote `m=n²`, the complexity could be `1 + 2 + ... + m = m(m+1)/2 = O(m²) = O(n⁴)`. This confusion is also present in the question. – jthulhu Jan 31 '22 at 18:18

The way you wrote it, in the worst case your 2D list has a million strings in the first row and then 999,999 empty rows. The sum then copies the million string references in every addition, i.e., a million times. That's a trillion string-reference copies. And all the while, the strings' reference counters get increased and decreased. A looooot of work. Your other method doesn't have that issue. In short: they're O(n²) vs O(n), where n, as defined in the problem, can be as large as a million.

And as discussed, the outer length never needs to be n but at most 100 (more precisely the maximum x + 1), and with that, the sum method gets accepted. Still does up to 100 million reference copies, but such copying is low-level and relatively fast compared to Python code.
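
The bucket-sizing fix discussed in the comments can be sketched like this (`count_sort` is a hypothetical rewrite, assuming, per those comments, that keys fit in 0..99; it returns the string rather than printing, for easy checking):

```python
def count_sort(arr):
    # Size the buckets by the key range (0..99), not by len(arr),
    # so the number of outer lists stays small and even the
    # sum-based flattening remains cheap.
    buckets = [[] for _ in range(100)]
    half = len(arr) // 2
    for i, (key, value) in enumerate(arr):
        buckets[int(key)].append('-' if i < half else value)
    return ' '.join(j for i in buckets for j in i)

print(count_sort([["0", "a"], ["1", "b"], ["0", "c"], ["1", "d"]]))  # - c - d
```

The first half of the input is replaced by dashes, then entries are emitted in key order, matching the behaviour of the posted countSort.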

Kelly Bundy