
Firstly I looked into the following questions:

O(N+M) time complexity
Comparing complexity of O(n+m) and O(max(n,m))
Big-O of These Nested Loops
Is this function O(N+M) or O(N*M)?
How to find time complexity of an algorithm

However, I'm still not 100% confident. That said, I have the following Python example code:

adjList = [[4,7,8,9],[7,7,5,6],[1,4,3],[2,9],[2,1,7]]

for i in adjList:
    for j in i:
        print("anything else")

I came to think this is an O(N+M) algorithm; here is my reasoning:

I have adjList which is a list of lists. The integers there are randomly picked for the sake of exemplification. This is actually an adjacency list where the vertex 1 is linked to the vertices 4, 7, 8 and 9 and so on.

I know that adjList[i][j] will return the j-th item from the i-th list. So adjList[0][2] is 8

The first for will loop through each of the i lists. If we have N lists this is an O(N).

The second for will loop through each of the j elements of a list. But in this case, j is not a fixed value: e.g., the first list has 4 elements (4,7,8,9) and the third one has 3 elements (1,4,3). So in the end, the second for will loop M times, M being the sum of all the different j values. So, M is the sum of elements of every list. Thus O(M)

In this example, the first for should loop 5 times and the second for should loop 16 times. A total of 21 loops. If I change the adjList to a single big list within a list like this:

adjList = [[4,7,8,9,7,7,5,6,1,4,3,2,9,2,1,7]]

It would still loop through the 16 elements in the second for plus 1 time for the first for.

Thereby I can say that the algorithm will loop N times plus M times, where N is the number of lists in adjList and M is the sum of elements in each one of the lists inside adjList. So O(N+M)
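To double-check this counting, I can instrument the loops directly (the counter names here are just for illustration):

```python
# Count outer and inner iterations separately to check the N + M claim.
adjList = [[4, 7, 8, 9], [7, 7, 5, 6], [1, 4, 3], [2, 9], [2, 1, 7]]

outer = 0  # iterations of the first for
inner = 0  # iterations of the second for
for i in adjList:
    outer += 1
    for j in i:
        inner += 1

N = len(adjList)                  # number of lists: 5
M = sum(len(i) for i in adjList)  # total elements across all lists: 16

print(outer, inner, outer + inner)  # 5 16 21
```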

So, where lies my doubt?
Everywhere I looked, I found examples of nested loops being O(N^2) or O(N*M). Even when people mentioned that nested loops can be something other than those, I found no example. I have yet to find an example of O(N+M) nested loops, so I'm still in doubt about whether my reasoning is correct.

Part of me wonders if this is actually an O(N*M) algorithm, but I can't justify why.
Thus my final question remains: is this reasoning correct, and is said algorithm indeed O(N+M)? If not, care to show where my mistakes are?


3 Answers


Your big mistake is that you have not clearly identified what M and N mean.

For example:

  • Visiting all cells in an N x M matrix is O(N*M).
  • If you flatten that matrix into a list with P cells, visiting is O(P).
  • However P == N*M in this context ... so O(M*N) and O(P) mean the same thing ... in this context.

Looking at your reasoning, you seem to have conflated (your) M with the analog of my P. (I say analog because rectangular and ragged arrays are not identical.)

So, M is the sum of elements of every list.

That's not how I have used M. More importantly, it is not how the various other references you have looked at are using M; specifically, the ones that talk about an N x M matrix or an N x avg(M) ragged array. Hence your confusion.

Note that your M and N are not independent variables / parameters. If you scale a problem in N, that implicitly changes the value of M.

Hint: when reasoning about complexity, one way to be sure you get the correct answer is to go back to basics. Work out the formulae for counting the operations performed, and reason about them, rather than attempting to reason about how the "big O" notation composes.
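For instance, a back-to-basics count for the loop in the question might look like this (a sketch; the helper name is made up):

```python
def count_operations(list_of_lists):
    """Tally one unit of work per outer-loop step plus one per inner-loop step."""
    ops = 0
    for sub in list_of_lists:
        ops += 1         # the outer loop visits this list once
        ops += len(sub)  # the inner loop runs once per element of the list
    return ops

adjList = [[4, 7, 8, 9], [7, 7, 5, 6], [1, 4, 3], [2, 9], [2, 1, 7]]
# 5 lists + 16 elements = 21 operations, i.e. N + M with the OP's definitions.
print(count_operations(adjList))  # 21
```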

Stephen C
  • OP has defined `N` as the number of sublists and `M` as the total number of integers in the whole nested list. – Alex Hall May 07 '17 at 22:54
  • Yes ... but that's different to how the other sources he has been looking at define M and N. And the OP's definition means that M and N are not independent variables. – Stephen C May 08 '17 at 00:46

You define N and M as follows:

Thereby I can say that the algorithm will loop N times plus M times, where N is the number of lists in adjList and M is the sum of elements in each one of the lists inside adjList. So O(N+M)

By this definition, the algorithm is O(M)¹. To understand why N vanishes, you need to consider the relationship between N and M. Suppose you have two lists, and you want to look at every possible pair of items from the two lists. We'll keep it simple:

list1 = [1, 2, 3]
list2 = [4, 5]

So you want to look at all six of these pairs:

pairs = [(1, 4), (2, 4), (3, 4), (1, 5), (2, 5), (3, 5)]

That's a total of 3 * 2 = 6. Now generalize that; say list1 has N items and list2 has M items. The total number of pairs is N * M, and so this will be an O(N * M) algorithm.

Now suppose that instead of looking at each pair, you just want to look at each item that is in one or the other list. Now you're just looking at all the values that appear in a concatenation of the two lists:

items = [1, 2, 3, 4, 5]

That's a total of 3 + 2 = 5 items. Now generalize; you'll get N + M, and so this will be an O(N + M) algorithm.
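The two situations above can be sketched with itertools (a minimal illustration, not part of the original question):

```python
from itertools import chain, product

list1 = [1, 2, 3]   # N = 3
list2 = [4, 5]      # M = 2

# Every possible pair: list2 is traversed once per item of list1 -> N * M.
pairs = list(product(list1, list2))

# Every item from one list or the other: a single pass over each -> N + M.
items = list(chain(list1, list2))

print(len(pairs), len(items))  # 6 5
```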

Given that, we should expect your case and the above case to be identical, if your case is indeed O(N + M). In other words, we should expect your case to involve looking at all the items from two different lists. But look:

all_lists = [[4,7,8,9],[7,7,5,6],[1,4,3],[2,9],[2,1,7]]

That's the same thing as:

list1 = [4,7,8,9]
list2 = [7,7,5,6]
list3 = [1,4,3]
list4 = [2,9]
list5 = [2,1,7]

Whereas in the O(N + M) case there were only two lists, here there are five lists! So this can't be O(N + M).

However, this should give you an idea of how to work out a better description. (Hint: it could include J, K, and L, in addition to M and N.)

The origin of your mistake is that in the first two examples, M and N are defined to be separate from one another, but your definitions of M and N overlap. In order for M and N to be summed or multiplied meaningfully, they need to be unrelated to one another. But in your definitions, the values of M and N are interrelated, and so in a sense, they repeat values that shouldn't be repeated.

Or, to put it in yet another way, suppose the sum of the lengths of all the inner lists adds up to M. If you have to take two steps instead of just one for each of those values, the result is still only a constant value C times M. And for any constant value C, C * O(M) is still O(M). So the work you do in the outer loop is already counted (up to a constant multiplier) by the O(M) term.
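That constant-factor argument is easy to check directly (a toy sketch):

```python
adjList = [[4, 7, 8, 9], [7, 7, 5, 6], [1, 4, 3], [2, 9], [2, 1, 7]]
M = sum(len(sub) for sub in adjList)  # 16

steps = 0
for sub in adjList:
    for item in sub:
        steps += 2  # pretend each element costs two steps instead of one

print(steps, 2 * M)  # 32 32 -- that's C * M with C = 2, which is still O(M)
```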

Notes:

1. Well, ok, not quite, as Stefan Pochmann points out. Because of a technicality, it might be more appropriate to say O(max(N, M)) because if any inner lists are empty, you'll still have to visit them.
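The technicality is easy to demonstrate: with many empty inner lists, M is 0 yet the outer loop still does work (a small sketch):

```python
# N empty sublists: M == 0, but the outer loop still runs N times.
L = [[] for _ in range(1000)]

outer = inner = 0
for sub in L:
    outer += 1
    for item in sub:
        inner += 1

print(outer, inner)  # 1000 0
```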

senderle

If your code looked like this:

for i in adjList:
    <something significant 1>   
    for j in i:
        <something significant 2>

Then I would agree with you. However, <something significant 1> is missing (the internal work Python does to execute the loop is not worth considering), so there is no reason to include the O(N) part. Big O notation is not for counting every single thing; it's for showing how the algorithm scales as inputs get bigger. So we look at what matters, and that means your code should be considered O(M).

The reason nested loops are usually considered O(N*M) is because usually N is defined as the number of iterations of the outer loop (as you've done) and M is defined as the number of iterations of the inner loop per outer iteration, not in total. Therefore N*M by the common definition is equal to M in your definition.
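To see the two definitions line up, consider a rectangular case (a sketch with made-up dimensions):

```python
N, M_per_row = 4, 3  # M in the usual sense: inner iterations per outer iteration
matrix = [[0] * M_per_row for _ in range(N)]

total_inner = 0
for row in matrix:    # N outer iterations
    for cell in row:  # M_per_row inner iterations each
        total_inner += 1

# N * M (the usual definition) equals the total element count (the OP's M).
print(total_inner, N * M_per_row, sum(len(r) for r in matrix))  # 12 12 12
```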

EDIT: some are claiming that the time to loop should itself be counted, citing for example the case of a large number of empty lists. As the code below shows, it takes significantly longer just to construct such a nested list than to run the nested loop through it. And that's for a trivial construction; usually it would be more complicated. Therefore I do not believe it is worth considering the time to loop.

from time import time

start = time()

L = [[] for _ in range(10000000)]

construction_time = time() - start
start = time()

for sub in L:
    for i in sub:
        pass

loop_time = time() - start

print(construction_time / loop_time)  # typically between 3 and 4
Alex Hall
  • It's not O(M). Think about M being 0, i.e., all sublists being empty. – Stefan Pochmann May 07 '17 at 23:03
  • @StefanPochmann I'm not including the work of merely executing a loop. It sounds like you are. – Alex Hall May 07 '17 at 23:06
  • @StefanPochmann I would expect that under some models of computation you could. But I think your point is valid, and might explain in part why this is a confusing problem. – senderle May 07 '17 at 23:12
  • @StefanPochmann one doesn't typically factor every statement and expression into calculating the runtime. You look at the most expensive parts. In this case that's `print`. It's an approximation which is necessary for practicality. Otherwise you will exhaust the letters of the alphabet finding the runtime of any reasonably complex algorithm. – Alex Hall May 07 '17 at 23:16
  • @AlexHall If M is 0 and N is a trillion, then the `print` is **not** the most expensive part. It's not even run once. It costs zero time. But the code does go through a trillion lists, which does cost time. Quite a bit. You can't just ignore that. – Stefan Pochmann May 07 '17 at 23:26
  • @AlexHall Not sure what that distraction is supposed to show. If anything, you just acknowledged that empty lists do take time. – Stefan Pochmann May 07 '17 at 23:40
  • @StefanPochmann if I loop through a matrix with M rows (lists) and N columns, basically everyone will say the runtime is `O(M*N)`. No one ever says that it's `O(M+M*N)` just in case N is 0. – Alex Hall May 07 '17 at 23:44
  • @AlexHall Yeah, I'd say O(MN) for matrices as well. But that's because N=0 makes it an empty matrix, which I'd say should only be represented by an empty list, i.e., M=0. And then O(MN) is correct. But if it's specified that the matrix can be represented so that N=0 and M>0, then O(MN) is wrong and I wouldn't say it. Just like here. The difference is that matrices are standard things whereas pretty much nothing is known about the OPs data. – Stefan Pochmann May 07 '17 at 23:54