Why is the complexity of BFS O(V+E) instead of O(E)?

Question

This is a generic BFS implementation:

For a connected graph with V nodes and E total number of edges, we know that every edge will be considered twice in the inner loop. So if the total number of iterations in the inner loop of BFS is going to be 2 * number of edges E, isn't the runtime going to be O(E) instead?

Does this answer your question? [Why is the complexity of BFS O(V+E) instead of O(V\*E)?](https://stackoverflow.com/questions/18604803/why-is-the-complexity-of-bfs-ove-instead-of-ove) — frozen, Aug 09 '20 at 01:28
also https://cs.stackexchange.com/questions/33664/why-is-the-complexity-of-the-bfs-ov-e — frozen, Aug 09 '20 at 01:29
@frozen I've seen the 1st link, but not the 2nd. the 2nd one is interesting, do you know what the answerer means when he says "The overhead for initialization is ()" — csguy, Aug 09 '20 at 01:31
Please post the code [as text, not an image](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question). Thanks. — ggorlen, Aug 09 '20 at 01:55
@ggorlen I mean sure, but it's literally B F S nothing crazy — csguy, Aug 09 '20 at 02:11
Did you read the link above? It's a waste of space (data limits on phones) and harms accessibility and searchability. The hosted image URL can go down over time. Some people can't load it due to proxies. It's hard to read. Anyway, I'm just regurgitating the linked post.... — ggorlen, Aug 09 '20 at 02:31
For a connected graph, O(V+E) = O(E). O(V+E) is written for unconnected graphs. Most of your answers are just wrong. You've accepted the wrong answer, and comments link it to a question that yours duplicates, *also* with the wrong answer accepted. sigh. — Matt Timmermans, Aug 09 '20 at 04:17
@MattTimmermans i'm just trying to find an answer, ive unaccepted for now. would love to hear more of your explanation — csguy, Aug 09 '20 at 04:38
The O(V) is the cost of initialization for the visited array, as @CortAmmon indicates. For unconnected graphs it can be > O(E), because you need to initialize cells for vertices that you *can't* find by traversing edges. — Matt Timmermans, Aug 09 '20 at 04:41

Cort Ammon · Accepted Answer · 2020-08-09T01:58:28.673

This is a case where one needs to look a little deeper at the implementation. In particular, how do I determine if a node is visited or not?

The traditional algorithm does this by coloring the vertices. All vertices are colored white at first, and they get colored black as they are visited. Thus visitation can be determined simply by looking at the color of the vertex. If you use this approach, then you have to do O(V) worth of initialization work setting the color of each vertex to white at the start.

You could manage your colors differently. You could maintain a data structure containing all visited nodes. If you did this, you could avoid the O(V) initialization cost. However, you will pay that cost elsewhere in the data structure. For example, if you stored them all in a balanced tree, each if w is not visited now costs O(log V).

This obviously gives you a choice. You can have O(V+E) using the traditional coloring approach, or you can have O(E log V) by storing this information in your own data structure.

You specify a connected graph in your problem. In this case, O(V+E) == O(E) because the number of vertices can never be more than E+1. However, the time complexity of BFS is typically given with respect to an arbitrary graph, which can include a very sparse graph.

If a graph is sufficiently sparse (say, a million vertices and five edges), the cost of initialization may be great enough that you want to switch to a O(E ln V) algorithm. However, these are pretty rare in a practical setting. In a practical setting, the speed of the traditional approach (giving each vertex a color) is just so blinding fast compared to the more fancy data structures that you choose this traditional coloring scheme for everything except the most extraordinarily sparse graphs.

If you maintained a dedicated color property on your vertices with an invariant rule that all nodes are black between algotihm invocations, you could drop the cost to O(E) by doing each BFS twice. On your first pass, you could set them all to white, and then do a second pass to turn them all black. If you had a very sparse graph, this could be more efficient.

We typically don't talk to the exotic situations like I describe in the last paragraph because, by the time you are responsible for implementing algorithms like this over a sparse graph, you are typically focusing on other metrics, and the fine nuances of the big-oh notation become less important. — Cort Ammon, Aug 09 '20 at 01:59
This answer is correct. It should be noted that there are data structures you can use for the visited set that that allow O(E) time for BFS even for unconnected graphs, but they aren't usually used. — Matt Timmermans, Aug 09 '20 at 04:36
@MattTimmermans and CortAmmon, so from what I understand, the bfs algo of adding visited nodes onto a hash set each iteration (similar to the pseudocode i posted) is actually not O(V+E) but O(E log V)? — csguy, Aug 09 '20 at 04:51
@csguy If you use a hash set, it's O(E) *expected* time, because a hash set takes O(1) expected time per operation. The actual *worst case* time for hash set operations depends on the implementation details. — Matt Timmermans, Aug 09 '20 at 04:54

score 2 · Answer 2 · answered Aug 09 '20 at 02:41

2

Well, let's break it up into easy pieces...

You've kept a visited array, and by looking it up, you decide whether to push a node into the queue or not. Once visited, you don't push it again. So, how many nodes get pushed into the queue: (of course) V nodes. And it's complexity is O(V).
Now, each time, you take out a node from queue and visit all of its neighboring nodes. Now, following this way, for all of V nodes, how many node you'll come across. Well, it's the number of edges if the graph is unidirectional, or, 2 * number of edges if the graph is bidirectional. So, the complexity would be O(E) for unidirectional and O(2 * E) for bidirectional.

So, the ultimate(i.e. total) complexity would be O(V + E) or O(V + 2 * E) or generally, we may say O(v + E).

answered Aug 09 '20 at 02:41

reyad

1,392
2
7
12

we go through each of the neighbors and if it isn't visited, we can push so it seems like step 1 is double counting what you already accounted for in step 2 – csguy Aug 09 '20 at 02:47
That's why, I've seperated them and described in two steps...many of us misunderstand the two different steps...cause step-2 is finding neighbors which is by edge...but step-1 i.e. pushing them(if not visited) into queue also takes O(1) operation, and how much you push into queue: V nodes... – reyad Aug 09 '20 at 02:55
slightly confused because im saying that your "step 1" is actually just what happens in "step 2" when a node is not previously visited (see the above pseudocode for reference) – csguy Aug 09 '20 at 03:04
It would have been better, if I could show the whole visualization of a bfs algo, if you take my advice, can u pls do the visualization in pencil and paper? It would be clear...Oh and I see you already accepted an answer...I guess its clear to you now... – reyad Aug 09 '20 at 03:07
1

@csguy, If you're thinking, pushing into queue(step-1) is part of step-2, then, think about `taking the node out of queue`, well, that takes O(1) per dequeue operation...I hope, that'll help you to undertsand that, its different from step-2(visiting neighbor operation). In simple, we go to each of V nodes first; then for each node we visit the neighbours...(and total neighbor is E), so O(V + E)... – reyad Aug 09 '20 at 03:11
yeah i can see that. appreciate the help – csguy Aug 09 '20 at 03:51
"So how many nodes get pushed into the queue?" Only the ones you find via an edge, I think. – Matt Timmermans Aug 09 '20 at 04:30
@MattTimmermans, Of course you're right...but here, I meant it by total...as there are total V nodes, also, notice I talked about visited array...and also notice, `Once visited, you don't push it again.` part...Sorry, if its not clear to you... – reyad Aug 09 '20 at 04:41

another_CS_guy · Answer 3 · 2020-08-09T02:52:10.493

0

Because there might be graph having edges less than number of vertices. Consider this graph:

1 ---- 2
|
|
3 ---- 4

There are 4 vertices but only 3 edges, and in BFS you have to traverse each and every vertex. Thatswhy time complexity is O(V+E) as it considers both V as well as E.

edited Aug 09 '20 at 02:52

answered Aug 09 '20 at 02:48

another_CS_guy

682
6
7

E is total number of edges so there are going to be 2 * 3 edges stored in an adjacency list, but this is a great point actually – csguy Aug 09 '20 at 02:52
Ok so in this case if it is directed graph there will be 3 edges, and even if there is 1 edge between any 2 vertices in undirected graph , it would still be considered a graph. But in that cases as well, you will need to traverse whole n vertices. – another_CS_guy Aug 09 '20 at 02:54
anyone looking for extra info : https://stackoverflow.com/questions/26750303/why-is-time-complexity-for-bfs-dfs-not-simply-oe-instead-of-oev – csguy Aug 09 '20 at 02:55
2

This example is still O(E). There can only ever be 1 more vertex than edge. as E goes to infinity, V/E approaches 1 – Cort Ammon Aug 09 '20 at 03:44
If your BFS finds vertexes that aren't connected by an edge to the start, then you're doing it wrong. – Matt Timmermans Aug 09 '20 at 04:12
@CortAmmon and Matt, thanks for pointing out the mistake. I got confused with Breadth First Search and Breadth First Traversal. Are my points valid for Breadth First Traversal? Would appreciate your feedback on this. – another_CS_guy Aug 09 '20 at 07:17

Why is the complexity of BFS O(V+E) instead of O(E)?

3 Answers3

Linked