
The method is:

List<Book> books = new List<Book>();

// Yields every book whose author appears exactly once in the list.
// (yield return requires an iterator return type such as IEnumerable<Book>.)
public IEnumerable<Book> Shoot()
{
    foreach(var b in books)
    {
        bool find = true;
        foreach(var otherb in books)
        {
            if(otherb != b && otherb.Author == b.Author)
            {
                find = false;
            }
        }

        if(find)
        {
            yield return b;
        }
    }
}

Normally, the time complexity would be O(books.Count^2), but there is an if(find) statement in the outer loop, and it may change the number of iterations.

So my questions are:

  1. What is the time complexity of this method?
  2. How did you calculate it?

I'm waiting online for your answer.

Thank you in advance.

Franva
  • The `find` variable alters the running time by at most O(books.Count), which is less than O(books.Count^2). Therefore the total complexity is still O(books.Count^2). – Raymond Chen Jun 29 '12 at 00:28
  • 1
    Two nested loops, big oh n x n. Sort by author if you want to make it happier. – Hans Passant Jun 29 '12 at 00:35
  • I'm not sure if this is of relevance, but generally speaking nested loops are expected in something like a dependency sort, and are suspect when found outside that. I'm just not sure the question is "what's the time complexity" so much as "what are we really trying to do here, and maybe we can do it a lot better". If you want, let me know what the business is asking from you and maybe we can find a simpler way. – Mike Perrenoud Jun 29 '12 at 00:45
  • 1
    You seem to be asking a lot of questions about yield. Perhaps you should read up on it first then return with more specific questions. For example, your question 4 is answerable only by you: You're the one who wrote the function. You're the only one who knows what it's supposed to do. (You never told us what it's supposed to do so how do we know why it was done in any particular way?) – Raymond Chen Jun 29 '12 at 03:03

2 Answers


You go through each book in the outer loop (n iterations) and, for each outer book, through every otherb in the inner loop (another n iterations), so the time complexity is O(n^2).

The yield return does not change the complexity of the algorithm; it creates an iterator, but if you traverse the whole sequence from the calling function, you still go through all the iterations of the algorithm.

What is the yield keyword used for in C#?
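For instance, here is a minimal sketch of consuming the iterator from inside the same class (Title is an assumed property of Book, not shown in the question):

foreach(var single in Shoot())
{
    // Each iteration resumes Shoot() and runs its loops until the next
    // yield return; enumerating the whole sequence does all the O(n^2) work.
    Console.WriteLine(single.Title);
}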

To optimize the algorithm, as btilly mentions, you could do two passes over the collection: on the first pass, store the number of books per author in a hash table; on the second pass, look up each book's author in the hash table (assuming constant-time lookup) and yield the book only if that author has a single book:

public IEnumerable<Book> Shoot()
{
    // First pass: count how many books each author has.
    var authors = new Dictionary<string, int>();
    foreach(var b in books)
    {
        if(authors.ContainsKey(b.Author))
            authors[b.Author]++;
        else
            authors.Add(b.Author, 1);
    }

    // Second pass: yield the books whose author appears exactly once.
    foreach(var b in books)
    {
        if(authors[b.Author] == 1)
            yield return b;
    }
}

This way you have linear time complexity, O(n); note that you need O(n) extra space in this case.
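For comparison, a LINQ sketch of the same two-pass idea (not from the original answer; it assumes using System.Linq). GroupBy builds the same hash table internally, so it is still O(n) time with O(n) extra space:

public IEnumerable<Book> Shoot()
{
    return books.GroupBy(b => b.Author)     // bucket the books by author
                .Where(g => g.Count() == 1) // keep authors with exactly one book
                .SelectMany(g => g);        // flatten the groups back to books
}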

Jaime
  • The best case isn't much better, because the inner loop does not break; it always runs the full n iterations. – Raymond Chen Jun 29 '12 at 00:29
  • Usually we're interested in the average case, not the worst case. Don't believe me? Do you think of a hash lookup as `O(n)` or `O(1)`? Is quicksort `O(n*n)` or `O(n log(n))` in your world? (Yes, I know you can make the worst case better than those. But real world implementations typically don't.) – btilly Jun 29 '12 at 00:31
  • @Raymond correct, the best case would be one iteration of the outer loop plus the n iterations of the inner loop – Jaime Jun 29 '12 at 00:31
  • @Jaime The outer loop never breaks either. It also runs for the full n iterations. – Raymond Chen Jun 29 '12 at 00:33
  • @RaymondChen, you're right, I was blinded and seeing return instead of yield :) – Jaime Jun 29 '12 at 00:35
  • hi guys, I am worried about the use of yield. Does it affect the time complexity? – Franva Jun 29 '12 at 01:07
  • brilliant! I love this idea :) – Franva Jun 29 '12 at 03:33

Your worst case performance per yield is O(n * n). Your best case is O(n). If you assume that the books are randomly ordered, and that a fixed proportion of authors only write one book, then the average case between yields is O(n), because the probability of getting to m iterations of the outer loop decreases exponentially as m increases. (Insert standard geometric series argument here.)
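To spell out that geometric series argument, under the same assumptions: if a fixed proportion p of the books are sole works by their author and the order is random, the chance that the first m outer iterations all fail to yield is roughly (1 - p)^m, so the expected number of outer iterations until the next yield is

    sum_{m >= 1} m * p * (1 - p)^(m - 1) = 1/p = O(1).

Each outer iteration costs O(n) inner-loop work, which gives the O(n) average case between yields.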

Generally (but not always!) people are most interested in the average case.

Incidentally, the standard way of handling this problem would be to create a dictionary up front with all of the authors and the count of how many books each wrote. That takes O(n) time. Your yields after that would just search through the entries of that dictionary looking for the next one with a count of 1. The average time of subsequent yields would be O(1), the worst case O(n), and the amortized average time across all yields (assuming that a fixed proportion of authors only wrote one book) will be O(1) per yield, as opposed to the current implementation, which is O(n) per yield.
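A minimal sketch of that dictionary-based version, as I read the description above (it assumes Book exposes a string Author property; the tuple syntax needs C# 7 or later):

public IEnumerable<Book> Shoot()
{
    // One O(n) pass: keep a representative book and a count per author.
    var byAuthor = new Dictionary<string, (Book Book, int Count)>();
    foreach(var b in books)
    {
        byAuthor[b.Author] = byAuthor.TryGetValue(b.Author, out var e)
            ? (e.Book, e.Count + 1)
            : (b, 1);
    }

    // Yields walk the dictionary entries rather than rescanning the list,
    // so with a fixed proportion of one-book authors the amortized cost
    // per yield is O(1).
    foreach(var e in byAuthor.Values)
    {
        if(e.Count == 1)
            yield return e.Book;
    }
}

Note that this version yields books in dictionary order rather than in the original list order.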

btilly