How do I convert an O(n^2) algorithm to O(n) in this particular example?

Question

I'm trying to solve this coding excercise I was given to practice my skills. It involves extracting some .JSON data for basketball players. My program has to find all the possible player pairs which heights when summed are equal to a given integer input.

Here's the code I devised it:

import json
import requests

def to_number(A):
    
    B = int(A)
    
    return B
        

def search(Number):
    response = requests.get("https://mach-eight.uc.r.appspot.com/")

    data = json.loads(response.text)
    PLAYERS = data["values"]

    answer_list = []

    for player1 in PLAYERS:
    
        raw_height = player1['h_in']
        height_1 = to_number(raw_height)
    
        PLAYERS.remove(player1)
        for player2 in PLAYERS:
        
            raw_height = player2['h_in']
            height_2 = to_number(raw_height)
        
            result = height_1 + height_2
        
            if result == Number:
                par = (player1['first_name'] + ' ' + player1['last_name'] + ' & ' + 
                       player2['first_name'] + ' ' + player2['last_name'])
            
                answer_list.append(par)
        
    return answer_list

def executer():
    Number = int(input("insert your integer: "))
    result = search(Number)
    
    return result
    
if __name__=="__main__":
    
    result = executer()
    stop_here = len(result)
    
    while stop_here == 0:
    
        print("No matches found, please try with a new number \n\n")
    
        result = executer()
        stop_here = len(result)
        
    print(result)

So far it does complete the objective of finding the pairs, but at the expense of a nested for loop and I need to come up with a way of decreasing the computation time, for example, as a O(n) algorithm.

What I've tried so far has been making the pairs without a loop, using the itertools package and a permutation function, but I quickly realized that just made it slower.

On the other hand, I thought about Subtracting each height of the players to the integer input, which would return the only possible heights that pair up with the initial one:

This approach would guide me directly to the only possible pairs, right? But I'm having trouble with what to do after this. I'm not sure how to pinpoint the players that correspond to the resulting height of the operation with only one loop.

If you could help me untangle this conundrum, I would be really thankful. Likewise, if you can think of another approach, I'm all ears and eyes.

You didn't specified what is your coding exercise. I recommend stating your question like: "I have the following coding exercise: ..." — azbarcea, Apr 08 '21 at 18:40
It looks like you need to enumerate all pairs of players whose height meets some criteria. Note that there are more optimal answers, on average, to the brute force method you're using, but in the worst case, you can't do better than O(n^2) since there are cases where you simply have to output all pairs of players in your use case. — wLui155, Apr 08 '21 at 19:13
The title says *"in this particular example"*. If that refers to the particular *input*, then time complexity considerations become irrelevant, because time complexities tell us something about what happens when the input grows to large sizes. — trincot, Apr 09 '21 at 05:46

score 2 · Answer 1 · answered Apr 08 '21 at 18:49

I'll ignore all of the superfluous code in your post; the issue is, given a list of integers, how to find pairs that add to a given target sum.

The straightforward way to do this is to sort the list, which is O(N log N) time. For illustration, let's consider the list s = [5, 6, 8, 13, 14, 15] and a target sum of 21. Let lo, hi = 0, len(s), pointers to the end elements.

Now we check the sum ...

total = s[lo] + s[hi]
if total == target:
    # print a found pair; move both pointers in one spot.
elif total < target:
    # sum is too small
    lo += 1
else:
    hi -= 1

Repeat this while lo < hi

There is another method that's even better: put the heights into a set, and then simply use in:

s_set = set(s)
for height in s_set:
    if (target - height) in s_set:
        # print a pair

Now this will find each pair twice; I'll leave the filtering to you. This also assumes that the heights are unique; if you need to identify all pairs of players, rather than merely heights, then you should use a dict keyed by height, with the value as a list of players -- but this is no longer O(N), unless you have a constant limit on the quantity of players with any given height.

I'm not a python guy. is `set` sorted when it's created? What's the search mechanism used by `in`? — 3Dave, Apr 08 '21 at 19:14
@3Dave, `set()` in Python is just a [hashset](https://stackoverflow.com/a/26724024/2172854), so no, the contents are not sorted. — abhishek_naik, Apr 08 '21 at 19:28

trincot · Answer 2 · 2021-04-09T05:42:26.383

2

First of all: because the desired output must contain all pairs that match the required sum of heights, the worst case complexity of any algorithm will be at least O(²). Take for example the case where all players have height 70, and the argument to the function is 140, then it is clear you must output all possible pairs. There are (-1)/2 of them, which is O(²). As the algorithm must produce that many pairs, it will have at least that many steps to execute, and so it is at least O(²). And I am ignoring here the number of characters in people's names. I will assume that it is a given that these names have at the most 100 characters, so that this will not influence the time complexity.

However, your algorithm is not optimal when looking at the average and best case time complexity, because then your algorithm is still O(²), while it can be done with a best case time complexity of O():

You can use a dictionary keyed by heights, and with as value the list of people (their full names) that have that same height.

Here is how your function could look:

def search(total):
    response = requests.get("https://mach-eight.uc.r.appspot.com/")

    data = json.loads(response.text)
    players = data["values"]

    d = defaultdict(list)
    for player in players:
        d[int(player['h_in'])].append(player['first_name'] + " " + player['last_name'])

    return [player + " & " + other
        for height in d
            if total - height in d
                for other in d[total - height]
                    for player in d[height]
                        if player < other
    ]

So, for instance, if the input has no players with the same height, then this algorithm will do the job in linear time.

edited Apr 09 '21 at 05:42

answered Apr 08 '21 at 18:59

trincot

317,000
35
244
286

Nice, you were faster. – ferdy Apr 08 '21 at 19:29
Thank you for your input. I've tried it and it certainly works pretty well. I would like to ask one additional question. What Big O regime is this? Would it be O(n) ? – MIGUEL ALFONSO FIGUEROA ANGULO Apr 08 '21 at 19:53
1

The average time complexity for adding items to -- and finding items in -- a dictionary is constant, and so this algorithm has indeed a time complexity of O(n). But the time complexity of dictionary operations can in theory degenerate to O(n). This is however very rare. See [this answer on that topic](https://stackoverflow.com/a/1963514/5459839) – trincot Apr 08 '21 at 19:58
Thank you so much for your detailed answer. – MIGUEL ALFONSO FIGUEROA ANGULO Apr 08 '21 at 19:59
I must come back to my previous comment. As the output can include O(n²) combinations, the complexity is actually O(n²) – trincot Apr 08 '21 at 20:00
Oh shoot, if it's O(n^2) I'll have to keep trying with some of the other answers. Thank you. – MIGUEL ALFONSO FIGUEROA ANGULO Apr 08 '21 at 20:10
It doesn't matter. Since you want the output to be O(n²), there is no way to do it faster :)). Think of it. If you have n persons, and they all have the same height, and you want the pairs that make up the double of that height, you need to return every possible pair, which is O(n²) output, and thus O(n²) work. So you just must make sure that an algorithm does not need *more* time complexity than that. – trincot Apr 08 '21 at 20:17
I'm sorry trincot. I forgot to mention it, but one of the requirements of the excercise is that the complexity be more efficient than O(n^2). Does that mean it's impossible to do so? – MIGUEL ALFONSO FIGUEROA ANGULO Apr 08 '21 at 20:27
Well, did you understand what I wrote above? If n is the number of persons, then the output size can be O(n²) in the worst case (that I described). Output must be *produced*, and so that takes O(n²) time. – trincot Apr 08 '21 at 20:29
It would be a different story if you were asked to only output the *count* of pairs that meet the condition. Or, only *one* instance of such a pair. This is why it is important to *quote* the assignment in your question, so we can assess ourselves whether there is not a wrong interpretation of the assignment at play. (See first comment made to your question). – trincot Apr 08 '21 at 20:31
@MIGUELALFONSOFIGUEROAANGULO The raw hierarchy is something like this: `O(c) < O(log n) < O(n) < O(n log n) < O(n^k) << O(c^n) ... for c, k being constant.` This is not an exhaustive hierarchy and shall only show you the basics. You can substitute different values for n, and then you see how big log n, n or c^n can grow for very big n. That means, an algorithm that ensures that you're at least solving a problem in linear time complexity where another algorithm takes polynomial or even exponential time can make a very, very big difference when having a lot of input data. HTH – ferdy Apr 08 '21 at 22:31
1

@MIGUEL, any insight into the actual challenge you are looking at? It would be best if you would quote it. I have extended my answer with the reasoning for the time complexity and why no algorithm can do better than O(²). However, there still is a difference in *average* and *best case* time complexity. Maybe the challenge mentioned something about that? Without seeing the actual challenge we cannot do much more for you. – trincot Apr 09 '21 at 05:44

score 0 · Answer 3 · answered Apr 08 '21 at 18:48

Why don't you create a set of "complementary heights", e.g. new Set(player1.map(height => target - height)).

Once you have that set you can check player2 heights, if you found an entry it means you just found a pair that sum up to the target.

You may check out this problem https://leetcode.com/problems/two-sum/

3Dave · Answer 4 · 2021-04-08T19:11:15.887

What I'd do:

Sort the input list by height

for each player, 
   searchValue = player.height - requestedTotalHeight
   Binary search: find all matches in the sorted list with height searchValue
   add match to the output

This should give you an N log N runtime, not counting the sort. But, due to the way CPUs work this century (branch prediction), sorting the list first would give you a faster response even with your current N^2 method.

Alternatively, you could skip the sort and build a search tree while importing the data. If the data is sufficiently random, the tree shouldn't be too terribly unbalanced.

Thank you for pointing me in the right direction. – MIGUEL ALFONSO FIGUEROA ANGULO Apr 08 '21 at 20:00 — MIGUEL ALFONSO FIGUEROA ANGULO, Apr 08 '21 at 20:00

ferdy · Answer 5 · 2021-04-09T07:48:49.103

Some hints:

First, you must get rid of your nested loop. This is inherently O(n²). Yes, you can use clever O(nlogn) search/sort algorithms as described to improve running time.

A solution to achieve O(n) would rather be, to use a datastructure that supports you. A hashmap can help you to be accessed in O(1). Fill it up using O(n). In python a hashmap is a dictionary. Use it:

my_list = [ 2, 3, 5, 2, 3, 6]
my_target = 5

print(f"Find pairs in {my_list} that sum up to {my_target}")

d = {}  # use a dictionary

result = []

for i in range(len(my_list)):

    if my_target - my_list[i] in d.keys():
        result.append((my_list[d[my_target - my_list[i]]], my_list[i]))
    d[my_list[i]] = i

print(f"results: {result}")

Study it, understand it, adapt it to your problem.

Update

I took me the time for fun and profit and wrote the solution in O(n) for you. It's based upon the algorithm shown above. It leverages the fact of hashmap access in O(1).

import json
import requests


def search(target_sum):
    response = requests.get("https://mach-eight.uc.r.appspot.com/")
    data = json.loads(response.text)
    players = data["values"]

    idx_map = {}  # will hold the height as key and it's index in the player list as value
    answer_list = []

    for idx, p in enumerate(players):
        h_in = int(p['h_in'])
        if target_sum - h_in in idx_map.keys():  # the complementary value is in the idx_map!
            p1 = players[idx_map[target_sum - h_in]]  # the according player is resolved
            p2 = players[idx]
            answer_list.append((p1, p2))  # save 'em as a fine tuple

        idx_map[h_in] = idx  # save the idx of the player at the

    return answer_list


def get_person_string(player):
    return f"{player['first_name']} {player['first_name']} ({player['h_in']})"


def main():
    while 1:
        try:
            target_sum = int(input("Insert target sum: "))
            result = search(target_sum)
            for pair in result:
                print(f"Player {get_person_string(pair[0])} and Player {get_person_string(pair[1])} "
                      f"together are {target_sum} inches tall.")
        except ValueError:
            break


if __name__ == "__main__":
    main()

Output:

Insert target sum: 90
Insert target sum: 130
Insert target sum: 140
Player Speedy Speedy (71) and Player Nate Nate (69) together are 140 inches tall.
Player Brevin Brevin (70) and Player Mike Mike (70) together are 140 inches tall.
Insert target sum: 180
Insert target sum: 176
Player Roy Roy (86) and Player Yao Yao (90) together are 176 inches tall.
Insert target sum: 175
Player Robert Robert (85) and Player Yao Yao (90) together are 175 inches tall.
Insert target sum: 174
Player Jason Jason (84) and Player Yao Yao (90) together are 174 inches tall.
Player Yao Yao (90) and Player Yi Yi (84) together are 174 inches tall.
Insert target sum: x

** Update **

The algorithm here does not work for all cases. Edge case: If every player has the same height and we're looking for height*2, then we would need the cartesian product of the set of players, say P.

So P² = P x P = { (a, a') | a, a' in P }.

My algorithm only gets consecutive player elements in that edge case. There might be a way to improve that by using a dict of indexes as value to the index map, but the same problem persists then: In the worst case (and that whats asymptotic analysis is mainly about) you will need to traverse through all elements for all elements, leading to O(n²) worst case time complexity. Thanks for trincot to leading me to this.

Thank you for pointing me in the right direction. – MIGUEL ALFONSO FIGUEROA ANGULO Apr 08 '21 at 20:00 — MIGUEL ALFONSO FIGUEROA ANGULO, Apr 08 '21 at 20:00

score 0 · Answer 6 · answered Apr 08 '21 at 22:17

First, you have a bug in your code. You have a for loop over PLAYERS and you are modifying the list you are looping over when you use the .remove() method. That's going to lead to unexpected results (basically, it's going to skip some entries in your original list when it shouldn't). More details in this StackOverflow post.

Now on to your question. As others have mentioned, a dictionary with height values as the keys is a good way to go. However, I'm fairly certain the time complexity is still O(n^2). Like @trincot said, it's impossible to improve upon that if you need to output the list of matching pairs. That boils down to taking the Cartesian product of two lists (that's the itertools.product call in the code below), which has complexity O(n^2). See this post for further explanation.

The only way to beat O(n^2) is if you are trying to return a different value. For example, if you just need calculate how many pairs sum to N rather than returning the list of pairs, that algorithm could run faster than O(n^2).

Anyway, here's an implementation of dictionary approach.

import json
import requests
import itertools


def get_player_data() -> list:
    response = requests.get("https://mach-eight.uc.r.appspot.com/")
    return json.loads(response.text)["values"]


def build_height_index(player_data: list) -> dict:
    height_index = {}
    for player in player_data:
        name = f"{player['first_name']} {player['last_name']}"
        h = int(player["h_in"])
        height_index.setdefault(h, []).append(name)
    return height_index


def search(player_data: list, n: int) -> list:
    height_index = build_height_index(player_data)
    pairs = []
    for h in height_index:
        # avoid double counting pairs
        if h > n / 2:
            continue

        player_list = height_index.get(h, [])
        if h == n / 2:
            # the players in this list are pairs with each other
            new_pairs = [*itertools.combinations(player_list, r=2)]

        else:
            # find the list of players with the matching height
            matching_height = n - h
            matching_list = height_index.get(matching_height, [])
            new_pairs = [*itertools.product(player_list, matching_list)]

        pairs += new_pairs

    return pairs


def format_output_pairs(pairs: list) -> list:
    """
    This will format the output in a string like you had it, but
    otherwise it's unnecessary.
    """
    return [" & ".join(sorted(x)) for x in pairs]


def find_pairs(n: int) -> list:
    player_data = get_player_data()
    pairs = search(player_data, n)
    return format_output_pairs(pairs)


### Example output:
# > find_pairs(140)
# ['Brevin Knight & Mike Wilks',
#  'Chucky Atkins & Nate Robinson',
#  'Nate Robinson & Speedy Claxton']

How do I convert an O(n^2) algorithm to O(n) in this particular example?

6 Answers6