1

I wrote two solutions in python.. It is supposed to take a list of numbers and sort the ones that add up to a sum, both these return the same pairs, but which one is more efficient? I'm not sure if using python's count method does more work behind the scene making the second one longer

numbers =  [1, 2, 4, 4, 4, 4, 5, 7, 7, 8, 8, 8, 9]

match = []
for i in range(len(numbers)):
    for j in range(len(numbers)):
        if (i!=j):
            if(numbers[i] + numbers[j] == sum):
                match.append([numbers[i], numbers[j]])


match2 = []

for i in range(len(numbers)):
    counterPart = abs(numbers[i] - sum)

    numberOfCounterParts = numbers.count(counterPart)

    if(numberOfCounterParts >= 1):
        if(counterPart == numbers[i]):
            for j in range(numbers.count(counterPart)-1):
                match2.append([numbers[i], counterPart])
        else:
            for j in range(numbers.count(counterPart)):
                match2.append([numbers[i], counterPart])

Is there an even better solution that I'm missing?

gsamaras
  • 71,951
  • 46
  • 188
  • 305
joe
  • 1,563
  • 4
  • 16
  • 35
  • 1
    check it yourself, https://stackoverflow.com/a/7370824/3462319 – depperm Jun 16 '17 at 15:21
  • 1
    Honestly, both of them are not efficient solutions, use dictionary is the way to go. Your algorithm is O(n^2). – Bubble Bubble Bubble Gut Jun 16 '17 at 15:25
  • @Ding Would you suggest converting the list into a dict programmatically? Making the key the index? – joe Jun 16 '17 at 15:30
  • Loop through the list and for each number `x` use `sum-x` as the key and `x` as value. This way the you only need to go through the list once. – Bubble Bubble Bubble Gut Jun 16 '17 at 15:33
  • @Ding are you sure this is the method I should take? Because I did some searching and found this https://stackoverflow.com/questions/3420937/algorithm-to-find-which-number-in-a-list-sum-up-to-a-certain-number which top answer uses a recursive function not using dicts – joe Jun 16 '17 at 15:35
  • This problem is going to be O(N^2). In your first solution, you can set up some flags to break the second loop after successes are exhausted. In other words, if your numbers are sorted and you are looking for sums of 4, there's no use in continuing the list if the sums are greater than your target. So add an "if(numbers[i] + numbers[j] > sum): break" line before the if (i != j). That should cut some runtime in any case – Alan Leuthard Jun 16 '17 at 16:38
  • Well I don't think you should assume the numbers are sorted. – joe Jun 16 '17 at 16:42
  • So testing against "numbers = random.choices(range(0, 101), k=1000)" and "target = random.choice(range(0, 21))" would be valid? – Alan Leuthard Jun 16 '17 at 16:59
  • Against those random sets, your second method doesn't work. The first method and the method suggested by Delirious Lettuce in an answer have the same output with the latter being much faster. Extending the list to 10000 items, method 1 takes 12-16 seconds, the Delirious method takes 0.006 seconds almost every time. – Alan Leuthard Jun 16 '17 at 17:21

4 Answers4

0

you can run test yourself using the timeit module:

t1 = timeit(setup='from __main__ import sort1, numbers',
            stmt='sort1(numbers)',
            number=1)
t2 = timeit(setup='from __main__ import sort2, numbers',
           stmt='sort2(numbers)',
           number=1)

print(t1)
print(t2)

also note that sum is a built-in and therefore not a good name for a variable...

there are way better algorithms for that! especially considering you have duplicates in your list.


here is a faster version that will only give you the matches but not the multiplicity of the matches:

def sum_target(lst, target):
    # make list unique
    unique_set = set(lst)
    unique_list = list(unique_set)

    remainders = {number: target-number for number in unique_list}
    print(remainders)

    match = set()
    for a, b in remainders.items():
        if a == b and lst.count(a) >= 2:
            match.add((a, b))
        else:
            if b in remainders:
                match.add(frozenset((a, b)))

    return match
hiro protagonist
  • 44,693
  • 14
  • 86
  • 111
  • But I need duplicate values as well.. so making it unique isn't what I want.. some people are suggesting to use a dictonary – joe Jun 16 '17 at 15:42
  • getting the multiplicity back after you have the pairs that sum to your target value is fast & easy. – hiro protagonist Jun 16 '17 at 15:43
  • Wouldn't you then have to do another search to see how many times it appears though? – joe Jun 16 '17 at 15:44
  • that approach does not work! sorry. if the target is 12 and there are two 6es in the list the algorithm will not find it... i'll have another look at it. – hiro protagonist Jun 16 '17 at 15:48
0

It's almost always useful to measure complexity of your algorithms.

Both of your algorithms has O(N^2) complexity, so there are almost interchangeable in terms of performance.

You may improve your algorithm by keeping a mapping of value-index pairs. It will reduce complexity to O(N), basically you'll have one loop.

Egor Biriukov
  • 649
  • 1
  • 7
  • 16
0

When comparing algorithms, you should compare their time complexities. Measuring the time is also a good idea, but heavily dependent on the input, which now is tiny.


The first algorithm takes:

O(N2)

because of the double for loop.

For the second algorithm, you should take into account that count() has a time complexity of O(N). You have on for loop, and in its body count() will be called twice, once after abs() and once in whichever body of the if-else statement you go into. As a result the time complexity is O(N) * 2 * O(N) = 2*O(N<sup>2</sup>), which yields:

O(N2)

That means that both algorithm have the same time complexity. As a result it now has meaning to measure the performance, by running many experiments and taking the average of the time measurements, with big enough input to reflect performance.

gsamaras
  • 71,951
  • 46
  • 188
  • 305
  • Alright I figured that.. So how could I write a faster algorithm? Dict won't work because I need to check for duplicate values – joe Jun 16 '17 at 15:52
  • @joe that's another question. Accept an answer of the ones already posted, and if needed post a new question, or even better post in Code Review. ;) – gsamaras Jun 16 '17 at 15:56
-1

Yes there is better algorithm which can be used if you know lower_bound and upper_bound of the data. Counting Sort which takes O(N) time and space is not constant (which depends on the range of upper bound and lower bound).

Refer Counting Sort

PS: Counting Sort is not a comparison based sort algorithm.

Refer below sample code:

def counting_sort(numbers, k):
    counter = [0] * (k + 1)
    for i in numbers:
        counter[i] += 1

    ndx = 0
    for i in range(len(counter)):
        while 0 < counter[i]:
            numbers[ndx] = i
            ndx += 1
            counter[i] -= 1
Yaman Jain
  • 1,254
  • 11
  • 16