1

I have a large list with numbers n1 in listn1. I want to add all multiples of each number, if the multiplicand is not in another list listn2 (prime numbers with specific characteristics) and the product is below a maximum max_n. The multiplicand is in most cases below 10, but can go up to 100,000. My code so far looks like:

s = 0
for n1 in listn1:
    s += sum(n1 * i for i in range(1, 1 + max_n // n1) if i not in listn2)

The problem: this approach is sloooow. It takes seconds to calculate listn1 and listn2, so I know that there are more than a million numbers to add. But I started the summation yesterday evening and it is still running this morning.

Is there a Pythonic way to speed up this summation?

Mr. T
  • 11,960
  • 10
  • 32
  • 54
  • 3
    For starters, make `listn2` a `set`. `if x in y` is an `O(1)` operation on avg if `y` is a set and `O(len(y))` on avg if `y` is a list. Also, try to work with `numpy` so you can vectorize some operations. – DeepSpace Nov 07 '17 at 09:48
  • 1
    You can use the sum of `arithmetic and geometric sequences` for get sum of n1 * i for i in range(1, 1 + max_n // n1), than get other sum when i in listn2, all this using matlap feature – khelili miliana Nov 07 '17 at 10:05
  • @Deepspace: Didn't know that difference between set and list. Will have to keep `setn2` and `listn2`, because `setn2` will lose the order of `listn2`. `numpy` resisted an installation on my computer, but it seems I have to give it another try. – Mr. T Nov 07 '17 at 10:53
  • @Anonymousmiliana True, `listn2` has less elements, so will need less operations. Maybe I should add the elements in `n2` in advance and then just look up the result, so it doesn't have to do this summation every time from scratch. Thanks, much appreciated. – Mr. T Nov 07 '17 at 10:53
  • Thanks to the input I just noticed that `n1 * sum(i)` requires way less operations than `sum(n1 * i)`. Way to go. – Mr. T Nov 07 '17 at 11:17
  • A set will only contain unique values of the list. This will help speed up, if listn2 has repetition of any value. I am not sure, if apart from this, there is any other reason as well, because of which set would be faster. – Sachit Nagpal Nov 07 '17 at 12:47
  • 1
    @SachitNagpal I tried both, `setn2` and `list(setn2)`. Though both contain the same number of elements, the computation time with `setn2` is massively reduced. Not sure, why, but @Deepspace was right. – Mr. T Nov 07 '17 at 23:56

2 Answers2

1

I have 2 suggestions for you.

First of all, you don't have multiply i with n1 at each iteration. You can replace

s += sum(n1 * i for i in range(1, 1 + max_n // n1) if i not in listn2)

with

s += n1 * sum(i for i in range(1, 1 + max_n // n1) if i not in listn2)

They are totally same.

Secondly, without if i not in listn2 condition, you have a simple summation:

sum(i for i in range(1, 1 + max_n // n1)

This is same with sum([1, 2, 3, 4, 5, 6, 7, 8, ..., (max_n // n1)]), and equals (max_n // n1) * (1 + max_n // n1) / 2. For a simply example, take a look at this.

To handle if i not in listn2 condition, if your listn2 is smaller, you can sum listn2 instead of listn1.

So, find the sum of listn1 and subtract the items in listn2:

def sum_until(l, max):
    return sum([x for x in l if x < max])

listn2 = list(set(listn2))

for n1 in listn1:
    finish = max_n // n1
    s += n1 * (finish * (finish + 1) / 2 - sum_until(listn2, finish)) 

EDIT:

I guess NumPy would be faster for sum. Make listn2 a numpy array:

import numpy as np

listn2 = np.array(list(set(listn2))) 

And use this sum_until function:

def sum_until(listn2, max):
    l = listn2[np.where(listn2 <= max)]
    return int(np.sum(l))
Alperen
  • 3,772
  • 3
  • 27
  • 49
  • Are we telepathically connected? I wrote your first suggestion in the other comment section, while you were typing it. Thanks, will implement all suggestions. – Mr. T Nov 07 '17 at 11:29
  • :):) You're welcome. Second one is more effective for speeding up if `listn2` is small. If it is not small enough, you can define it as numpy array. Numpy sum is faster. – Alperen Nov 07 '17 at 11:34
  • I thought now about summing up `listn2` elements in advance in `sumlistn2`, then look up index in `listn2` and retrieve the sum from `sumlistn2`. `setn2` would be faster, I learned now, but it loses the order, I need. Numpy resisted installation, I have to look into this. – Mr. T Nov 07 '17 at 11:46
  • I agree about telepathic connection :):) I was looking for a better way of `listn2` sum. I've used `sorted()` and `np.cumsum()`. I am still trying, there is a mistake in somewhere, and I can't find it. – Alperen Nov 07 '17 at 12:06
  • Thanks, but please don't put any more effort into it. I rather need algorithmic suggestions than the actual code. Not a programmer here, so program structure is my problem. – Mr. T Nov 07 '17 at 12:08
  • OK, I put everything I have to my answer, and made a correction in the code. I hope, it helps. – Alperen Nov 07 '17 at 12:28
0

Taking the suggestions on board, I re-wrote the code to

setn2 = set(listn2)
s = 0
for n1 in listn1:
    s += n1 * (max_n * (max_n + 1) // 2 - sum(i for i in range(1, 1 + max_n // n1) if i in setn2))

Instead of hours, the summation now takes seconds.

Several hours later

Coming across this old question of mine, it turned out that retrospectively, I did the right thing. The idea mentioned here in this thread to use numpy.sum() instead was well intended but wrong, as shown here. If you work in numpy, fine, but if you have a Python list, use comprehensions.

Mr. T
  • 11,960
  • 10
  • 32
  • 54