Iterating over a large list

Question

I have a list in the range [465868129, 988379794] both inclusive. When I use the following code I get a Memory Error. What can I do?

r = [465868129, 988379794]
list = [x for x in xrange(r[0], r[1]+1)]

What you are going to do is important because answers like _use a lazy iterable_ seem like the right way to go, but we can't be sure. — kojiro, Jan 04 '15 at 03:31
@thefourtheye I'm going to count the number of square integers between that range. — ChrisD, Jan 04 '15 at 03:31
You really want to avoid creating that huge `list = [x for x in xrange(r[0], r[1]+1)]`. It will have 500million entries. — smci, Jan 04 '15 at 03:32
@ChrisD right, so don't create a huge list. Just use the xrange, which is lazy. — kojiro, Jan 04 '15 at 03:32
@smci - I don't understand... you can only take the length of an iterable by iterating. Some iterables like `itertools.cycle` are infinite and only Chuck Norris can take the size of them. — tdelaney, Jan 04 '15 at 03:41
@tdelaney if you _know_ it's infinite, you don't need to take the size. :P — kojiro, Jan 04 '15 at 03:44

score 4 · Accepted Answer · answered Jan 04 '15 at 03:35

You could iterate over the xrange directly instead of creating a list.

for x in xrange(r[0], r[1] + 1):
    ...

But iterating over such a large range is a very, very slow way to find squares. The fact that you run out of memory should alert you that a different approach is needed.

A much better way would be to take the square roots of each end point and then iterate between the square roots. Each integer between the square roots, when squared, would give you one of the numbers you're searching for.

In fact, if you're clever enough, you can generate all the squares with a single list comprehension and avoid an explicit for loop entirely.

List comprehension? Technically, OP said he wanted to _count_ the squares, so you don't even need a list comprehension. Just some arithmetic. But I suspect this is a HW question, and perhaps there is yet some arbitrary constraint against _not_ using lists. :/ — kojiro, Jan 04 '15 at 03:39

score 1 · Answer 2 · edited May 23 '17 at 11:57

Unless you have a very good reason to store the list items in a list, iterate over the generator instead, that way Python won't need to allocate a lot of memory (causing your Memory Error) to create that list:

init, end = (465868129, 988379794)
items = xrange(init, end + 1)

for item in items:
    #Do something with item

To count squares on an arbitrary range consider the following formula:

import math

number_of_squares = int(math.sqrt(end) - math.sqrt(init)) + 
                    op(is_perfect_square(init), is_perfect_square(end))

The is_perfect_square(n) is another problem on its own, so check this post if interested.

The op is used to adjust the number of squares when the init and end of the intervals init (or/and/neither) end are perfect squares. So we need a function with the following characteristics:

Both numbers are perfect squares: Eg: 25,64 => 8 - 5 = 3 (and there are 4 squares on that range). (it should sum 1 more)
End is a perfect square: Eg: 26,64 => 8 - 5 = 3 (There are 3 squares on that range). (it is correct => it should sum 0)
Init is a perfect square: Eg: 25,65 => 8 - 5 = 3 (There are 4 squares on that range). (it should sum 1 more)
None of the numbers are primes: Eg: 26, 65 => 8 - 5 = 3 (There are 3 squares on that range) (it is correct => it should sum 0)

So we need an operator with the following characteristics, based on the past examples:

1 op 1 = 1 (Both numbers are perfect squares)
0 op 1 = 0 (End is a perfect square)
1 op 0 = 1 (Init is a perfect square)
0 op 0 = 0 (None of the numbers are perfect squares)

Note that the max function almost fulfils our needs, but it fails on the second case max(0,1) = 1 and it should be 0.

So, looks like the result only depends on the first operator: if it's one, the result is 1, on the other hand if it's 0, it returns 0.

So, it's easy to write the function with that in mind:

import math

number_of_squares = int(math.sqrt(end) - math.sqrt(init)) + 
                    int(is_perfect_square(init))

Thanks to @kojiro, we have this approach (having a similar idea), which is easier to read:

from math import sqrt, floor, ceil

number_of_squares = 1 + floor(sqrt(end)) - ceil(sqrt(init))

This is as close to _the right way_ as I think we can get assuming that we _have to iterate_, for some reason, to count squares. — kojiro, Jan 04 '15 at 03:42
I'm confused by your using `max` on the `int` of booleans. Couldn't you just do `or`? — kojiro, Jan 04 '15 at 04:24
For that matter, what's wrong with just `floor(sqrt(end)) - ceil(sqrt(init))`? — kojiro, Jan 04 '15 at 04:51
@kojiro The floor and ceil approach does not work if the numbers are both perfect squares (I tried using 1,100), also if none of the numbers are prime (like 2,99) it does not work. — avenet, Jan 04 '15 at 11:34
@avenet You're right, but you still don't need the `is_perfect_square`. I was just off-by-one. Consider `1 + floor(sqrt(end)) - ceil(sqrt(init))`. It works whether the bounds are perfect squares or not, because `ceil(n) == floor(n)` iff `n` is a whole number. — kojiro, Jan 04 '15 at 15:10
@kojiro You are right (on the previous time we needed to sum 1 to it), I encourage you to create an answer with your formula. I'd be happy to vote it up. If you don't want I will edit this answer and credit it to you. Thank you for sharing your approach, it's simpler than mine!! — avenet, Jan 04 '15 at 15:18
Feel free to use my comment in your answer. It's all Creative Commons anyway. :) — kojiro, Jan 04 '15 at 15:24

Iterating over a large list

2 Answers2