
Let's say I have some arrays/lists that contain a lot of values, so loading several of them into memory at once would eventually raise a memory error. One way to circumvent this is to produce the arrays/lists with a generator and use each one only when it is needed. However, a generator gives you much less control than an array/list, and that is my problem.

Let me explain.

As an example, I have the following code, which produces a generator of small NumPy arrays. This is not memory intensive at all; it is just an example:

import numpy as np

np.random.seed(10)

number_of_lists = range(0, 5)

generator_list = (np.random.randint(0, 10, 10) for i in number_of_lists)

If I iterate over this generator, I get the following:

for i in generator_list:
    print(i)

>> [9 4 0 1 9 0 1 8 9 0]
>> [8 6 4 3 0 4 6 8 1 8]
>> [4 1 3 6 5 3 9 6 9 1]
>> [9 4 2 6 7 8 8 9 2 0]
>> [6 7 8 1 7 1 4 0 8 5]
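One thing worth keeping in mind: a generator can be iterated only once. After the printing loop above, `generator_list` is exhausted, so any summing loop run afterwards sees no arrays unless the generator is recreated. A minimal demonstration (the names `first_pass` and `second_pass` are just for illustration):

```python
import numpy as np

np.random.seed(10)

# A generator expression produces its arrays lazily, one at a time.
generator_list = (np.random.randint(0, 10, 10) for i in range(5))

first_pass = [arr for arr in generator_list]   # consumes all five arrays
second_pass = [arr for arr in generator_list]  # nothing is left to yield

print(len(first_pass), len(second_pass))  # 5 0
```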

What I would like to do is sum all the lists element-wise (along axis = 0). For the example above, that should give:

[36, 22, 17, 17, 28, 16, 28, 31, 29, 14]

To do this I could use the following:

sum_ = np.zeros(10, dtype=int)  # NumPy array: += adds element-wise (a plain list would be extended instead)
for i in generator_list:
    sum_ += i

where 10 is the length of each array.

So far so good. I am not sure if there is a better/more optimized way of doing it, but it works.
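On the "better way" question, one possibly simpler option is Python's built-in `sum()`: it starts from `0` and adds each item in turn, and `0 + array` broadcasts, so the result is the element-wise sum. It still only holds the running total and the current array in memory. A minimal sketch, assuming a fresh (not yet consumed) generator:

```python
import numpy as np

np.random.seed(10)
generator_list = (np.random.randint(0, 10, 10) for i in range(5))

# sum() computes 0 + a0 + a1 + ...; each step adds one array element-wise,
# so only the running total and the current array are alive at any time.
total = sum(generator_list)

print(total)
```

`functools.reduce(np.add, generator_list)` would do the same thing without the initial `0`.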

My problem is that I would like to choose which arrays in generator_list I use. For example, what if I wanted to sum two copies of the first list (index 0), one of the third (index 2), and two of the last (index 4), i.e.:

[9 4 0 1 9 0 1 8 9 0]
[9 4 0 1 9 0 1 8 9 0]
[4 1 3 6 5 3 9 6 9 1]
[6 7 8 1 7 1 4 0 8 5]
[6 7 8 1 7 1 4 0 8 5]

>> [34, 23, 19, 10, 37, 5, 19, 22, 43, 11]

How would I go about doing that?

And before anyone asks why I want to do it this way: in my real case, loading the arrays into the generator takes some time. In principle I could just create a new generator that yields the arrays in the new order, but then I would have to wait for them to load all over again. If this has to happen thousands of times (as in bootstrapping), that adds up. The first generator already gives me access to ALL the available arrays; I just want to use them selectively, so that I don't have to create a new generator every time I want to mix things up and sum a new set of arrays/lists.
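Since bootstrapping is the real use case, one possible alternative (a different technique from the accepted answer, offered as a sketch) is to draw all resamples up front and then make a single pass over the slow generator. Summing the arrays picked by one resample equals `sum_k counts[k] * array_k`, where `counts[k]` is how often index `k` was drawn, so each array only has to be loaded once for all resamples together. `load_arrays` and `bootstrap_sums` are hypothetical names; `load_arrays` stands in for the slow loader:

```python
import numpy as np


def load_arrays(n_arrays, array_len, seed=10):
    """Hypothetical stand-in for the slow loader: yields one array at a time."""
    rs = np.random.RandomState(seed)
    for _ in range(n_arrays):
        yield rs.randint(0, 10, array_len)


def bootstrap_sums(array_gen, n_arrays, array_len, n_resamples, rng):
    """Element-wise sums for many bootstrap resamples in ONE pass over array_gen.

    Each resample draws n_arrays indices with replacement. The sum of the
    selected arrays equals sum_k counts[k] * array_k, where counts[k] is how
    often index k was drawn, so every array only needs to be seen once.
    """
    # draws[b] holds the indices picked for resample b
    draws = rng.integers(0, n_arrays, size=(n_resamples, n_arrays))
    # counts[b, k] = number of times array k appears in resample b
    counts = np.stack([np.bincount(row, minlength=n_arrays) for row in draws])

    # One float64 running total per resample; int or float arrays can be added in place.
    totals = np.zeros((n_resamples, array_len))
    for k, arr in enumerate(array_gen):
        # Add array k to every resample at once, weighted by its count there.
        totals += counts[:, k, None] * arr
    return totals, draws


rng = np.random.default_rng(0)
totals, draws = bootstrap_sums(load_arrays(5, 10), 5, 10, 1000, rng)
print(totals.shape)  # (1000, 10)
```

The trade-off is that `totals` holds `n_resamples * array_len` values; if that is too large, the resamples could be processed in batches, at the cost of one pass over the generator per batch.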

Denver Dang
  • Is this what you're looking for? https://stackoverflow.com/questions/5509302/whats-the-best-way-of-skip-n-values-of-the-iteration-variable-in-python – Simon Sep 04 '18 at 20:22
  • Can you please clarify "what if I wanted to sum two of the first [0] list, one of the third, and 2 of the last, i.e.:", I don't see how you compute the result. – rocksportrocker Sep 04 '18 at 20:23
  • @rocksportrocker So, if it were a list, and not a generator, I would probably do something like: `list1 = [0, 0, 2, 4, 4]` `sum2 = [0]*10` `for x in list1:` `sum2 += generator_list[x]` where `list1` would be a list of the lists that needed to be summed. It's not pretty writing much code in the comments, sorry... – Denver Dang Sep 04 '18 at 20:43
  • @Simon Not quite I think. I would actually like, as just shown in the above comment, to create a list that holds the indexes of the lists in the generator (or list) and then sum. – Denver Dang Sep 04 '18 at 20:46
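For comparison, if all the arrays did fit in memory, the index-list idea from the comments needs no loop at all: stack them into one 2-D array and use NumPy fancy indexing, which picks a row once per occurrence of its index. This is only an in-memory baseline, not a solution to the memory problem:

```python
import numpy as np

np.random.seed(10)
# In-memory baseline: only viable when all arrays fit in RAM at once.
arrays = np.array([np.random.randint(0, 10, 10) for i in range(5)])  # shape (5, 10)

indices = [0, 0, 2, 4, 4]
# Fancy indexing repeats a row for each repeated index,
# and .sum(axis=0) then adds the selected rows element-wise.
total = arrays[indices].sum(axis=0)
print(total)
```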

1 Answer

import numpy as np
np.random.seed(10)

number_of_lists = range(5)

generator_list = (np.random.randint(0, 10, 10) for i in number_of_lists)

indices = [0, 0, 2, 4, 4]
assert sorted(indices) == indices, "only works for sorted list"

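# A plain list such as [0] * 10 would not work here: += on a list extends it
# instead of adding element-wise, so use a NumPy array.
sum_ = np.zeros((10,), dtype=int)

generator_index = -1  # position of the most recently read generator item

for index in indices:
    # Advance the generator until it reaches the requested position;
    # a repeated index reuses the vector that was already read.
    while generator_index < index:
        vector = next(generator_list)
        generator_index += 1
    sum_ += vector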

print(sum_)

This outputs:

[34 23 19 10 37  5 19 22 43 11]
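A possible variation that drops the sorted-indices requirement: count how often each index is wanted with `np.bincount`, then walk the generator once and add each array times its count. `zip` stops as soon as the counts run out, so arrays beyond the largest requested index are never read. A sketch under the same toy setup:

```python
import numpy as np

np.random.seed(10)
generator_list = (np.random.randint(0, 10, 10) for i in range(5))

indices = [4, 0, 2, 4, 0]  # unsorted is fine here
# counts[k] = how many times array k should be added; len(counts) == max(indices) + 1
counts = np.bincount(indices)

sum_ = np.zeros(10, dtype=int)
# zip stops once counts is exhausted, so later arrays are never pulled from the generator.
for count, vector in zip(counts, generator_list):
    sum_ += count * vector

print(sum_)
```

Because this index list contains the same indices as `[0, 0, 2, 4, 4]`, it should produce the same sum as the code above.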
rocksportrocker
  • Hmmm, if I do this on my own lists I get the error: `TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('int32') with casting rule 'same_kind'`. Is that fixable, or...? – Denver Dang Sep 04 '18 at 21:28
  • Or if you just change the `generator_list` to `(np.random.rand(3, 3).ravel(order="C") for i in number_of_lists)` the same error will arise. – Denver Dang Sep 04 '18 at 21:44
  • Based on the initialisation of `sum_` the updates only work if you add integer vectors of length 10. You add `floats` of length 9 when you use `np.random.rand(3, 3).ravel()`. So adapt the size and dtype of `sum_` and the code will work. – rocksportrocker Sep 05 '18 at 06:25
  • Yes, I did figure that out. Thank you. – Denver Dang Sep 05 '18 at 12:07
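Following up on the dtype/shape error in the comments, one way to avoid hard-coding `sum_` is to build the accumulator from the first selected vector and use non-in-place addition, so the shape and dtype follow the data (int + float simply promotes to float). `sum_selected` is a hypothetical helper wrapping the answer's loop; it returns `None` for an empty index list:

```python
import numpy as np


def sum_selected(generator, indices):
    """Sum the generator items at the (sorted) positions in `indices`.

    Hypothetical helper: the accumulator starts as the first selected vector,
    so its shape and dtype come from the data instead of being hard-coded.
    Returns None if `indices` is empty.
    """
    assert sorted(indices) == list(indices), "only works for sorted indices"
    sum_ = None
    generator_index = -1
    for index in indices:
        while generator_index < index:
            vector = next(generator)
            generator_index += 1
        # sum_ + vector builds a new array, so NumPy's type promotion applies
        # and the generator's arrays are never modified in place.
        sum_ = vector if sum_ is None else sum_ + vector
    return sum_


np.random.seed(10)
float_gen = (np.random.rand(3, 3).ravel(order="C") for i in range(5))
result = sum_selected(float_gen, [0, 0, 2, 4, 4])
print(result.shape, result.dtype)  # (9,) float64
```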