
I want to make a for loop for a calculation task. Since the data is too large for one calculation and I always get new data, I want to split my calculation process.

My array `a` has length `n`.

I want to use the first `x` elements for the calculation (`c = b*x`), then the next `x` elements of the array, so in total `n/x` calculations. At the end I want to concatenate/append all my `c` arrays into one array `ctotal`.

So, for example:

import numpy as np

a = np.random.rand(70000000)
ctotal = []
x = 7000
for i in range(0, len(a), x):     # step through a in chunks of x elements
    c = model.predict(a[i:i+x])   # predict on the next x elements of a
    ctotal.append(c)              # list.append works in place and returns None
# finally join all the per-chunk results into one array
ctotal = np.concatenate(ctotal)
newpyguy
  • 1. Not clear. 2. How large? I don't think it's going to be a problem for numpy... If you do a `for` loop to calculate smaller arrays and finally concatenate them at the end, you're still going to store the initial array plus all the small arrays. Quite inefficient, while operating on the initial array should work just fine. – Mathieu Jun 26 '18 at 14:44
  • 1. I want to use x rows for a prediction task and then concatenate the solutions. 2. Large will be 10 million rows, and I want to split the array because I need a defined, smaller size for my calculation (for different reasons: using the initial array takes a longer prediction time for my ANN, and later I will always use a smaller part of the data anyway). I hope this is a bit clearer. – newpyguy Jun 26 '18 at 14:49
  • @newpyguy please provide the actual calculation you want to do with the actual data. As it stands, doing 20000 multiplications does not motivate the need to do your calculation in batches. – Olivier Melançon Jun 26 '18 at 14:56
  • @OlivierMelançon I am aware of that, but the point is I just want to know how I can split the array and make a calculation. It will be a neural network which in this case only accepts a specific sequence length, and this might be my only way to get good results. – newpyguy Jun 26 '18 at 14:58
  • @newpyguy then show us the neural network you want to implement. The solution will depend on the data, and for the current data the solution is: don't do batches. – Olivier Melançon Jun 26 '18 at 15:04
  • @OlivierMelançon I am going to implement a seq2seq encoder like here: https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html but the point is the model always accepts the same sequence length for training and prediction, and I want to use a small part of the data at a time to predict. I think posting both together would confuse things even more; it is just how to split an array and do something with each part. – newpyguy Jun 26 '18 at 15:11

1 Answer


Processing the array in chunks isn't really "splitting the calculation process", because the chunks will still run sequentially. If you want to run multiple calculations at the same time, you should check out the threading library. Even if that isn't the point of your question, it sounds like it might help you get through and process 10 million elements.
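
As an illustration, here is a minimal sketch of that idea using `ThreadPoolExecutor` from the standard-library `concurrent.futures` module (a higher-level wrapper around threading). The `process_chunk` function is a hypothetical stand-in for your real per-chunk calculation, and threads only pay off if that work releases the GIL, as many numpy operations do:

from concurrent.futures import ThreadPoolExecutor

import numpy as np

def process_chunk(chunk):
    # hypothetical stand-in for the real per-chunk calculation
    return chunk * 2

a = np.random.rand(1000000)
x = 7000

# slice the array into consecutive chunks of x elements
chunks = [a[i:i + x] for i in range(0, len(a), x)]

# executor.map preserves input order, so results line up with chunks
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(process_chunk, chunks))

ctotal = np.concatenate(results)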

If your question is just how to get 20 at a time, there is more than one way to do this. One is to create a generator:

def chunkify(input_list, chunk):
    # yield successive slices of `chunk` elements until the input is exhausted;
    # comparing indices (rather than testing the slice's truthiness) also keeps
    # this working for numpy arrays, whose truth value is ambiguous
    start = 0
    while start < len(input_list):
        yield input_list[start:start + chunk]
        start = start + chunk

Then, you can run `for i in chunkify(your_list, 20)`, and on each loop, `i` will be the next 20 elements of your list.
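
Applied to the setup from the question, the generator slots straight into the prediction loop. The sketch below assumes `model` is the trained network from the original post, with a `predict` method that accepts one chunk at a time:

import numpy as np

a = np.random.rand(70000000)
x = 7000

ctotal = []
for chunk in chunkify(a, x):
    ctotal.append(model.predict(chunk))   # one prediction per chunk of x elements

ctotal = np.concatenate(ctotal)           # join the per-chunk results into one array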

MoxieBall
  • Sounds like you should read more about what to _do_ than just read my answer about what to _say_ -- check out [this](https://stackoverflow.com/questions/434287/what-is-the-most-pythonic-way-to-iterate-over-a-list-in-chunks) stack overflow answer for more ways to do this, some of which are probably better than what I suggested. And do some reading about iterators and generators to get a better idea of how to apply them to your project. – MoxieBall Jun 26 '18 at 15:10
  • I will look over there too, thank you. I just did not know the right wording. – newpyguy Jun 26 '18 at 15:14