
I have a function that outputs 2 arrays, let's call them X_i and Y_i, which are two N x 1 arrays, where N is the number of points. By using multiprocessing's pool.apply_async, I was able to parallelize this function, which gave me my results in a HUGE list. The result is a list of M values, where each value is a list containing X_i and Y_i. So in summary, I have a huge list of M smaller lists, each containing the two arrays X_i and Y_i.

Now I want to append all the X_i's into one array called X and all the Y_i's into one array called Y. What is the most efficient way to do this? I'm looking for some sort of parallel algorithm. Order does NOT matter!

So far I have just a simple for loop that separates this massive data array:

import numpy as np

X = np.zeros((N,1))
Y = np.zeros((N,1))
for i in range(len(results)):
    X = np.append(results[i][0].reshape(N,1),X,axis = 1)
    Y = np.append(results[i][1].reshape(N,1),Y,axis = 1)

I found this algorithm to be rather slow, so I need to speed it up! Thanks!

patrick7
  • 2
    are the lengths of each results[i][0] and results[i][1] constant for all values of i? – JonSG May 26 '21 at 21:25
  • could you give us an input and output example? – Aru May 26 '21 at 21:30
  • @JonSG i is the ith list of the big list consisting of M smaller lists ( len(results) = M ). Each ith list consists of 2 arrays, X_i, and Y_i, which are all of length N. – patrick7 May 26 '21 at 21:32
  • @Aru np.shape(results) = (256, 2, 720001). Which means, in this list ,results, there are 256 smaller lists containing 2 arrays of length 720001. These two arrays are X_i and Y_i. I'm not quite sure how to nicely show you an example but basically, the list, results, contains 256 nested lists which each of those lists contain 2 arrays that I'm trying to separate and sort out. – patrick7 May 26 '21 at 21:36
  • results[0][0] gives X_i, which is an N x 1 array. results[0][1] gives Y_i, which is also an N x 1 array. However, results[1][0] gives X_i, but it is a different X_i (e.g. the output of the same function using a different set of initial conditions). So basically I want to stack all my X_i in one array called X, which should be of dimensions N x M. – patrick7 May 26 '21 at 21:46
  • @patrick7 I think you should divide it further into subproblems. You should go with one format, all as lists or all as arrays, in my opinion. Maybe this one could help you then with the list handling: https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-a-list-of-lists/952952#952952 – Aru May 26 '21 at 21:47
  • @JonSG Sorry, maybe I'm not describing it well, but not quite. results[0][0] is an N x 1 array. In other words, results[0] returns a list which contains [X_i, Y_i], and so does results[1] ... results[M]. The X_i and Y_i contained in each list have different values but the same dimensions. So I want to append all the X_i into 1 array, and all the Y_i into 1 array. I hope this makes more sense. :) – patrick7 May 26 '21 at 21:54
  • so you want `X[0] = M[0][0]` and `X[720001] = M[1][0]` and `X[720001*k] = M[k][0]` and similarly for `Y` with `M[k][1]`? @patrick7 Sorry I keep editing my question. – JonSG May 26 '21 at 21:54
  • 1
    That's OK! Close! I want my X and Y to be 2-dimensional, so X[0][0] = M[0][0][0] (since M[0][0] only gives the N x 1 array X_i). So basically X[i][j] = M[i][0][j]. This is actually quite insightful, I hadn't thought about it like this! – patrick7 May 26 '21 at 21:57
  • 1
    With this insight do you see a way to parallelize this now? :-) – JonSG May 26 '21 at 21:59
  • @JonSG Yes! Although, not a parallel way. I will post it in the answers. I hope you will like my solutions. Thanks for the insight and help! – patrick7 May 26 '21 at 22:06

2 Answers

1

You should provide a simple scenario of your problem: break it down and give us a simple input and output example. That would help a lot, as all these variables and text make it a bit confusing. Maybe this can help: you can unpack the lists, grab the ones you need by index, append one list to a new empty X[] and the other to Y[], and at the end get the arrays out of the lists and merge them into your new N-dimensional array or into a new list.

nested = [[[1,2],[3,4]],[[4,5],[6,7]]]  # renamed from `list` to avoid shadowing the builtin
flat_list = []
for sublist in nested:
    # each sublist holds the pair of inner lists for one result
    for item in sublist:
        flat_list.append(item)
print(nested)
print(flat_list)
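
The unpacking described above can also be sketched for the (M, 2, N) structure from the question. This is only an illustration, assuming every entry of the list is a pair of equal-length NumPy arrays; the stand-in `results` list, the sizes `M` and `N`, and the names `xs` and `ys` are made up for the example.

```python
import numpy as np

# Hypothetical stand-in for the pool output: M pairs of length-N arrays.
M, N = 4, 5
results = [[np.full(N, i, dtype=float), np.full(N, -i, dtype=float)]
           for i in range(M)]

# zip(*results) transposes the outer list:
# one tuple with every X_i and one tuple with every Y_i.
xs, ys = zip(*results)

# Stack once at the end instead of appending inside a loop.
X = np.stack(xs)  # shape (M, N)
Y = np.stack(ys)  # shape (M, N)
print(X.shape, Y.shape)
```

Row `i` of `X` is then the `X_i` that came from `results[i]`, so the pairing between `X` and `Y` is preserved.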
Aru
  • This works well, and is definitely an alternative method. However for larger data sets, this does not provide much of a speed improvement compared to the original for loop. – patrick7 May 26 '21 at 22:22
  • the question of efficiency is another one, i just wanted to help you with breaking down the data. to improve efficiency you can use other structures. – Aru May 26 '21 at 22:24
1

Thanks to @JonSG for the brilliant insight. This type of sorting algorithm can be sped up using array manipulation. With most parallel-processing packages, a function that returns multiple arrays will most likely have its outputs collected into one huge list. Here I have a list called results, which contains M smaller lists of two N x 1 arrays.

To unpack the main list and sort all the X_i and Y_i into their own X and Y arrays, respectively, you can do this:

import numpy as np

arr = np.array(results)  # np.shape(results) is (M, 2, N); convert only once
X = arr[:,0,:]           # all X_i, shape (M, N)
Y = arr[:,1,:]           # all Y_i, shape (M, N)

This gave me a 100x speed increase!
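
A likely reason for the difference is that np.append copies the whole array on every call, so the loop's cost grows with each iteration, while a single conversion copies the data once. The sketch below checks the slicing approach against a simple preallocated loop on small random data. It assumes each X_i and Y_i really is an N x 1 array (so the converted array is 4-dimensional before reshaping); the sizes and the names `arr` and `X_loop` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 6, 1000

# Hypothetical stand-in for the pool output: M pairs of (N, 1) arrays.
results = [[rng.random((N, 1)), rng.random((N, 1))] for _ in range(M)]

# One conversion; the trailing axis of length 1 is dropped by the reshape.
arr = np.asarray(results).reshape(M, 2, N)
X = arr[:, 0, :]  # shape (M, N)
Y = arr[:, 1, :]  # shape (M, N)

# Reference result: fill a preallocated array row by row.
X_loop = np.empty((M, N))
for i, (x_i, _) in enumerate(results):
    X_loop[i] = x_i.ravel()

print(np.array_equal(X, X_loop))
print(X.T.shape)  # transpose gives the N x M layout mentioned in the question
```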

patrick7
  • Nicely done :-) You might think about marking this as the answer since it seems to have helped and there is nothing wrong with answering your own question. – JonSG May 27 '21 at 01:06
  • @JonSG I will once it lets me. I have to wait 2 days! – patrick7 May 27 '21 at 01:33