Fill a multidimensional array efficiently that have many if else statements

Question

I want to fill an 4dim numpy array in a specific and efficient way. Because I don't know better I startet to write the code with if else statements, but that doesn't look nice, is probably slow and I also can not be really sure if I thought about every combination. Here is the code which I stopped writing down:

sercnew2 = numpy.zeros((gn, gn, gn, gn))
for x1 in range(gn):
    for x2 in range(gn):
        for x3 in range(gn):
            for x4 in range(gn):
                if x1 == x2 == x3 == x4: 
                    sercnew2[x1, x2, x3, x4] = ewp[x1]
                elif x1 == x2 == x3 != x4:
                    sercnew2[x1, x2, x3, x4] = ewp[x1] * ewp[x4]
                elif x1 == x2 == x4 != x3:
                    sercnew2[x1, x2, x3, x4] = ewp[x1] * ewp[x3]
                elif x1 == x3 == x4 != x2:
                    sercnew2[x1, x2, x3, x4] = ewp[x1] * ewp[x2]
                elif x2 == x3 == x4 != x1:
                    sercnew2[x1, x2, x3, x4] = ewp[x2] * ewp[x1]
                elif x1 == x2 != x3 == x4:
                    sercnew2[x1, x2, x3, x4] = ewp[x1] * ewp[x3]
                elif ... many more combinations which have to be considered

So basically what should happen is, that if all variables (x1, x2, x3, x4) are different from each other, the entry would be:

sercnew2[x1, x2, x3, x4] = ewp[x1]* ewp[x2] * ewp[x3] * ewp[x4]

Now if lets say the variable x2 and x4 is the same then:

sercnew2[x1, x2, x3, x4] = ewp[x1]* ewp[x2] * ewp[x3]

Others examples can be seen in the code above. Basically if two or more variables are the same, then I only consider on of them. I hope the pattern is clear. Otherwise please let me note and I will try to express my problem better. I am pretty sure, that there is a much more intelligent way to do it. Hope you know better and thanks in advance :)

No, the code above is not complete. There are many more combinations which has to be considered. So the pattern which I tried to describe is not clear? (I actually need this array to solve my previous problem which I also posted on stackoverflow) — HighwayJohn, Oct 15 '16 at 19:03
Could you also have two inequalities, like `if x1 != x2 != x3 == x4`? — Divakar, Oct 15 '16 at 19:11
Yes sure. In your example it is not clear if x1 is equal to x3 and x4 or is it? If x1 is equal to x3 and x4 and only x2 is different it would be: `sercnew2[x1, x2, x3, x4] = ewp[x1]* ewp[x2]`. If x1 is not equal to x3 and x4. It would be: ` sercnew2[x1, x2, x3, x4] = ewp[x1]* ewp[x2]*ewp[x3]` — HighwayJohn, Oct 15 '16 at 19:13

Nf4r · Answer 1 · 2016-10-15T19:39:41.780

3

I don't really know what you mean by saying that those variables are the same, but if indeed they are, than all you have to do is just use set().

from functools import reduce
from operator import mul
sercnew2 = numpy.zeros((gn, gn, gn, gn))
for x1 in range(gn):
    for x2 in range(x1, gn):
        for x3 in range(x2, gn):
            for x4 in range(x3, gn):
                set_ = [ewp[n] for n in set([x1, x2, x3, x4])]
                sercnew2[x1, x2, x3, x4] = reduce(mul, set_, 1)

The way it works is that it creates a set() which delete duplicates, and later with reduce function I pick first number from the set_, multiplies it with 1 (the initializer value), and the result of this is going to be passed to reduce as first argument, and second will be the second item from set_. Sorry for my bad explanation.

edited Oct 15 '16 at 19:39

answered Oct 15 '16 at 19:06

Nf4r

1,390
1
7
9

By variables I mean the x1, x2, x3 and x4 – HighwayJohn Oct 15 '16 at 19:08
How is the array ewp incorporated here? – HighwayJohn Oct 15 '16 at 19:10
I've added some explanation. Sorry for bad english. You can check how `reduce` work here https://docs.python.org/2/library/functions.html – Nf4r Oct 15 '16 at 19:14
Also you can consider limiting each `for` loop to start from `x1/x2` etc. Ie: `for x1 in range(gn) for x2 in range(x1, gn) for x3 in range(x2, gn)` – Nf4r Oct 15 '16 at 19:21
I try to understand your solution but I still dont see, how the array ewp is used? – HighwayJohn Oct 15 '16 at 19:37
Sorry, that was my bad. I've edited my answer and changed the `set_` to be the proper list of values from `ewp`. – Nf4r Oct 15 '16 at 19:40
Your code does give me results which are different from the other two approaches although I dont know why at the moment – HighwayJohn Oct 15 '16 at 23:51

Divakar · Accepted Answer · 2016-10-15T20:07:00.957

3

Really hoping that I have got it! Here's a vectorized approach -

from itertools import product

n_dims = 4 # Number of dims

# Create 2D array of all possible combinations of X's as rows
idx = np.sort(np.array(list(product(np.arange(gn), repeat=n_dims))),axis=1)

# Get all X's indexed values from ewp array
vals = ewp[idx]

# Set the duplicates along each row as 1s. With the np.prod coming up next,
#these 1s would not affect the result, which is the expected pattern here.
vals[:,1:][idx[:,1:] == idx[:,:-1]] = 1

# Perform product along each row and reshape into multi-dim array
out = vals.prod(1).reshape([gn]*n_dims)

edited Oct 15 '16 at 20:07

answered Oct 15 '16 at 19:37

Divakar

218,885
19
262
358

Okay, this actually looks really promising and advanced. Could you explain a bit what is happening there? I just implemented it in my code and the curve looks somehow realistic although I cannot be sure if it is entirely correct – HighwayJohn Oct 15 '16 at 19:44
@HighwayJohn Let me give you the essence of it - We get rid of the repeats with those `1s` (second last step assigning) as with `np.prod` along an axis with `1` doesn't affect the results. Those repeats correspond to some `x's` being equal. Second line gives us all those x's sorted, so if there are any duplicates, they would occur consecutively, which we are catching at the next step with `idx[:,1:] == idx[:,:-1]` masking. – Divakar Oct 15 '16 at 19:46
@HighwayJohn Also, if this works, we can try to solve your previous problem using this. So, let me know. – Divakar Oct 15 '16 at 19:51
There are [some faster alternatives](http://stackoverflow.com/a/11146645/190597) to `np.array(list(product(np.arange(gn), repeat=n_dims))`. Senderle's `cartesian_product([list(range(10))]*4)` for example, is ~40x faster than `array(list(product(...)))`. – unutbu Oct 15 '16 at 20:06
@unutbu Thanks for that pointer! Looks promising indeed when working with even decent numbered `gn` here. OP should take note of it. – Divakar Oct 15 '16 at 20:29
Okay thanks guys! I have to leave my computer for now, but will again take later a look on your comments. I am certain that your solution it correct Divakar, but I want to take a closer look again. I really need to understand it. – HighwayJohn Oct 15 '16 at 20:33
So, already got back again and now I do understand your code and I am convinced that it does to the correct calculation. Also what @unutbu mentioned is correct. That really speeds up the code. Thanks a lot!! I will also edit now my previous post which is related to this problem. I would like to know what you think about it – HighwayJohn Oct 15 '16 at 23:34

score 2 · Answer 3 · edited May 23 '17 at 12:33

You can do it in a single for loop too. Building on Divakar's Trick for a list of indexes, the first thing we need to do is figure out how to extract only the unique indices of a given element in the 4d array sercnew2.

One of the fastest ways to do this (Refer: https://www.peterbe.com/plog/uniqifiers-benchmark) is using sets. Then, we simply have to initialize sercnew2 as an array of ones, rather than zeros.

from itertools import product
import numpy as np

sercnew2 = np.ones((gn, gn, gn, gn))
n_dims=4
idx = list(product(np.arange(gn), repeat=n_dims))

for i,j,k,l in idx:
    unique_items = set((i,j,k,l))
    for ele in unique_items:
        sercnew2[i,j,k,l] *= ewp[ele]

EDIT: As @unutbu suggested, we could also use the cartesian_product function from https://stackoverflow.com/a/11146645/5714445 to speed up the initialization of idx

Edit2: In case you are having difficulty understanding what product from itertools does, it provides all permutations. For instance, suppose gn=2, with repeat dimension set to 4, you get

[0, 0, 0, 0]
[0, 0, 0, 1]
[0, 0, 1, 0]
[0, 0, 1, 1]
[0, 1, 0, 0]
[0, 1, 0, 1]
[0, 1, 1, 0]
[0, 1, 1, 1]
[1, 0, 0, 0]
[1, 0, 0, 1]
[1, 0, 1, 0]
[1, 0, 1, 1]
[1, 1, 0, 0]
[1, 1, 0, 1]
[1, 1, 1, 0]
[1, 1, 1, 1]

Okay, this kinda look promesing, but only situations are consideres in which all variabels( x1,x2,x3,x4) are the same or in which only one is different. What about if 2 are different and all are different? I guess then only your solution does get complicated? — HighwayJohn, Oct 15 '16 at 19:34
@HighwayJohn Modified my answer using Divakar's trick. Hope it helps :-) — R. S. Nikhil Krishna, Oct 15 '16 at 20:05
Thanks a lot :) I have to leave my computer for now but will take a closer look at your approach later — HighwayJohn, Oct 15 '16 at 20:36
Already got back. Your solution is also working :) It is just a little bit slower then Divakars approach. But stil,l thanks a lot for your help! — HighwayJohn, Oct 15 '16 at 23:37

Fill a multidimensional array efficiently that have many if else statements

3 Answers3

Linked