0

So I've got a 2-dimensional array, say list:

list = [[x11, x12, x13, x14],
        [x21, x22, x23, x24],
       ...]

Some samples of list are:

# numbers in list are all integers
list = [[0, 17, 6, 10],
        [0, 7, 6, 10],
        ]
list = [[6, 50, 6, 10],
        [0, 50, 6, 10],
        ]
list = [[6, 16, 6, 10],
        [6, 6, 6, 10],
        ]
list = [[0, 50, 6, 10],
        [6, 50, 6, 10],
        [6, 40, 6, 10]
        ]
list = [[0, 27, 6, 10],
        [0, 37, 6, 10],
        ]

I need to iterate every two rows, for example [x11, x12, x13, x14] and [x21, x22, x23, x24], and do some complex comparisons:

cnt1 = cnt2 = cnt3 = cnt4 = cnt5 = 0
for i in range(0, length):
    for j in range(i + 1, length):
        if (list[i][0] + list[i][2] == list[j][0] or list[j][0] + list[j][2] == list[i][0]) and \
                list[i][1] == list[j][1]:
            cnt1 += 1
            if list[i][3] == list[j][3]:
                cnt2 += 1
            else
                cnt3 += 1
        elif (list[i][1] + list[i][3] == list[j][1] or list[j][1] + list[j][3] == list[i][1]) and \
                list[i][0] == list[j][0]:
            cnt4 += 1
            if list[i][2] == list[j][2]:
                cnt2 += 1
            else
                cnt3 += 1
        else
            cnt5 += 1
# do something with the counts

length here is usually small, but this nested loop runs thousands of times, so it takes very long to finish the program. I've read some tutorials of vectorizing in Numpy, but cannot figure out how to edit the code since the logic is kind of complex. Is there a way to optimize my code, even for a little bit? Any help would be highly appreciated. Thanks in advance!

L. Weng
  • 13
  • 3
  • Looks like `map()` would serve you well here – Xetrov Jan 12 '21 at 11:54
  • Does this answer your question? [how to compare 2 columns of a 2d array at a time with columns of another array in python](https://stackoverflow.com/questions/46867709/how-to-compare-2-columns-of-a-2d-array-at-a-time-with-columns-of-another-array-i) – Umair Mubeen Jan 12 '21 at 11:55
  • @UmairMubeen Thanks for replying. But mine is more like an efficiency issue instead of that one. – L. Weng Jan 12 '21 at 12:03
  • If it's vectorising that you are after, you would need to fill in all the `#do something`s for a solution. Depending on what the conditions are and what you are doing if those conditions are true native numpy vectorisation might/might not be possible. If you want to speed up your code, have you considered using Numba? – Ananda Jan 12 '21 at 12:04
  • @VortexYT Could you be a bit more specific? Thank you! – L. Weng Jan 12 '21 at 12:05
  • @Ananda Thank you. I've edited my question. Actually it's only a counting problem haha. I'm new to Python and Numpy so there's a lot to learn. And I'll try Numba out to see what happens. – L. Weng Jan 12 '21 at 12:12
  • Will the contents of the list remain unchanged for those "1000 of times" that you run this nested loop? If so, you're obviously repeating the same checks 1000s of times, and could instead, save the results of the checks and reuse them. – fountainhead Jan 12 '21 at 12:17
  • @fountainhead Thank you for replying. Unfortunately not. Actually `list` here is generated by an optimization algorithm so its content changes all the time. – L. Weng Jan 12 '21 at 12:19
  • @L.Weng, could you also please post a sample of `list` with some real numbers, this can be done and can be vectorised but I need some samples with work with – Ananda Jan 12 '21 at 12:26
  • @Ananda Please check the samples out. In my test the length of `list` is very small, but under other circumstances it might be bigger. – L. Weng Jan 12 '21 at 12:43
  • Awesome, I can work with this. – Ananda Jan 12 '21 at 14:24

2 Answers2

0

In your for loops you are comparing the array [x11, x12, x13, x14] with all following elements ([x21, x22, x23, x24], [x31, x32, x33, x34], [x41, x42, x43, x44], etc..),

then you move on to comparing [x21, x22, x23, x24] with all following elements ([x31, x32, x33, x34], [x41, x42, x43, x44], etc..).

To iterate every 2 rows and compare them 2 by 2 (that means x1 with x2, and then x3 with x4) you want something like this instead:

for i in range(0, length - 1, 2):
    j = i + 1;
    if (list[i][0] + list[i][2] == list[j][0] or list[j][0] + list[j][2] == list[i][0]) and list[i][1] == list[j][1]:
        # do something
        if list[i][3] == list[j][3]:
            # do something
        else
            # do something

note that you have also to address the case where the list array has odd size.

Syco
  • 799
  • 2
  • 14
  • 23
0

I am posting a solution for how to do this for the first if and the subsequent if and else conditions.

You can follow similar logic to do the same for the rest as well.

import numpy as np

arr = np.array([[0, 17, 6, 10],
       [0, 7, 6, 10],
       [6, 50, 6, 10],
       [0, 50, 6, 10],
       [6, 16, 6, 10],
       [6, 6, 6, 10],
       [0, 50, 6, 10],
       [6, 50, 6, 10],
       [6, 40, 6, 10],
       [0, 27, 6, 10],
       [0, 37, 6, 10]])

N = len(arr)

cnt1 = cnt2 = cnt3 = cnt4 = cnt5 = 0
for i in range(0, N):
    for j in range(i + 1, N):
        if (arr[i][0] + arr[i][2] == arr[j][0] or arr[j][0] + arr[j][2] == arr[i][0]) and \
                arr[i][1] == arr[j][1]:
            cnt1 += 1
            if arr[i][3] == arr[j][3]:
                cnt2 += 1
            else:
                cnt3 += 1
        elif (arr[i][1] + arr[i][3] == arr[j][1] or arr[j][1] + arr[j][3] == arr[i][1]) and \
                    arr[i][0] == arr[j][0]:
            cnt4 += 1
            if arr[i][2] == arr[j][2]:
                cnt2 += 1
            else:
                cnt3 += 1
        else:
            cnt5 += 1

# this corresponds to (arr[i][0] + arr[i][2] == arr[j][0] or arr[j][0] + arr[j][2] == arr[i][0])
cnt1_bool_c1 = ((arr[:, 0] + arr[:, 2])[:, None] == arr[:, 0][None, :])

# arr[i][1] == arr[j][1]:
cnt1_bool_c2 = arr[:, 1][:, None] == arr[:, 1][None, :]

# So that i and j are compared only if i != j
cnt1_bool_c2[np.arange(N), np.arange(N)] = False

# doing and of the two previous conditions finishing the very first if condition
cnt1_bool = np.bitwise_and(cnt1_bool_c1, cnt1_bool_c2)

# corresponds to cnt1
cnt1_n = cnt1_bool.sum()

# verified
print(cnt1 == cnt1_n)

# corresponds to arr[i][3] == arr[j][3]
cnt2_bool_c = arr[:, 3][:, None] == arr[:, 3][None, :]

# So that i and j are compared only if i != j
cnt2_bool_c[np.arange(N), np.arange(N)] = False

# correspond to the inner if, count only if these elemets share the same position as the previous elements
cnt2_n1 = np.bitwise_and(cnt1_bool, cnt2_bool_c).sum()  # corresponds to the cnt2 += 1 in the first inner condition

# correspond to the inner else, count only if these elemets do not share the same position as the previous elements
cnt3_n1 = np.bitwise_and(cnt1_bool, ~cnt2_bool_c).sum()  # corresponds to the cnt3 += 1 in the first inner else condition
Ananda
  • 2,925
  • 5
  • 22
  • 45
  • Thank you very much! I'll check it out now! – L. Weng Jan 12 '21 at 15:37
  • Hi! Following your logic, I also vectorized another loop. Unfortunately it doesn't go well but I couldn't figure out why. `!(arr[i][0] + arr[i][2] <= arr[j][0] or arr[j][0] + arr[j][2] <= arr[i][0] or arr[i][1] + arr[i][3] <= arr[j][1] or arr[j][1] + arr[j][3] <= arr[i][1]) && i != j`. My vectorized form is: `bool_c1 = (arr[:, 0] + arr[:, 2])[:, None] <= arr[:, 0][None, :] bool_c2 = (arr[:, 1] + arr[:, 3])[:, None] <= arr[:, 1][None, :] bool_c12 = ~np.bitwise_or(bool_c1, bool_c2) bool_c12[np.arange(N), np.arange(N)] = False cnt = bool_c12.sum()` Could you please take a look at this? Thanks! – L. Weng Jan 13 '21 at 07:04
  • What problem exactly did you encounter? This problem is very time consuming tbh for me to solve the entire thing, and also that's not really what SO is meant for. But if you are really unable to solve it, and have some specific issue regarding the implementation, maybe you can open it as another question *with that specific issue* – Ananda Jan 13 '21 at 13:56
  • Oh sorry, I just felt that one was very similar to the previous one and thought maybe you could quickly find out where I was wrong. But I found another way to solve it later. Thank you again for your time and patience, have a nice day! – L. Weng Jan 14 '21 at 05:23