
I have a 2-D array A, and an operation Op acting on its rows x = A[J]. For each row x of A I compute Op(x); if Op(x) is not already a row of A, I append it to A. I repeat this until A is closed under Op (I am assuming that Op does not give rise to a never-ending loop, i.e. some iterate of Op returns to the start). Once the extended A is closed under Op, I also want the permutation Pindex such that Op(A[J]) = A[Pindex[J]].

I have been able to write Python code that does this:

import numpy as np

A=np.array([[0,2,3],
            [0,-3,-1],
            [0,4,3]])

def Op(x):
    return [0,-x[2],x[1]-x[2]]



A=A.tolist()

last=len(A)
Pindex=[]

for i,x in enumerate(A):
    found=False 
    xOp=Op(x)
    for j,y in enumerate(A):
        if np.array_equal(y,xOp):
            Pindex.append(j)
            found=True
            break
    if not found:
        A.append(xOp)
        Pindex.append(last)
        last+=1

A=np.asarray(A)      


print(A)
print(Pindex)
print(A[Pindex])

However, it does not seem very "Pythonic" to me. I suspect it can be improved and made faster. Any suggestions?

P.S. This is part of a bigger program, where I need to use arrays. I converted the array to a list because I needed the length of the object I am iterating over to update as I append to it. Maybe there is a smarter way to do that only with arrays.
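For reference, one way to avoid the for-loop-over-a-growing-list subtlety is to drive the loop with an explicit index, since `len()` is re-evaluated on every pass. A minimal sketch of the idea, using the same data and `Op` as above, with `list.index` replacing the inner scan:

```python
import numpy as np

A = np.array([[0, 2, 3],
              [0, -3, -1],
              [0, 4, 3]])

def op(x):
    return [0, -x[2], x[1] - x[2]]

rows = A.tolist()
Pindex = []
i = 0
while i < len(rows):          # len(rows) is re-checked, so appended rows are visited too
    x_op = op(rows[i])
    try:
        j = rows.index(x_op)  # list.index compares whole rows at C speed
    except ValueError:        # op(x) is a new row: append it
        j = len(rows)
        rows.append(x_op)
    Pindex.append(j)
    i += 1

A = np.asarray(rows)
print(Pindex)  # [1, 3, 4, 0, 5, 2] for this input
```

This is still quadratic in the worst case, but it keeps the "iterate while appending" logic explicit instead of relying on `for` over a mutating list.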

P.P.S. I was not sure about the title of the question. I can change it if you have suggestions.

Gippo
  • Is there any point to using `np.array` here? Why not use lists through out? – hpaulj Apr 29 '20 at 17:18
  • Since this is a working code but the only issue is performance, I suggest posting it on https://codereview.stackexchange.com/ instead. If you do so, please, delete this copy of the question. – Georgy Apr 29 '20 at 17:21
  • @hpaulj That's because this is part of a bigger code, where I need to use arrays. I needed to convert the array to a list because I needed to update the length of the object on which I am iterating. Maybe there is a smarter way to do that only with arrays – Gippo Apr 29 '20 at 17:23
  • @Georgy. By now I can't post a new question. I will do that. As soon as I do it, I will delete this question, if it has not received any answer. – Gippo Apr 29 '20 at 17:31
  • @Georgy I think that this question is worth keeping around. It's not fundamentally different from all the other "how do I vectorize this" questions on this site. – Mad Physicist Apr 29 '20 at 18:17
  • I would not suggest posting to CR just for 'performance' reasons. "vectorization" questions like this are the bread-n-butter of `numpy` SO. – hpaulj Apr 29 '20 at 18:53
  • If your focus is on closure under operations, rather than just performance, you might look into creating a `ndarray` subclass. That isn't a trivial task, but sometimes is worth the work. Look at `np.matrix` for an example. – hpaulj Apr 29 '20 at 18:55
  • I've posted a solution that is probably not much more efficient than the original, but does look more vectorized on the surface. – Mad Physicist Apr 29 '20 at 19:42

1 Answer


You can certainly vectorize the meat of your loop if you vectorize Op as well (renamed op below; the Python convention is snake_case). For example, given your starting array,

A = np.array([[0, 2, 3],
              [0,-3,-1],
              [0, 4, 3]])

Start by defining op that works on the whole thing at once:

def op(x):
    return np.stack((x[:, 0], -x[:, 2], x[:, 1] - x[:, 2]), axis=1)
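As a quick sanity check, the vectorized version computes all three images in one shot (note that `x[:, 0]` stands in for the constant 0 only because the first column is always 0 in this example):

```python
import numpy as np

A = np.array([[0, 2, 3],
              [0, -3, -1],
              [0, 4, 3]])

def op(x):
    # x[:, 0] passes the first column through; here it is always 0
    return np.stack((x[:, 0], -x[:, 2], x[:, 1] - x[:, 2]), axis=1)

B = op(A)
print(B)  # row i of B is op applied to row i of A:
          # [[ 0 -3 -1]
          #  [ 0  1 -2]
          #  [ 0 -3  1]]
```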

You can combine the arrays by first masking out the rows that are already present in A, using @Untubu's asvoid approach combined with np.in1d:

# View each full row as one opaque void scalar, so rows compare as single units
dt = np.dtype((np.void, A.dtype.itemsize * A.shape[-1]))

def asvoid(arr):
    arr = np.ascontiguousarray(arr)
    if np.issubdtype(arr.dtype, np.floating):
        arr += 0.  # map -0.0 to +0.0 so equal floats have identical bytes
    return arr.view(dt).squeeze()

To build the array, your loop can be something like:

while True:
    B = op(A)
    mask = np.in1d(asvoid(B), asvoid(A), invert=True)
    if mask.any():
        A = np.concatenate((A, B[mask]), axis=0)
    else:
        break
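Putting the pieces together on the sample input (a self-contained sketch, repeating op and asvoid for convenience), the loop converges in three iterations and grows A from 3 to 6 rows:

```python
import numpy as np

A = np.array([[0, 2, 3],
              [0, -3, -1],
              [0, 4, 3]])

def op(x):
    return np.stack((x[:, 0], -x[:, 2], x[:, 1] - x[:, 2]), axis=1)

# view rows as opaque void scalars so they can be compared wholesale
dt = np.dtype((np.void, A.dtype.itemsize * A.shape[-1]))

def asvoid(arr):
    arr = np.ascontiguousarray(arr)
    if np.issubdtype(arr.dtype, np.floating):
        arr += 0.  # normalize -0.0 so equal floats have equal bytes
    return arr.view(dt).squeeze()

while True:
    B = op(A)
    # rows of B that are not yet rows of A
    mask = np.in1d(asvoid(B), asvoid(A), invert=True)
    if mask.any():
        A = np.concatenate((A, B[mask]), axis=0)
    else:
        break

print(A.shape)  # (6, 3): the closure of the original 3 rows
```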

When you are done, you will have two permutations of the rows: A and op(A). You can compute the sort index from one to the other using a technique like the one in https://stackoverflow.com/a/42232761/2988730:

oa = np.argsort(asvoid(A), axis=0)  # permutation that sorts the rows of A
ob = np.argsort(asvoid(B), axis=0)  # permutation that sorts the rows of B
ib = np.empty_like(oa)
ib[ob] = np.arange(B.shape[0])      # ib is the inverse permutation of ob
J = oa[ib]                          # now B == A[J]
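The trick is easiest to see in 1-D: sorting both arrays lines them up (since one is a permutation of the other), and inverting one argsort maps positions back. A toy example with distinct values:

```python
import numpy as np

A = np.array([10, 20, 30, 40])
B = np.array([30, 10, 40, 20])  # B is a permutation of A

oa = np.argsort(A)              # A[oa] is sorted
ob = np.argsort(B)              # B[ob] is sorted, and B[ob] == A[oa]
ib = np.empty_like(ob)
ib[ob] = np.arange(B.size)      # ib inverts ob
J = oa[ib]                      # so B == A[J]

print(J)  # [2 0 3 1]
```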

You can also build up J incrementally, since you know that each element of A at a given iteration already has an entry in J, while the increment from the previous iteration (B) may or may not. In fact, if closure is guaranteed, you should only have to call op on an ever-shrinking B. The only catch is that you would have to maintain A in sorted order (argsorted is fine) as you append elements of B, to keep the searching fast.
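One way to sketch that incremental idea without maintaining a sorted copy is to hash rows instead of searching them: keep a dict from row tuples to their positions, and apply op only to rows whose image has not been resolved yet (op as above; the dict stands in for the argsorted lookup):

```python
import numpy as np

A = np.array([[0, 2, 3],
              [0, -3, -1],
              [0, 4, 3]])

def op(x):
    return np.stack((x[:, 0], -x[:, 2], x[:, 1] - x[:, 2]), axis=1)

rows = [tuple(r) for r in A]
index = {r: i for i, r in enumerate(rows)}  # row -> position in the array
J = []
while len(J) < len(rows):
    # apply op only to the not-yet-processed (ever-shrinking) tail
    images = op(np.asarray(rows[len(J):]))
    for img in map(tuple, images):
        j = index.get(img)
        if j is None:                       # new row: append and record it
            j = len(rows)
            rows.append(img)
            index[img] = j
        J.append(j)

A = np.asarray(rows)
print(J)  # [1, 3, 4, 0, 5, 2], and op(A) == A[J]
```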

Mad Physicist