
I have an Nx3 NumPy array listing the 3 vertex IDs for each of N triangular faces of a surface, and an Nx1 array of values, one for each of these faces.

I want to convert (as best as possible) these "faces" values into "vertex" values, for example by finding the mean value for all the faces that a vertex is associated with.

My current solution works for small N, but it scales as (number of faces) × (number of vertices), which becomes impractical very quickly:

def face2vertVal(faces, facesVals, verts):
    # INPUT:
    # faces:     Nx3 array of N vertex IDs for N faces
    # facesVals: Nx1 array of some parameter for each face

    # OUTPUT:
    # vertsVals: Px1 array of the mean value in "facesVals" that corresponds
    #            to each vertex

    import numpy as np

    vertsVals = np.zeros(faces.max()+1)

    for vertex in range(0, faces.max()+1):       
        tmpVals = np.zeros(1)
        for row in range(0, faces.shape[0]):
            if faces[row].__contains__(vertex):
                tmpVals = np.append(tmpVals, facesVals[row])
        vertsVals[vertex] = tmpVals[1:].mean()
        del tmpVals

    return vertsVals

Thanks in advance.

EDIT: The vectorised approach is very fast, but requires too much memory for 700k faces and 350k vertices.

adamltyson
  • `__contains__` is not generally supposed to be called directly, instead you can do `if vertex in faces[row]:`... But anyway, the problem is that you are using a loop-based solution instead of a vectorized one (see answer). – jdehesa Jun 05 '18 at 10:19
  • If you have to iterate an array, convert it to a normal Python list first. Iterating an array is very slow. Perhaps amend the question to allow the possibility of a vectorized approach with a memory limitation (compared to the one already posted). – kabanus Jun 05 '18 at 12:16

2 Answers


EDIT:

If memory is an issue, you can also use a "batched" version of the algorithm, which processes only a limited number of faces at a time:

import numpy as np

def face2vertVal(faces, faces_values, batch_size=None):
    batch_size = batch_size or len(faces)
    faces = np.asarray(faces)
    faces_values = np.asarray(faces_values)
    vert_idx = np.arange(faces.max() + 1)
    vertex_values = np.zeros(len(vert_idx))
    vertex_counts = np.zeros(len(vert_idx))
    for i in range(0, len(faces), batch_size):
        faces_batch = faces[i:i + batch_size]
        faces_values_batch = faces_values[i:i + batch_size]
        # Boolean mask of shape (num_vertices, batch size): which faces in this
        # batch touch each vertex
        vertex_faces = np.any(faces_batch == vert_idx[:, np.newaxis, np.newaxis], axis=-1)
        # Accumulate the sum of face values and the number of faces per vertex
        vertex_values += np.sum(vertex_faces * faces_values_batch, axis=1)
        vertex_counts += np.count_nonzero(vertex_faces, axis=1)
    # Use a minimum count of 1 to avoid dividing by zero for unused vertex IDs
    return vertex_values / np.maximum(vertex_counts, 1)

# All at once
vertex_values = face2vertVal([[0, 1, 2], [1, 2, 3], [2, 3, 0]], [1, 2, 3])
print(*vertex_values, sep=', ')
# 2.0, 1.5, 2.0, 2.5

# In batches of two
vertex_values_1b = face2vertVal([[0, 1, 2], [1, 2, 3], [2, 3, 0]], [1, 2, 3], 2)
print(*vertex_values_1b, sep=', ')
# 2.0, 1.5, 2.0, 2.5

You can pick batch_size by hand (or derive it from the amount of memory you are willing to use) to balance the trade-off between speed and memory.
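For instance, one rough way to derive batch_size from a memory budget is to bound the per-batch temporaries created by the broadcasted comparison inside the loop. The pick_batch_size helper and memory_budget_bytes argument below are only illustrative names, and the byte estimate is an approximation, not part of the answer above:

import numpy as np

def pick_batch_size(faces, memory_budget_bytes):
    # Rough estimate: the biggest temporaries inside the loop are the
    # (num_vertices, batch_size, 3) boolean comparison result (~3 bytes per
    # vertex/face pair), the (num_vertices, batch_size) boolean mask (~1 byte)
    # and the (num_vertices, batch_size) float64 product (~8 bytes).
    num_vertices = np.asarray(faces).max() + 1
    bytes_per_face = 12 * num_vertices
    return max(1, int(memory_budget_bytes // bytes_per_face))

# e.g. keep the per-batch temporaries around 1 GB:
# batch_size = pick_batch_size(faces, 1e9)
# vertex_values = face2vertVal(faces, faces_values, batch_size)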


You should do this computation in a vectorized way; otherwise it will be much slower. This is one way to do it:

import numpy as np

def face2vertVal(faces, faces_values):
    faces = np.asarray(faces)
    faces_values = np.asarray(faces_values)
    vert_idx = np.arange(faces.max() + 1)
    # Boolean mask of shape (num_vertices, num_faces): which faces touch each vertex
    vertex_faces = np.any(faces == vert_idx[:, np.newaxis, np.newaxis], axis=-1)
    vertex_counts = np.count_nonzero(vertex_faces, axis=1)
    # Mean face value per vertex; a minimum count of 1 avoids dividing by zero
    # for vertex IDs that appear in no face
    vertex_values = np.sum(vertex_faces * faces_values, axis=1) / np.maximum(vertex_counts, 1)
    return vertex_values

vertex_values = face2vertVal([[0, 1, 2], [1, 2, 3], [2, 3, 0]], [1, 2, 3])
print(*vertex_values, sep=', ')
# 2.0, 1.5, 2.0, 2.5

Note that the drawback to this approach is that, given N faces and P vertices, it requires O(N*P) memory, whereas the non-vectorized algorithm works with O(max(N, P)) memory.
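As a rough illustration of that O(N*P) footprint at the scale mentioned in the question (700k faces, 350k vertices), even the (P, N) boolean mask alone needs hundreds of gigabytes:

# Back-of-the-envelope estimate for the question's mesh size
n_faces, n_vertices = 700_000, 350_000
mask_bytes = n_vertices * n_faces * 1   # NumPy stores one byte per bool
print(mask_bytes / 1e9)                 # ~245 GB, before any float temporaries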

jdehesa
  • Thanks! Works incredibly quickly, but unfortunately requires too much memory for my real-world use (700k faces and 350k vertices). – adamltyson Jun 05 '18 at 12:13
  • @adamltyson I have added a batched version of the algorithm where you can tune the amount of memory used at once. – jdehesa Jun 05 '18 at 12:25

Your code has a few problems with a high impact on performance (e.g. don't use np.append in a loop). So the first step is to improve the code a bit and to avoid unnecessary searching.
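For a minimal sketch of why np.append hurts (each call allocates a new array and copies everything, so appending in a loop is quadratic), compare it with collecting values in a Python list, or preallocating as the code below does; the exact timings will of course depend on the machine:

import numpy as np

vals = np.random.rand(10000)

# Quadratic: np.append copies the whole array on every iteration
out = np.zeros(0)
for v in vals:
    out = np.append(out, v)

# Linear: collect in a Python list and convert once at the end
collected = []
for v in vals:
    collected.append(v)
out_fast = np.asarray(collected)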

In the next step, we can use a JIT compiler (Numba) to gain additional performance.

Code

import numpy as np
import numba as nb
from scipy import spatial

N_verts = 350000

# This generates a mesh for performance testing
# https://stackoverflow.com/a/50579387/4045774
def make(N):
    verts = np.random.uniform(-10, 10, (N, 3))
    faces = spatial.Delaunay(verts[:, :2]).simplices
    return verts, faces

@nb.njit()
def face2vertVal(faces, facesVals):
    # INPUT:
    # faces:     Nx3 array of N vertex IDs for N faces
    # facesVals: Nx1 array of some parameter for each face

    # OUTPUT:
    # vertsVals: Px1 array of the mean value in "facesVals" that corresponds
    #            to each vertex

    vertsVals = np.zeros(faces.max() + 1)
    verts_counts = np.zeros(faces.max() + 1)

    # Accumulate the sum of face values and the number of faces per vertex
    for i in range(faces.shape[0]):
        for j in range(faces.shape[1]):
            vertsVals[faces[i, j]] += facesVals[i]
            verts_counts[faces[i, j]] += 1.

    vertsVals = vertsVals / verts_counts
    return vertsVals

verts, faces = make(N_verts)
facesVals = np.random.rand(faces.shape[0])
res = face2vertVal(faces, facesVals)

Performance

N_verts = 350,000
Pure Python: 2.12 s
With Numba (after compilation, which takes about 0.5 s on the first call): 18.7 ms
max9111