13

I'm trying to create a function that will calculate the lattice distance (number of horizontal and vertical steps) between elements in a multi-dimensional numpy array. For this I need to retrieve the actual numbers from the indexes of each element as I iterate through the array. I want to store those values as numbers that I can run through a distance formula.

For the example array A

A = np.array([[1,2,3],[4,5,6],[7,8,9]])

I'd like to create a loop that iterates through each element and for the first element 1 it would retrieve a=0, b=0 since 1 is at A[0,0], then a=0, b=1 for element 2 as it is located at A[0,1], and so on...

My envisioned output is two numbers (corresponding to the two index values for that element) for each element in the array. So in the example above, it would be the two values that I am assigning to a and b. I only need these two numbers inside the loop; I don't need to save them as a separate data object.

Any thoughts on how to do this would be greatly appreciated!

cs95
yogz123
  • As your description of the problem is somewhat hard to understand, can you give a sample expected output for your example? So far sounds like you want a list/array of all pairs of indexes. – DYZ Feb 07 '17 at 05:49
  • Thanks for the suggested clarification! Let me know if it's still not clear. – yogz123 Feb 07 '17 at 05:53
  • A list/array is fine but even more simply I just need to retrieve the two index values within each iteration as I will feed these into another formula immediately after retrieving them. – yogz123 Feb 07 '17 at 05:55
  • Related: for the 1-dimensional case, see [How to iterate 1d NumPy array with index and value](https://stackoverflow.com/q/49384682/9209546) – jpp Jan 10 '19 at 16:16

3 Answers

17

As I've become more familiar with the numpy and pandas ecosystem, it's become clearer to me that iterating is usually the wrong approach because it is so much slower than the alternatives, and that writing the logic as a vectorized operation is best whenever possible. Though the style is not as obvious/Pythonic at first, I've (anecdotally) gained ridiculous speedups with vectorized operations: more than 1000x in one case, when swapping out a row-wise .apply(lambda) form.

@MSeifert's answer shows how to do this much better and will be significantly more performant on a dataset of any real size.

See also the more general answer by @cs95, which covers and compares alternatives to iteration in pandas.


Original Answer

You can iterate over your array with numpy.ndenumerate, which yields each element's index tuple together with its value.

Adapting the example from the documentation:

import numpy as np

A = np.array([[1,2,3],[4,5,6],[7,8,9]])
for index, value in np.ndenumerate(A):
    print(index, value)  # index is a tuple such as (0, 1); operate here
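
To connect this to the question, here is a minimal sketch that unpacks the index tuple into a and b inside the loop and feeds them straight into a lattice-distance formula. The reference point `ref` and the `distances` dict are assumptions made for illustration; they are not part of the original question or answer.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
ref = (0, 0)  # hypothetical reference point, chosen only for illustration

distances = {}
for (a, b), value in np.ndenumerate(A):
    # a is the row index and b is the column index of `value`
    distances[(a, b)] = abs(a - ref[0]) + abs(b - ref[1])

print(distances[(2, 1)])  # lattice distance from A[2, 1] to A[0, 0]
```

Unpacking `(a, b)` directly in the `for` statement only works for a 2-D array; for other dimensions, keep the index as a single tuple.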
ti7
7

You can do it using np.ndenumerate, but generally you don't need to iterate over an array.

You can simply create a meshgrid (or open grid) to get all indices at once, and then process them with vectorized operations, which is much faster.

For example

>>> x, y = np.mgrid[slice(A.shape[0]), slice(A.shape[1])]
>>> x
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])
>>> y
array([[0, 1, 2],
       [0, 1, 2],
       [0, 1, 2]])

and these can be processed like any other array. So if your function that needs the indices can be vectorized you shouldn't do the manual loop!

For example, to calculate the lattice distance from each point to a given point, say (2, 3):

>>> abs(x - 2) + abs(y - 3)
array([[5, 4, 3],
       [4, 3, 2],
       [3, 2, 1]])
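
The same idea can be sketched for an array with any number of dimensions using np.indices, which returns one index array per axis. The helper name `lattice_distance_map` is hypothetical and only serves this illustration.

```python
import numpy as np

def lattice_distance_map(shape, point):
    """Lattice distance from every index of an array of `shape` to `point`."""
    idx = np.indices(shape)  # shape is (ndim, *shape): one index array per axis
    # Reshape `point` to (ndim, 1, ..., 1) so it broadcasts against idx
    offset = np.asarray(point).reshape((-1,) + (1,) * len(shape))
    # Sum the absolute per-axis offsets over the leading axis
    return np.abs(idx - offset).sum(axis=0)

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
dist = lattice_distance_map(A.shape, (2, 3))
print(dist)
```

For the 3x3 example this should reproduce the array shown above, and the same call works unchanged for 3-D or higher-dimensional shapes.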

For distances, an open grid would be faster, because it stores only one index vector per axis and relies on broadcasting. Just replace np.mgrid with np.ogrid:

>>> x, y = np.ogrid[slice(A.shape[0]), slice(A.shape[1])]
>>> np.hypot(x - 2, y - 3)  # Euclidean distance this time! :-)
array([[ 3.60555128,  2.82842712,  2.23606798],
       [ 3.16227766,  2.23606798,  1.41421356],
       [ 3.        ,  2.        ,  1.        ]])
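
To see why the open grid still produces a full result, it may help to look at the shapes involved. This is a small sketch reusing the 3x3 array A from the question; the slice syntax `:n` is used here in place of `slice(n)` and is equivalent.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Open grid: a column vector of row indices and a row vector of column indices
x, y = np.ogrid[:A.shape[0], :A.shape[1]]
print(x.shape, y.shape)  # (3, 1) and (1, 3)

# Broadcasting expands the two vectors to the full (3, 3) result
dist = np.abs(x - 2) + np.abs(y - 3)
print(dist.shape)
```

Only 3 + 3 index values are stored instead of two full 3x3 arrays, which matters more as the array grows.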
MSeifert
1

Another possible solution:

import numpy as np

A = np.array([[1,2,3],[4,5,6],[7,8,9]])
for _, val in np.ndenumerate(A):
    ind = np.argwhere(A == val)
    print(val, ind)

With this approach, if a value appears in the array more than once, you get an array containing all of its indices.
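
A small sketch of that behaviour, using a hypothetical array `B` with a repeated value. It loops over np.unique(B) rather than over every element, so each distinct value is looked up only once; that is a variation on the code above, not the answer's exact method.

```python
import numpy as np

B = np.array([[1, 2], [2, 3]])  # hypothetical array in which 2 appears twice

positions = {}
for val in np.unique(B):
    # argwhere returns one row of indices per matching element
    positions[int(val)] = np.argwhere(B == val)

print(positions[2])  # both locations of the value 2
```

Note that each np.argwhere call scans the whole array, so this is likely to be slower than np.ndenumerate when all you need is each element's own index.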

AsukaMinato
Roman Fursenko