I have a dataset which has 4 dimensions (for now...) and I need to iterate over it.

To access a value in the dataset, I do this:

value = dataset[i,j,k,l]

Now, I can get the shape for the dataset:

shape = [4,5,2,6]

Each value in shape is the length of the corresponding dimension.

How, given the number of dimensions, can I iterate over all the elements in my dataset? Here is an example:

for i in range(shape[0]):
    for j in range(shape[1]):
        for k in range(shape[2]):
            for l in range(shape[3]):
                print('BOOM')
                value = dataset[i,j,k,l]

In the future, the shape may change. So for example, shape may have 10 elements rather than the current 4.

Is there a nice and clean way to do this with Python 3?

MSeifert
pookie
  • I think a [recursive](http://www.python-course.eu/recursive_functions.php) solution will be the best solution for you. – Yonlif Aug 17 '17 at 14:36
  • Possible duplicate of [Iterating through a multidimensional array in Python](https://stackoverflow.com/questions/971678/iterating-through-a-multidimensional-array-in-python) – Vikash Singh Aug 17 '17 at 14:39

1 Answer

You could use itertools.product to iterate over the Cartesian product [1] of some values (in this case the indices):

import itertools
shape = [4,5,2,6]
for idx in itertools.product(*[range(s) for s in shape]):
    value = dataset[idx]
    print(idx, value)
    # i would be "idx[0]", j "idx[1]" and so on...
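For a version you can run as-is, here is a minimal sketch. The helper name `iter_indices` and the NumPy stand-in for `dataset` are my own assumptions for illustration; the only requirement on your real dataset is that it accepts a tuple as an index.

```python
import itertools

import numpy as np


def iter_indices(shape):
    """Yield one index tuple per element for the given shape (hypothetical helper)."""
    return itertools.product(*(range(s) for s in shape))


# Stand-in for the real dataset (assumption: it supports tuple indexing).
dataset = np.arange(4 * 5 * 2 * 6).reshape(4, 5, 2, 6)

# Using dataset.shape means the loop adapts if the number of dimensions changes.
indices = list(iter_indices(dataset.shape))
values = [dataset[idx] for idx in indices]

print(len(indices))   # number of elements visited
print(indices[0], indices[-1])
```

Because the shape is read from the dataset itself, nothing in the loop depends on there being exactly four dimensions.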

However, if it's a NumPy array you want to iterate over, it could be easier to use np.ndenumerate:

import numpy as np

arr = np.random.random([4,5,2,6])
for idx, value in np.ndenumerate(arr):
    print(idx, value)
    # i would be "idx[0]", j "idx[1]" and so on...
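If you only need the indices and not the values, NumPy also has `np.ndindex`. This is an addition to the approach above, not something from the question, so treat it as a sketch. The small array here is just an example.

```python
import numpy as np

# Small example array (assumption: any ndarray works the same way).
arr = np.arange(24).reshape(2, 3, 4)

# np.ndenumerate yields (index_tuple, value) pairs.
pairs = list(np.ndenumerate(arr))

# np.ndindex yields only the index tuples for a given shape.
idx_only = list(np.ndindex(*arr.shape))

# Both walk the array in the same order.
same_order = [idx for idx, _ in pairs] == idx_only
print(same_order)
```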

[1] You asked for clarification on what itertools.product(*[range(s) for s in shape]) actually does, so I'll explain it in more detail.

For example, if you have this loop:

for i in range(10):
    for j in range(8):
        pass  # do whatever

This can also be written using product as:

for i, j in itertools.product(range(10), range(8)):
#                                        ^^^^^^^^---- the inner for loop
#                             ^^^^^^^^^-------------- the outer for loop
    pass  # do whatever

That means product is just a handy way of reducing the number of independent for-loops.
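You can check the equivalence yourself with a quick sketch. The ranges are kept small here only so the result is easy to read.

```python
import itertools

# Collect the index pairs from two nested for-loops.
nested = []
for i in range(3):
    for j in range(2):
        nested.append((i, j))

# Collect the same pairs from a single loop over product.
flat = list(itertools.product(range(3), range(2)))

print(flat)            # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
print(flat == nested)  # True
```

The last argument to product varies fastest, just like the innermost for-loop.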

If you want to convert a variable number of for-loops to a product you essentially need two steps:

# Create the "values" each for-loop iterates over
loopover = [range(s) for s in shape]

# Unpack the list using "*" operator because "product" needs them as 
# different positional arguments:
prod = itertools.product(*loopover)

for idx in prod:
    i_1, i_2, ..., i_n = idx   # idx is a tuple that can be unpacked if you know the number of values.
                               # The "..." has to be replaced with the variables in real code!
    # do whatever

That's equivalent to:

for i_1 in range(shape[0]):
    for i_2 in range(shape[1]):
        ... # more loops
            for i_n in range(shape[n - 1]):  # n is the length of the "shape" object
                # do whatever
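To show that the two steps really don't care how many dimensions there are, here is a small sketch. The function name `count_visits` is hypothetical, and `math.prod` assumes Python 3.8 or newer.

```python
import itertools
import math


def count_visits(shape):
    """Count how many index tuples the product-based loop visits (hypothetical helper)."""
    # Step 1: one range per dimension.
    loopover = [range(s) for s in shape]
    count = 0
    # Step 2: unpack the ranges into product and loop once.
    for idx in itertools.product(*loopover):
        # Every idx has exactly one entry per dimension.
        assert len(idx) == len(shape)
        count += 1
    return count


four_dims = count_visits([4, 5, 2, 6])
ten_dims = count_visits([2] * 10)

# The number of visits equals the product of the dimension lengths.
print(four_dims == math.prod([4, 5, 2, 6]))
print(ten_dims == math.prod([2] * 10))
```

The same function handles the current 4 dimensions and a 10-element shape without any changes.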
MSeifert
  • You don't need to. Indexing using `[i, j, k, l]` is equivalent to `[(i, j, k, l)]` and `idx` is just `(i, j, k, l)` so you can just index it (as shown) with `dataset[idx]`. :) – MSeifert Aug 17 '17 at 15:03
  • If it's an ndarray and you want to iterate over it without having the index iterated as well, you can use `ndarray.nditer` instead of `ndarray.ndenumerate` – Gal Avineri Sep 28 '19 at 12:07
  • correction, these are functions of numpy and not of ndarray: [ndenumerate](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndenumerate.html), [nditer](https://docs.scipy.org/doc/numpy/reference/generated/numpy.nditer.html) – Gal Avineri Sep 28 '19 at 13:24
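The two points from these comments can be tried out with a short sketch. The array and the index `(1, 2, 3)` are just example values I picked.

```python
import numpy as np

# Example array (assumption: C-contiguous, as created by arange + reshape).
arr = np.arange(24).reshape(2, 3, 4)

# Indexing with separate integers and with a tuple gives the same element.
idx = (1, 2, 3)
same_element = arr[1, 2, 3] == arr[idx]
print(same_element)

# np.nditer yields the values only, without the indices.
values = [int(v) for v in np.nditer(arr)]
print(values[:5])
```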