
In the geosciences, while porting code from Fortran to Python, I see variations of these nested for loops (sometimes double nested and sometimes triple nested) that I would like to vectorize (shown here as a minimal reproducible example):

import numpy as np
import sys
import math
def main():
    t = np.arange(0,300)
    n1=7
    tc = test(n1,t)

def test(n1,t):
    n2 = int(2*t.size/(n1+1))
    print(n2)
    tChunked = np.zeros(shape = (n1,n2))
    for i in range(0,n1):
        istart = int(i*n2/2)
        for j in range(0,n2):
            tChunked[i,j] = t[istart+j]



    return tChunked

main()

What have I tried?

I have gotten as far as eliminating istart, getting j, and using outer addition to get istart + j. But I am stuck on how to use the index k to build the 2D tChunked array in a single line.

istart = np.linspace(0,math.ceil(n1*n2/2),num=n1,endpoint=False,dtype=np.int32)
jstart = np.linspace(0,n2,num=n2,endpoint=False,dtype=np.int32)

k = jstart[:,np.newaxis]+istart
gansub
  • Give us a quick picture of how that `istart` part changes the inner loop from a straight forward, vectorizable, one. – hpaulj Jun 03 '22 at 04:15

3 Answers


NumPy will output a 2D array if the index array is 2D. So you can simply do this:

import math
import numpy as np

def test2(n1, t):
    n2 = int(2 * t.size / (n1 + 1))
    istart = np.linspace(0, math.ceil(n1 * n2 / 2), num=n1, endpoint=False, dtype=np.int32)
    jstart = np.linspace(0, n2, num=n2, endpoint=False, dtype=np.int32)
    k = istart[:, np.newaxis] + jstart  # Note: I switched i and j.

    tChunked = t[k]  # This creates an array of the same shape as k.

    return tChunked
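
For example, as a quick check with the question's inputs (t = np.arange(300) and n1 = 7), this reproduces the (7, 75) array that the original double loop builds:

t = np.arange(0, 300)
tc = test2(7, t)
print(tc.shape)   # (7, 75)
print(tc[1, :3])  # [37 38 39] -- the second chunk starts at istart = 37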
ken

If you have to deal with a lot of nested loops, maybe the solution is to use numba, as it can result in better performance than native NumPy. Especially for un-Pythonic, loop-heavy functions like the one you showed.

As easy as:

import numpy as np
from numba import njit

@njit
def test(n1,t):
    n2 = int(2*t.size/(n1+1))
    print(n2)
    tChunked = np.zeros(shape = (n1,n2))
    for i in range(0,n1):
        istart = int(i*n2/2)
        for j in range(0,n2):
            tChunked[i,j] = t[istart+j]
    return tChunked
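
One general numba note (not specific to this code): the first call includes the JIT compilation time, so if you are timing it, call the function once beforehand:

t = np.arange(0, 300)
test(7, t)        # first call triggers compilation (one-time cost)
tc = test(7, t)   # subsequent calls run the already-compiled code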
Zaero Divide

What made me fall head-over-heels in love with Python was actually NumPy and specifically its amazing indexing and indexing routines!

In test_extra_crispy() we can use zip() to get our ducks (initial conditions) in a row, then use indexing with offsets to do the "transplanting" of blocks of values:

i_values = np.arange(n1)  # use n1 rather than hard-coding 7
istarts = (i_values * n2 / 2).astype(int)
for i, istart in zip(i_values, istarts):
    tChunked[i, :n2] = t[istart:istart+n2]


We can see that for

t = np.arange(10000000)
n1 = 7

"extra crispy" is a lot faster than the original (91 vs 4246 ms), but only a little faster than test2() from ken's answer which is not significant considering that it does more careful checking than my brute force treatment.

However "extra crispy" avoids adding an additional axis and expanding the memory usage. For modest arrays this doesn't matter at all. But if you have a huge array then disk space and access time becomes a real problem.

"Just use Jit"?

Zaero Divide's answer points out that @jit goes a long way towards converting impractical Python nested loops into fast compiled do loops.

But always check what methods are already available in NumPy, which run efficiently as compiled code. One can easily get a gigaflop on a laptop for problems amenable to NumPy's many handy methods.
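
As a rough illustration (exact numbers depend on the machine and the BLAS build behind NumPy), a single large matrix product already runs at multi-gigaflop rates:

import numpy as np
import time

a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)

t_start = time.perf_counter()
c = a @ b   # roughly 2 * 1000**3 = 2e9 floating point operations
elapsed = time.perf_counter() - t_start
print('approx GFLOP/s:', round(2e9 / elapsed / 1e9, 1))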

In my case test_jit() ran four times slower than "extra crispy" (350 vs 91 ms)!

You shouldn't write a script that depends on @jit to run fast if you don't need to; not everyone wants to "just use @jit".

Strangely shaped indexing

If you need to address a more random-shaped volume within an array, you can use indexing like this:

array = np.array([[0, 0, 1, 0, 0], [0, 1, 0, 1, 0], [1, 0, 0, 0, 1], [0, 1, 0, 1, 0], [0, 0, 1, 0, 0]])
print(array)

gives

[[0 0 1 0 0]
 [0 1 0 1 0]
 [1 0 0 0 1]
 [0 1 0 1 0]
 [0 0 1 0 0]]

and we can get indices for the 1's like this:

i, j =  np.where(array == 1)
print(i)
print(j)

which gives

[0 1 1 2 2 3 3 4]
[2 1 3 0 4 1 3 2]

If we want to start with a zeroed array and insert those 1's via NumPy indexing, just do this:

array = np.zeros((5, 5), dtype=int)
array[i, j] = 1

which recreates the original array.


import numpy as np
import matplotlib.pyplot as plt
import time

def test_original(n1, t):
    n2 = int(2*t.size / (n1 + 1))
    tChunked = np.zeros(shape = (n1, n2))
    for i in range(n1):
        istart = int(i * n2 / 2)
        for j in range(0, n2):
            tChunked[i, j] = t[istart + j]
    return  tChunked

t = np.arange(10000000)
n1 = 7

t_start = time.process_time()

tc_original = test_original(n1, t)

print('original process time (ms)', round(1000*(time.process_time() - t_start), 3))
# print('tc_original.shape: ', tc_original.shape)

fig, ax = plt.subplots(1, 1)
for thing in tc_original:
    ax.plot(thing)
plt.show()

def test_extra_crispy(n1, t):
    n2 = int(2*t.size / (n1 + 1))
    tChunked = np.zeros(shape = (n1, n2))
    i_values = np.arange(n1)  # use n1 rather than hard-coding 7
    istarts = (i_values * n2 / 2).astype(int)
    for i, istart in zip(i_values, istarts):
        tChunked[i, :n2] = t[istart:istart+n2]
    return  tChunked


t_start = time.process_time()

tc_extra_crispy = test_extra_crispy(n1, t)

print('extra crispy process time (ms)', round(1000*(time.process_time() - t_start), 3))
# print('tc_extra_crispy.shape: ', tc_extra_crispy.shape)

print('np.all(tc_extra_crispy == tc_original): ', np.all(tc_extra_crispy == tc_original))

import math

def test2(n1, t): # https://stackoverflow.com/a/72492815/3904031
    n2 = int(2 * t.size / (n1 + 1))
    istart = np.linspace(0, math.ceil(n1 * n2 / 2), num=n1, endpoint=False, dtype=np.int32)
    jstart = np.linspace(0, n2, num=n2, endpoint=False, dtype=np.int32)
    k = istart[:, np.newaxis] + jstart  # Note: I switched i and j.

    tChunked = t[k]  # This creates an array of the same shape as k.

    return tChunked

t_start = time.process_time()

tc_test2 = test2(n1, t)

print('test2 process time (ms)', round(1000*(time.process_time() - t_start), 3))
# print('tc_test2.shape: ', tc_test2.shape)

print('np.all(tc_test2 == tc_original): ', np.all(tc_test2 == tc_original))

# from https://stackoverflow.com/a/72494777/3904031

from numba import njit

@njit
def test_jit(n1,t):
    n2 = int(2*t.size/(n1+1))
    print(n2)
    tChunked = np.zeros(shape = (n1,n2))
    for i in range(0,n1):
        istart = int(i*n2/2)
        for j in range(0,n2):
            tChunked[i,j] = t[istart+j]
    return tChunked


t_start = time.process_time()

tc_jit = test_jit(n1, t)

print('jit process time (ms)', round(1000*(time.process_time() - t_start), 3))
# print('tc_jit.shape: ', tc_jit.shape)

print('np.all(tc_jit == tc_original): ', np.all(tc_jit == tc_original))
uhoh
  • And after years of me answering your questions on ESSE you answer one of mine on SO. Looks great and, to quote your own words, let me take your answer for a spin! – gansub Nov 26 '22 at 06:40
  • These days I am also looking at C++ numpy aka xtensor to get some more speed. Couple that with Cython and invoke the result in Python. – gansub Nov 26 '22 at 06:43
  • @gansub It's a pleasure, and oh you are brave! I haven't written something in compiled code in quite a long time (cf. [punchcards](https://stackoverflow.com/q/41906104/3904031)) *Have fun!* – uhoh Nov 26 '22 at 07:03
  • @gansub Yikes, I see what you mean. Hmm... if you like Python-ish convenience but need compiled speed, [julia](https://julialang.org/) might be something to try. You can write your functions in julia and call them from Python cf. [Running Julia .jl file in python](https://stackoverflow.com/q/49750067/3904031) and [How to call a Python function from a Julia program?](https://stackoverflow.com/q/66648841/3904031) and [Call Julia function from a python file](https://stackoverflow.com/q/72338307/3904031) – uhoh Nov 26 '22 at 10:49