Accelerating FizzBuzz with numpy

Question

The FizzBuzz problem: Given a natural number N >= 0 print a sequence of numbers 0 - N replacing every number that is divisible by 3 with the word fizz, every number that is divisible by 5 with the word buzz, and every number divisible by both 3 and 5 with the word fizzbuzz.

Numpy isn't excellent for this (it doesn't offer much acceleration for strings), but I figured it shouldn't be too horrible and, perhaps, it can beat plain python if used wisely. To my surprise, however, the opposite is the case for a naive implementation. Why is this, and how does one improve upon this?

Here is the code that I used to generate the timings. It includes a pure-python reference implementation, a naive numpy implementation, and a numba.jit variant, because I think it can act as a reasonable lower bound on performance.

import numpy as np
import matplotlib.pyplot as plt
import numba as nb
import timeit


def classic_fizzbuzz(N: int) -> str:
    result = list()

    for idx in range(N):
        if idx % 3 == 0 and idx % 5 == 0:
            result.append("fizzbuzz")
        elif idx % 5 == 0:
            result.append("buzz")
        elif idx % 3 == 0:
            result.append("fizz")
        else:
            result.append(str(idx))

    return " ".join(result)


def numpy_fizzbuzz(N: int) -> str:
    integers = np.arange(N)
    result = np.arange(N).astype(str)
    result = np.where(integers % 3 == 0, "fizz", result)
    result = np.where(integers % 5 == 0, "buzz", result)
    result = np.where(integers % 15 == 0, "fizzbuzz", result)

    return " ".join(result)


@nb.jit(nopython=True)
def numba_fizzbuzz(N:int) -> str:
    result = list()

    for idx in range(N):
        if idx % 3 == 0 and idx % 5 == 0:
            result.append("fizzbuzz")
        elif idx % 5 == 0:
            result.append("buzz")
        elif idx % 3 == 0:
            result.append("fizz")
        else:
            result.append(str(idx))

    return " ".join(result)

# do not measure initial compile time
numba_fizzbuzz(20)

def time_fizzbuzz(fn):
    repeats = 100
    times = list()
    N_values = np.linspace(0, int(1e4), 100, dtype=int)
    for idx in N_values:
        N = int(idx)
        times.append(timeit.timeit(lambda: fn(N), number=repeats) / repeats)
    
    return N_values, times

fig, ax = plt.subplots(dpi=150)  # web-quality

contendants = {
    "classic": classic_fizzbuzz,
    "numpy": numpy_fizzbuzz,
    "numba": numba_fizzbuzz
}

for fn in contendants.values():
    ax.plot(*time_fizzbuzz(fn))

ax.set_ylabel("Average Time (s)")
ax.set_label("Input")
fig.suptitle(
    "Comparison of FizzBuzz implementations in Python.",
    # fontsize="medium",
)
ax.set_title("Timings for various input N (averaged over 100 runs each)", fontsize="small")

ax.legend(
    contendants.keys(),
    loc="center left",
    bbox_to_anchor=(1, 0.5),
    title="Contestants",
)

fig.tight_layout()

fig.savefig("timings.png")

André · Answer 1 · 2022-04-25T10:56:04.757

The reason why this happens is that you iterate all elements three times with the numpy approach. You have to check the whole array for each np.where call. The python implementation only iterates the array once. As you have shown, this is faster, although the individual steps of the iteration may be slower. That being said, you can (ab)use np.vectorize for this application, although it is explicitly not meant for the purpose of performance improvement. In that regard, this question is very related.

In your special case, you can optimize your numpy function by utilizing that 15 is divisible by both 3 and 5 as such:

def numpy_fizzbuzz(N: int) -> str:
    integers = np.arange(N)
    result = np.arange(N).astype(str)

    # Find indices in 'integers' that are divisible by 3 and 5    
    mod_3 = np.argwhere(integers % 3 == 0)
    mod_5 = np.argwhere(integers % 5 == 0)

    # If a number is divisible by 3 and 5, it is also divisible by 15
    mod_15 = np.intersect1d(mod_3, mod_5, assume_unique=True)
    
    # Update results at corresponding indices
    result[mod_3] = 'fizz'
    result[mod_5] = 'buzz'
    result[mod_15] = 'fizzbuzz'

    return " ".join(result)

However, this only leads to a speed-up of 18% on my platform.

timeit.timeit(lambda: classic_fizzbuzz(int(1e4)), number=1000) 
# 5.5598067000009905

timeit.timeit(lambda: numpy_fizzbuzz_old(1e4), number=1000)
# 15.149480800000674

timeit.timeit(lambda: numpy_fizzbuzz(1e4), number=1000)
# 12.509547700001349

Additionally converting the results to a list before joining yields more significant improvements.

# Using:
# return " ".join(result.tolist())
timeit.timeit(lambda: numpy_fizzbuzz_converted(1e4), number=1000)
# 7.723402100000385

Convert `result.tolist()` before `join`ing. – Michael Szczesny Apr 25 '22 at 10:45 — Michael Szczesny, Apr 25 '22 at 10:45

dawg · Accepted Answer · 2022-04-25T15:25:49.103

1

You can substantially improve the numpy version by using np.select rather than create a parallel array of strings:

def numpy_select(N: int) -> str:
    integers = np.arange(N)
    return ' '.join(np.select([(integers%3 == 0) & (integers%5 == 0), 
                      (integers%3 == 0), (integers%5 == 0)], 
                      ['fizzbuzz','fizz','buzz'], integers).tolist())

Your Classic Python version can be made a little faster like so:

def c2(N: int) -> str:
    return ' '.join(['fizzbuzz' if x%15==0 else 
                     'fizz' if x%3==0 else 
                     'buzz' if x%5==0 else str(x) for x in range(N)])

Plugging those into your benchmark:

edited Apr 25 '22 at 15:25

answered Apr 25 '22 at 11:20

dawg

98,345
23
131
206

Very nice solution. From [the docs](https://numpy.org/doc/stable/reference/generated/numpy.select.html) of np.select: `When multiple conditions are satisfied, the first one encountered in condlist is used.` I learned something new about the numpy API today :) – FirefoxMetzger Apr 25 '22 at 11:31
Also, what is missing from this answer is to concatenate the array into an actual string (so that it matches the output of the reference implementation), something like `" ".join(result.tolist())`. This adds about 20% runtime on my machine. – FirefoxMetzger Apr 25 '22 at 11:59

Accelerating FizzBuzz with numpy

2 Answers2