First import some packages:

import numpy as np
from dask import delayed

Suppose I have two NumPy arrays:

a1 = np.ones(5000000)
a2 = np.ones(8000000)

I would like to compute the sum and length of each array, using these functions:

def sum(x):
    result = 0
    for data in x:
        result = result + data
    return result, len(x)

def get_result(x, y):
    return x, y

I have two examples in Colab. The sequential one looks like this:

%%time
result1 = sum(a1)
result2 = sum(a2)
result = get_result(result1, result2)
print(result)

And the output is:

((5000000.0, 5000000), (8000000.0, 8000000))
CPU times: user 1.41 s, sys: 3.7 ms, total: 1.42 s
Wall time: 1.42 s

However, I would like to compute these values in parallel:

%%time
result1 = delayed(sum)(a1)
result2 = delayed(sum)(a2)
result = delayed(get_result)(result1, result2)
result = result.compute()
print(result)

And the output is:

Delayed('get_result-ffbb6330-1014-42c5-b625-06e3e66a56ed')
CPU times: user 1.42 s, sys: 7.97 ms, total: 1.42 s
Wall time: 1.43 s

Why didn't the second program run in parallel? The wall times of the two examples are almost the same.
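
One likely explanation, offered as a sketch rather than a confirmed diagnosis: `dask.delayed` runs on Dask's threaded scheduler by default, and a pure-Python `for` loop holds CPython's global interpreter lock, so two such loops cannot execute at the same time in threads. Assuming that is the cause here, passing `scheduler="processes"` to `compute()` runs the tasks in separate processes instead. The names `sum_and_len` and `run` are hypothetical, introduced only for this illustration.

```python
import numpy as np
from dask import delayed


def sum_and_len(x):
    # Same pure-Python loop as in the question
    # (renamed here, hypothetically, to avoid shadowing the builtin sum).
    result = 0
    for data in x:
        result = result + data
    return result, len(x)


def get_result(x, y):
    return x, y


def run(a1, a2, scheduler="processes"):
    # Build the task graph lazily; nothing is computed yet.
    result1 = delayed(sum_and_len)(a1)
    result2 = delayed(sum_and_len)(a2)
    result = delayed(get_result)(result1, result2)
    # Execute the graph with the chosen scheduler
    # ("threads" is the default for delayed objects).
    return result.compute(scheduler=scheduler)


if __name__ == "__main__":
    a1 = np.ones(5000000)
    a2 = np.ones(8000000)
    print(run(a1, a2, scheduler="processes"))
```

The process-based scheduler has to pickle each array and send it to a worker, so any gain depends on how the loop time compares with that transfer cost.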

Liang Ce
    The `sum` function you wrote is very inefficient. Please don't do that. Use `np.sum` instead. Trying to run very inefficient code in parallel just uses more computing resources for no reason. For more information about why this is so inefficient, please read: https://stackoverflow.com/questions/69584027 . – Jérôme Richard Apr 07 '22 at 18:53
    I know the program performs worse than NumPy functions; it's just a demo. The code in the Dask tutorial is similar to the code in my question. The point is that the program is not executed in parallel. – Liang Ce Apr 08 '22 at 02:43
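
The suggestion in the first comment can be sketched as follows. This is a minimal illustration, not the commenter's own code, and `sum_and_len_np` is a hypothetical name. `np.sum` performs the loop in compiled code rather than in the Python interpreter.

```python
import numpy as np


def sum_and_len_np(x):
    # Let NumPy do the reduction instead of iterating element by element.
    return np.sum(x), len(x)


if __name__ == "__main__":
    a1 = np.ones(5000000)
    a2 = np.ones(8000000)
    print(sum_and_len_np(a1), sum_and_len_np(a2))
```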
