
I am doing a school project and they've asked about the efficiency (big-O complexity) of some NumPy methods, but I can't find it documented anywhere. Can anyone tell me where I can find it?

For example, methods like:

numpy.linspace(x, y, z)
numpy.meshgrid(x, y)
numpy.zeros(x, y)

Mahmoud Ayman
    I don't think there's a place where you can look them up. You will have to think what the algorithm actually does to infer its complexity. For example `np.ones(n)` has to allocate memory for n floats and then write a 1 in each of the memory cells. In total it has to write n ones, so you are in `O(n)`. – cel May 24 '15 at 20:36
  • Oh I see... but that's not applicable for meshgrid and linspace though... – Mahmoud Ayman May 24 '15 at 20:40
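
To illustrate the kind of reasoning in the first comment above, here is a minimal timing sketch (the sizes and the use of time.perf_counter are illustrative choices, not part of the original thread): if np.ones(n) is O(n), doubling n should roughly double the measured time.

import numpy as np
from time import perf_counter

# Time np.ones(n) at two sizes; for an O(n) operation, doubling n
# should roughly double the elapsed time (for sufficiently large n).
for n in (10**7, 2 * 10**7):
    t0 = perf_counter()
    np.ones(n)
    dt = perf_counter() - t0
    print("n = %d: %.1f ms" % (n, dt * 1e3))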

1 Answer


You could simply measure the execution time for different problem sizes to get an estimate of the time complexity:

  • numpy.zeros(n) : non-deterministic
  • numpy.meshgrid(x,y) : O(n**2)
  • numpy.linspace(0, 1, n) : O(n**1.6)

For instance, below is code to measure the time complexity of numpy.meshgrid(x, y); it can be used for other numpy functions as well:

In [1]: import numpy as np
   ...: from time import time
   ...: import matplotlib.pyplot as plt
   ...: from scipy.optimize import curve_fit
   ...: %matplotlib inline
   ...: 
   ...: def complexity_model(x, n, A, C):
   ...:     return A*x**n + C
   ...: 
   ...: problem_size = np.logspace(2, 4, 10)
   ...: 
   ...: res = []
   ...: for N in problem_size:
   ...:     x = np.linspace(0, 1, int(N))  # problem_size holds floats; linspace needs an integer count
   ...:     y = x.copy()
   ...:     
   ...:     t0 = time()
   ...:     np.meshgrid(x,y)
   ...:     dt = time() - t0
   ...:     res.append(dt)
   ...: 
   ...: nn = np.logspace(np.log10(problem_size.min()), np.log10(problem_size.max()), 100)  
   ...: 
   ...: time_to_solution = np.asarray(res)
   ...: fig, ax = plt.subplots(1,1)
   ...: ax.loglog(problem_size, time_to_solution, 'o-b')
   ...: 
   ...: mask = problem_size > 100 # ignore initial points
   ...: 
   ...: popt, _ = curve_fit(complexity_model, problem_size[mask],
   ...:                               time_to_solution[mask],
   ...:                               p0=(1.0, 1.0,  0.0) )
   ...: print(popt)
   ...: ax.loglog(nn, complexity_model(nn, *popt), '--k')
   ...: 
   ...: 
   ...: ax.set_xlabel('Problem size: N')
   ...: ax.set_ylabel('Time to solution')
  [  1.94816942e+00   1.40955397e-08  -7.33862899e-04]

which gives the following curve,

[Figure: log-log plot of the time to solution vs. problem size N for numpy.meshgrid(x, y), with the fitted power-law model shown as a dashed line]

For sufficiently large array sizes, numpy.meshgrid(x, y) thus has a time complexity of O(n**α), with α = 1.95 ≈ 2.
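
As a quick independent sanity check of that exponent (the sizes below are arbitrary choices of mine, not taken from the measurement above): if meshgrid scales roughly as O(n**2), doubling the length of the 1-D inputs should roughly quadruple the runtime.

import numpy as np
from time import perf_counter

# Doubling the 1-D input length should roughly quadruple the time
# if np.meshgrid scales as O(n**2).
for n in (2000, 4000):
    x = np.linspace(0, 1, n)
    t0 = perf_counter()
    np.meshgrid(x, x)
    print("n = %d: %.4f s" % (n, perf_counter() - t0))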

rth
  • I thank you very much for this in-depth solution. Picked it as the right answer :) – Mahmoud Ayman May 25 '15 at 08:46
  • `numpy.zeros(n)`: `O(1)` seems suspicious. That would mean that you somehow can do better than filling each memory cell with a `0`. Are you sure that this is possible? – cel May 25 '15 at 11:09
  • @cel Yeah, I thought so too. Roughly, it should be the time complexity of the corresponding memory allocation. However, it seems that the time to execute a malloc is not deterministic (see this [answer](https://stackoverflow.com/a/398297/1791279)). For instance, `%timeit -n 1 np.zeros(100000)` gives anything between `30` and `160 µs` on my laptop. I updated the answer to reflect that. – rth May 25 '15 at 11:35
  • @rth, yes I think this is a limitation of the timing approach. In this case probably weird resource optimizations are happening that have nothing to do with the time complexity of the algorithm itself. – cel May 25 '15 at 11:41
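
The variability discussed in the last two comments is easy to reproduce with a small sketch (the repeat count and array size are arbitrary choices): timing the same allocation many times shows a wide spread between the fastest and slowest runs.

import numpy as np
from time import perf_counter

# Time the same np.zeros allocation repeatedly; the spread between the
# fastest and slowest runs illustrates the non-deterministic cost of
# the underlying memory allocation.
times = []
for _ in range(1000):
    t0 = perf_counter()
    np.zeros(100000)
    times.append(perf_counter() - t0)

print("min = %.1f µs, max = %.1f µs" % (min(times) * 1e6, max(times) * 1e6))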