90

I just changed a program I am writing to hold my data as numpy arrays as I was having performance issues, and the difference was incredible. It originally took 30 minutes to run and now takes 2.5 seconds!

I was wondering how it does it. I assume it is because it removes the need for `for` loops, but beyond that I am stumped.

Alex Riley
Anake
    I'm guessing it's because numpy arrays are implemented in C rather than in Python. – Noufal Ibrahim Dec 05 '11 at 12:55
    @NoufalIbrahim: Python lists are also [implemented in C](http://stackoverflow.com/questions/3917574/how-is-pythons-list-implemented/3958322#3958322). – Fred Foo Dec 05 '11 at 12:59
    Pretty vague question without any indication of what the two different programs were doing and how they were implemented. – David Heffernan Dec 05 '11 at 13:02

6 Answers

126

Numpy arrays are densely packed arrays of homogeneous type. Python lists, by contrast, are arrays of pointers to objects, even when all of them are of the same type. So, you get the benefits of locality of reference.

Also, many Numpy operations are implemented in C, avoiding the general cost of loops in Python, pointer indirection and per-element dynamic type checking. The speed boost depends on which operations you're performing, but a few orders of magnitude isn't uncommon in number crunching programs.
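As a rough illustration of the packing difference (a minimal sketch; exact byte counts vary by platform and Python version):

import sys
import numpy as np

lst = list(range(1000))                  # list of separately allocated Python int objects
arr = np.arange(1000, dtype=np.int64)    # densely packed 64-bit integers

# The list itself only stores pointers; the int objects live elsewhere on the heap.
pointers = sys.getsizeof(lst)
objects = sum(sys.getsizeof(x) for x in lst)
print(pointers, pointers + objects)      # pointer array alone vs. total including the int objects

# The numpy array stores the values themselves, 8 bytes each, back to back.
print(arr.nbytes, arr.itemsize, arr.dtype)   # 8000 bytes, 8 bytes per element, int64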

Fred Foo
    How is it possible to offer a Python front-end for these C-written operations? What is this technique named? –  Mar 04 '17 at 11:52
  • This cannot be true. Python lists are not arrays of pointers when the elements are primitive types, like integers. A quick way to test that is to save a number into a variable and form an array with that variable in it. If you change the variable, the array does not change. – Rohan Jun 02 '17 at 02:09
    @Rohan Remember even primitive types are objects. So when you added that variable to the list, you really just added the object that the variable points to, to the list. In this case, this object is a number. So when you change the variable, or more precisely, rebind the name to a new integer, you are not changing the properties of the original object, i.e., the original number. Hence it is expected that the 'corresponding' number in the array does not change its value. – Kun Jul 23 '17 at 01:09
  • @Kun so if I understand you correctly, if the value in the second list that is changed were not a primitive type, you are changing the contents of the "same" object, whereas if you change a primitive type, you are now referencing a different object? – IntegrateThis Dec 28 '20 at 05:45
  • @Rohan that's totally wrong. The test you propose wouldn't even demonstrate that. – juanpa.arrivillaga Oct 21 '21 at 00:02
27

numpy arrays are specialized data structures. This means you get not only the benefits of an efficient in-memory representation, but efficient specialized implementations as well.

E.g. if you are summing two arrays, the addition will be performed with specialized CPU vector operations, instead of calling the Python implementation of int addition in a loop.
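For example, a quick and unscientific comparison of the two approaches (timings will differ from machine to machine):

import time
import numpy as np

a = np.random.rand(1000000)
b = np.random.rand(1000000)

# Pure-Python loop: one interpreted iteration, with type dispatch, per element.
tic = time.time()
c_loop = [a[i] + b[i] for i in range(len(a))]
print("loop:       %.1f ms" % (1000 * (time.time() - tic)))

# NumPy: a single call; the element-wise loop runs in compiled code over contiguous memory.
tic = time.time()
c_vec = a + b
print("vectorised: %.1f ms" % (1000 * (time.time() - tic)))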

scrutari
riffraff
    These (specialized operations and dynamic optimization) are the correct answers. Minor factors such as pre-fetching and locality of reference only become significant after the main performance factors (interpreter overhead) are addressed. – Dave Dec 05 '11 at 13:15
    locality of reference is important for two reasons: because of the locality itself (and its effects on caching), and because a lack of indirection means that the instructions to process indirection can be skipped. – Karl Knechtel Dec 05 '11 at 14:13
  • Correct. One mechanism is Atlas (http://math-atlas.sourceforge.net/faq.html#what), which is a specialized library that can use machine-specific instructions. – dfrankow Jul 02 '21 at 17:21
5

Consider the following code:

import numpy as np
import time

# Two random vectors of one million elements each.
a = np.random.rand(1000000)
b = np.random.rand(1000000)

# Vectorised dot product: the loop over elements runs inside NumPy's compiled code.
tic = time.time()
c = np.dot(a, b)
toc = time.time()

print("Vectorised version: " + str(1000*(toc-tic)) + "ms")

# Equivalent pure-Python loop: every iteration is interpreted and dispatched dynamically.
c = 0
tic = time.time()
for i in range(1000000):
    c += a[i] * b[i]
toc = time.time()

print("For loop: " + str(1000*(toc-tic)) + "ms")

Output:

Vectorised version: 2.011537551879883ms
For loop: 539.8685932159424ms

Here Numpy is much faster because it takes advantage of data-level parallelism (Single Instruction, Multiple Data, SIMD), while a traditional for loop can't make use of it.

VinKrish
    Please consider adding your code as text (using the code markup), as opposed to an image of your code. It makes your answer more accessible to readers. – Gavin May 02 '19 at 06:10
    It seems to be unlikely that paralellism is the main reason for a 250x improvement. There aren't 250 CPU threads over which to parallelize. – Christian Mar 17 '20 at 14:08
    This is the best fit explanation – Cozy Jun 12 '20 at 08:08
    No, numpy does not make use of low-level parallelism (though a particular BLAS library may use it for `dot`). The primary speed difference is due to compiled loops versus interpreted ones. – hpaulj Sep 18 '21 at 14:32
1

Numpy arrays are stored in memory as contiguous blocks, while the elements of a Python list are separate small objects scattered around memory. Memory access is therefore easy and fast for a numpy array and comparatively slow for a Python list.

source: https://algorithmdotcpp.blogspot.com/2022/01/prove-numpy-is-faster-than-normal-list.html
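The contiguous layout is visible directly on the array object (a small sketch, not taken from the linked post):

import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)

print(a.flags['C_CONTIGUOUS'])   # True: rows are laid out one after another in a single buffer
print(a.strides)                 # (32, 8): move 8 bytes per column, 32 bytes per row
print(a.itemsize, a.nbytes)      # 8 bytes per element, 96 bytes in total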

  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - [From Review](/review/late-answers/30797475) – Peter Leimbigler Jan 14 '22 at 17:56
0

Numpy arrays are extremely similar to 'normal' arrays such as those in C. Notice that every element has to be of the same type. The speedup is great because you can take advantage of prefetching and you can instantly access any element in the array by its index.
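A short sketch of what the "same type" constraint means in practice: NumPy coerces mixed input to a single dtype, so every element occupies the same number of bytes and element i sits at a fixed, computable offset from the start of the buffer:

import numpy as np

a = np.array([1, 2, 3])       # an integer dtype (platform dependent, typically int64)
b = np.array([1, 2.5, 3])     # the ints are upcast so everything fits one dtype: float64
print(a.dtype, b.dtype)

# Element i of b lives at offset i * b.itemsize, so indexing is simple pointer arithmetic.
print(b.itemsize)             # 8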

ScarletAmaranth
    Could you elaborate on how having the same type for each element makes computations faster? – Rohan Jun 02 '17 at 02:10
0

You still have for loops, but they are done in C. Numpy is based on Atlas, which is a library for linear algebra operations.

http://math-atlas.sourceforge.net/

When facing a big computation, it will run tests using several implementations to find out which is the fastest one on your computer at this moment. With some numpy builds, computations may be parallelized across multiple CPUs. So you will have highly optimized C running on contiguous memory blocks.
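If you want to check which (if any) accelerated BLAS/LAPACK your own NumPy build is linked against, NumPy can report it itself; the output differs per installation:

import numpy as np

# Prints the BLAS/LAPACK libraries this build was compiled against,
# e.g. OpenBLAS, MKL, Atlas, or the bundled reference implementation.
np.show_config()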

Simon Bergot
    Numpy isn't based on Atlas. It can use, if available, a BLAS implementation for a very, very small subset of its functionality (basically dot, gemv and gemm). That BLAS can be the built-in reference BLAS it ships with, or Atlas, or Intel MKL (the enthought distribution is built with this). – talonmies Dec 05 '11 at 13:16
  • @talonmies Hi, can you please provide some useful links that contain documentation about what you say ? – SebMa Jul 18 '18 at 13:09
    @SebMa See https://numpy.org/install/, chapter "NumPy packages & accelerated linear algebra libraries". – Thomas Mar 07 '21 at 15:11