I've recently been using Python more and more in place of C/C++ because it cuts my coding time by a factor of a few. At the same time, when I'm processing large amounts of data, my Python programs start to run a lot slower than their C equivalents. I'm wondering if this is due to me using large objects/arrays inefficiently. Is there a comprehensive guide to how memory is handled by NumPy/Python: when things are passed by reference and when by value, when things are copied and when not, and which types are mutable and which are not?
-
"Factor of a few" is my new technical term for speaking with non-technical personnel about why we should switch to Python. – BlackVegetable Jul 26 '13 at 16:58
-
[This post](http://stackoverflow.com/questions/986006/python-how-do-i-pass-a-variable-by-reference) has a godly amount of relevant data to this question... – jdero Jul 26 '13 at 17:01
-
@jdero That means it basically behaves identically to Java, correct? – BlackVegetable Jul 26 '13 at 17:02
-
@BlackVegetable Correct. Disregard what jdero says. While primitives (and *only* primitives) are *really* passed by value, it is virtually not detectable, you can treat it as an optimization. – Jul 26 '13 at 17:05
-
Also relevant: [Facts and myths about Python names and values](http://nedbatchelder.com/text/names.html). – Jul 26 '13 at 17:06
-
@jdero thanks, that's actually one of the posts that motivated this one. If you look at the comments of the selected answer, and some of the other answers --- the selected answer doesn't actually give the correct explanations. I'm also hoping to find out what all are mutable types, and which aren't (or how to find out easily). – DilithiumMatrix Jul 26 '13 at 17:22
-
@zhermes The accepted answer is fine. Some people disagree over what would be the best name and mental model for how it works (they all know Python and get how it works), and a subset of those people turn that into downvotes and harsh comments. As for mutability: Don't kid yourself, there are more types than one can enumerate in one place. Besides, why do you care about mutability? I can think of only one case where it may affect performance of the same code (in-place operators) but I doubt you even knew of that. – Jul 26 '13 at 17:29
-
@delnan 'why do I care about mutability?' --- well, it determines if I can change an object or not. That's kind of important, especially when producing an eventual modification then requires a copy operation. – DilithiumMatrix Jul 26 '13 at 17:34
-
Suppose you don't know whether something is mutable, but you figure it out somehow. How are you going to mutate it? If you figured out it was mutable by figuring out it had a specific interface through which you could mutate it, then that's not really asking about mutability - that's a duck-typing type check. If all you learned is that it's mutable, you still have no idea what to do with it. – user2357112 Jul 26 '13 at 17:38
-
@zhermes Okay, but that's the kind of thing you pick up while learning how to use the respective type in general. An up-front list won't do any good. And knowing that X is mutable doesn't do you any good if you don't know *how* to mutate it - or how to do *anything* with it for that matter. – Jul 26 '13 at 17:38
-
A reasonable overview is the [Python C API reference](http://docs.python.org/3.3/c-api/index.html) or the [Extending Python with C](http://docs.python.org/3.3/extending/extending.html) reference. If you were going to write NumPy you would use this reference as a start. There are specific examples of what is called by reference and what by value from a C perspective. The vast majority (from a C perspective) are called by reference. From a Python perspective -- it does not matter. – dawg Jul 26 '13 at 18:07
2 Answers
Objects in Python (and in most mainstream languages) are passed by reference: a function receives a reference to the same object, not a copy of it.
Taking NumPy as an example, "new" arrays created by slicing existing ones are only views of the original. For example:
>>> import numpy as np
>>> vec_1 = np.array(range(10))  # no extra brackets, so the array is 1-D
>>> vec_1
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> vec_2 = vec_1[3:]  # let vec_2 be vec_1 from index 3 until the end
>>> vec_2
array([3, 4, 5, 6, 7, 8, 9])
>>> vec_2[3] = 10000  # write through the view
>>> vec_2
array([3, 4, 5, 10000, 7, 8, 9])
>>> vec_1  # the original array changed too
array([0, 1, 2, 3, 4, 5, 10000, 7, 8, 9])
NumPy has a handy function to help with your questions, np.may_share_memory(a, b). So:
>>> np.may_share_memory(vec_1, vec_2)
True
Just be careful, because the function can return false positives (although I have never seen one).
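To make the false-positive caveat concrete, here is a small sketch. It assumes a NumPy version recent enough to provide np.shares_memory, which performs an exact (and potentially slower) check; the variable names are only illustrative. It also shows that fancy indexing, unlike slicing, produces a copy.

```python
import numpy as np

vec_1 = np.arange(10)
vec_2 = vec_1[3:]          # basic slicing: a view onto vec_1's buffer
vec_3 = vec_1[[3, 4, 5]]   # fancy (integer-list) indexing: a new array

view_shares = np.shares_memory(vec_1, vec_2)   # exact check
copy_shares = np.shares_memory(vec_1, vec_3)
view_base_is_original = vec_2.base is vec_1    # a view remembers its owner

# Two interleaved views: their memory bounds overlap, but no element is shared.
evens, odds = vec_1[::2], vec_1[1::2]
bounds_overlap = np.may_share_memory(evens, odds)  # bounds check only
exact_overlap = np.shares_memory(evens, odds)      # element-level check

print(view_shares, copy_shares, view_base_is_original)  # True False True
print(bounds_overlap, exact_overlap)                    # True False
```

The interleaved case is the kind of false positive the warning is about: may_share_memory only compares memory bounds by default, so it errs on the side of answering True.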
At SciPy 2013 there was a tutorial on NumPy (http://conference.scipy.org/scipy2013/tutorial_detail.php?id=100). Near the end, the presenter talks a little about how NumPy handles memory. It is worth watching.
As a rule of thumb, objects are almost never passed by value by default, even the ones encapsulated in another object. Here is another example, in which a list makes a round trip through an object:
class SomeClass:
    def __init__(self, a_list):
        # Store a reference to the caller's list, not a copy
        self.inside_list = a_list
    def get_list(self):
        return self.inside_list
>>> original_list = list(range(5))
>>> original_list
[0, 1, 2, 3, 4]
>>> my_object = SomeClass(original_list)
>>> output_list = my_object.get_list()
>>> output_list
[0, 1, 2, 3, 4]
>>> output_list[4] = 10000
>>> output_list
[0, 1, 2, 3, 10000]
>>> my_object.inside_list
[0, 1, 2, 3, 10000]
>>> original_list
[0, 1, 2, 3, 10000]
Creepy, huh? Using the assignment symbol ("=") or returning an object at the end of a function always creates a new reference to the same object, never a copy. Objects are only duplicated when you explicitly ask for it, with a copy method like some_dict.copy() or, for lists, the full slice some_list[:]. (Slicing a NumPy array gives a view, as in the first example; use some_array.copy() there.) For example:
>>> original_list = list(range(5))
>>> original_list
[0, 1, 2, 3, 4]
>>> my_object = SomeClass(original_list[:])  # pass a copy of the list
>>> output_list = my_object.get_list()
>>> output_list
[0, 1, 2, 3, 4]
>>> output_list[4] = 10000
>>> output_list
[0, 1, 2, 3, 10000]
>>> my_object.inside_list
[0, 1, 2, 3, 10000]
>>> original_list
[0, 1, 2, 3, 4]
Got it?
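One caveat on the [:] idiom: it copies a list, but on a NumPy array it only produces another view. The sketch below (illustrative variable names, standard copy module) contrasts the two and also shows that a list copy is shallow, so nested lists are still shared unless copy.deepcopy is used.

```python
import copy
import numpy as np

original_list = list(range(5))
list_copy = original_list[:]         # slicing a list builds a new list
list_copy[4] = 10000                 # original_list is unaffected

original_array = np.arange(5)
array_copy = original_array.copy()   # explicit copy with its own buffer
array_view = original_array[:]       # slicing an ndarray returns a view
array_view[4] = 10000                # writes through to original_array
array_copy[0] = -1                   # does not touch original_array

nested = [[0, 1], [2, 3]]
shallow = copy.copy(nested)          # new outer list, same inner lists
deep = copy.deepcopy(nested)         # inner lists are duplicated too
nested[0][0] = 10000

print(original_list)                 # [0, 1, 2, 3, 4]
print(original_array.tolist())       # [0, 1, 2, 3, 10000]
print(shallow[0][0], deep[0][0])     # 10000 0
```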

-
I think in your last example `my_object.original_list` should be `my_object.get_list()`. Also you may want to add how `vec2[:]` behaves compared to `vec2` when a value is assigned to them in your first example – kon psych Feb 25 '15 at 19:40
-
I think the [:] is copying only for list. If the original_list is an array, it will not be copied. – R Zhang Jan 11 '21 at 22:29
So I'm going to have to quote EOL on this, because I think his answer is very relevant:
3) The last point is related to the question title: "passing by value" and "passing by reference" are not concepts that are relevant in Python. The relevant concepts are instead "mutable object" and "immutable object". Lists are mutable, while numbers are not, which explains what you observe. Also, your Person1 and bar1 objects are mutable (that's why you can change the person's age). You can find more information about these notions in a text tutorial and a video tutorial. Wikipedia also has some (more technical) information. An example illustrates the difference of behavior between mutable and immutable - answer by EOL
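A minimal sketch of the distinction drawn in the quote. The function names are hypothetical and exist only for illustration: the same calling convention looks like "by reference" for a mutated list and like "by value" for an immutable int.

```python
def append_item(items):
    items.append(99)        # mutates the object the caller also refers to

def rebind_list(items):
    items = items + [99]    # builds a new list; only the local name is rebound

def increment(number):
    number += 1             # ints are immutable: this rebinds the local name
    return number

data = [1, 2, 3]
append_item(data)
after_append = list(data)   # snapshot after in-place mutation
rebind_list(data)
after_rebind = list(data)   # unchanged by the rebinding inside the function

count = 5
result = increment(count)

print(after_append)         # [1, 2, 3, 99]
print(after_rebind)         # [1, 2, 3, 99]
print(count, result)        # 5 6
```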
In general I've found that NumPy/SciPy follow these conventions; more importantly, the docs tell you explicitly what is happening.
For example, np.random.shuffle takes an array, shuffles it in place, and returns None, while np.random.permutation returns a new, shuffled array and leaves its input alone. The docs make it clear which one returns a value and which one doesn't.
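A short sketch of the difference between the two functions (the seed is arbitrary and only there to make runs repeatable):

```python
import numpy as np

np.random.seed(0)                            # arbitrary seed for repeatability
arr = np.arange(10)

shuffled_copy = np.random.permutation(arr)   # new array; arr is left untouched
arr_unchanged = arr.tolist() == list(range(10))

return_value = np.random.shuffle(arr)        # reorders arr in place

print(arr_unchanged)         # True
print(return_value is None)  # True
```

Because shuffle works in place, it avoids allocating a second array, which can matter when the data is large.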
Similarly, arrays have pass-by-reference semantics, and in general I find NumPy/SciPy to be very efficient. I think it's fair to say that if it's faster to use pass-by-reference, they will. As long as you use the functions the way the docs say, you shouldn't have significant problems with regard to speed.
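One place where this shows up in everyday code is arithmetic on large arrays. The following is a sketch, not a benchmark, and the array size is an arbitrary assumption: the augmented operators and the out= argument of ufuncs reuse an existing buffer, while a plain binary expression allocates a new array.

```python
import numpy as np

a = np.ones(10**6)
b = np.ones(10**6)
alias = a                     # second name for the same array object

a += b                        # in place: the existing buffer is reused
in_place_kept_object = alias is a

a = a + b                     # allocates a new array and rebinds the name a
rebinding_made_new_array = alias is not a

np.add(alias, b, out=alias)   # ufunc writes its result into the given buffer

print(in_place_kept_object, rebinding_made_new_array)  # True True
print(a[0], alias[0])                                  # 3.0 3.0
```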
Is there any specific type you are asking about?

-
Thanks for your answer. No, there aren't really specific types I was considering; I was more looking for an answer about general, optimal coding styles to employ for computational efficiency. I think this might not exist for python, besides trusting that numpy/scipy methods are already optimized. – DilithiumMatrix Aug 04 '13 at 16:21
-
Sorry, in general Python was made for ease, not speed :). You can, however, write the parts you want to be 'fast' in C and call them from Python to achieve fast runtimes, or there's always NumPy/SciPy, as you say. Also, compiling NumPy/SciPy for your specific build can help to optimize them further! – Eiyrioü von Kauyf Aug 04 '13 at 17:04