127

I do this:

a = 'hello'

And now I just want an independent copy of a:

import copy

b = str(a)
c = a[:]
d = a + ''
e = copy.copy(a)

map( id, [ a,b,c,d,e ] )

Out[3]:

[4365576160, 4365576160, 4365576160, 4365576160, 4365576160]

Why do they all have the same memory address and how can I get a copy of a?

usual me
  • 4
    To get an answer different from Martijn's (which is entirely correct, though it doesn't necessarily answer the question as stated), you might want to provide more detail or a use case to show **why** you want it copied. – elmo Jul 17 '14 at 14:10
  • 4
    As @elmo implies, this might be an [XY Problem](http://www.perlmonks.org/?node=XY+Problem). – martineau Jul 17 '14 at 14:39
  • 2
    I was interested in estimating the memory usage of a nested dictionary of the form `d[ 'hello' ] = e`, where `e[ 'hi' ] = 'again'`. To generate such a nested dictionary, I generated a single `e` dictionary and copied it multiple times. I noticed that the memory consumption was very low, which led to my question here. Now I understand that no string copies were created, hence the low memory consumption. – usual me Jul 17 '14 at 15:41
  • 1
    If you want `b` to be a modified version of `a` without modifying `a`, just let `b` be the result of whatever operation, e.g. `b = a[2:-1]` sets `b` to `'ll'` and `a` remains `'hello'`. – OJFord Jul 17 '14 at 22:30
  • Ollie is correct. This is because str is an immutable type. Due to Python's use of singletons (and probably other internal optimizations), you won't see the memory expand as you might expect when copying the e dictionary. – FizxMike Aug 27 '16 at 01:12

8 Answers

189

You don't need to copy a Python string. They are immutable, and the copy module always returns the original in such cases, as do str(), the whole string slice, and concatenating with an empty string.
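
A quick way to check this is to compare each "copy" against the original with the is operator. This is a minimal sketch, assuming CPython 3; the identity results are implementation behaviour rather than a language guarantee, and the names candidates and same_object are made up for the illustration:

```python
import copy

a = 'hello'

# Each of these "copy" attempts produces a string equal to a.
candidates = {
    'str(a)': str(a),
    'a[:]': a[:],
    "a + ''": a + '',
    'copy.copy(a)': copy.copy(a),
    'copy.deepcopy(a)': copy.deepcopy(a),
}

# For each attempt, record whether the result is the very same object as a.
same_object = {name: value is a for name, value in candidates.items()}
print(same_object)
```

If every entry prints as True, none of the operations allocated a second string.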

Moreover, your 'hello' string is interned (certain strings are). Python deliberately tries to keep just the one copy, as that makes dictionary lookups faster.
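
Interning can also be requested explicitly with sys.intern(). The sketch below builds two equal strings at run time (an assumption chosen so that the compiler cannot merge them as constants) and shows that interning both gives back one shared object:

```python
import sys

# Build two equal strings at run time so they start out as separate results.
parts = ['hel', 'lo']
x = ''.join(parts)
y = ''.join(parts)

print(x == y)   # the values are equal
print(x is y)   # identity is not guaranteed for run-time strings

# sys.intern() returns the canonical object for a given string value,
# so interning two equal strings yields one shared object.
ix = sys.intern(x)
iy = sys.intern(y)
print(ix is iy)
```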

One way you could work around this is to actually create a new string, then slice that string back to the original content:

>>> a = 'hello'
>>> b = (a + '.')[:-1]
>>> id(a), id(b)
(4435312528, 4435312432)

But all you are doing now is wasting memory. It is not as if you can mutate these string objects in any way, after all.

If all you wanted to know is how much memory a Python object requires, use sys.getsizeof(); it gives you the memory footprint of any Python object.

For containers this does not include the contents; you'd have to recurse into each container to calculate a total memory size:

>>> import sys
>>> a = 'hello'
>>> sys.getsizeof(a)
42
>>> b = {'foo': 'bar'}
>>> sys.getsizeof(b)
280
>>> sys.getsizeof(b) + sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in b.items())
360

You can then choose to use id() tracking to take an actual memory footprint or to estimate a maximum footprint if objects were not cached and reused.
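
One way to do the id() tracking is sketched below. The helper total_size is hypothetical (it is not a standard library function), it only recurses into dicts, lists, tuples and sets, and it gives an approximation rather than an exact accounting:

```python
import sys

def total_size(obj, seen=None):
    """Approximate the footprint of obj, counting each object only once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # already counted: a shared object costs nothing extra
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += total_size(key, seen) + total_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += total_size(item, seen)
    return size

inner = {'hi': 'again'}
nested = {'hello': inner, 'world': inner}  # the same inner dict, referenced twice
print(total_size(nested))
```

Dropping the seen check would instead give the upper estimate, as if every reference pointed at its own separate object.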

Adam Porad
Martijn Pieters
  • 5
    There's more than one way to create a new string object, such as `b = ''.join(a)`. – martineau Jul 17 '14 at 14:30
  • @martineau: sure, I really meant to say 'one way'. – Martijn Pieters Jul 17 '14 at 14:33
  • 11
    Emphasis on "You don't need to copy a Python string". There's a reason why those operations simply return the same string. – tcooc Jul 17 '14 at 14:52
  • 2
    In this case, though, the OP is attempting to waste memory. Since he wants to know how much memory will be used by a certain quantity of strings, that is the actual goal. Obviously he could generate unique strings, but that's just unnecessary work as a workaround. – Gabe Jul 17 '14 at 17:48
  • 9
    +1 for "casually" using an example that would output [42](http://en.wikipedia.org/wiki/42_%28number%29#The_Hitchhiker.27s_Guide_to_the_Galaxy). – Bakuriu Jul 17 '14 at 19:58
  • To everyone who is so unquestionably certain that one *never* has a need to copy a string: What if I am trying to ensure that ('EUR', ''.join('EUR')) and ('EUR', 'EUR') serialize to the exact same stream of bytes, given a serialization format that will preserve identity (e.g. for argument's sake, pickle.dumps)? I can either 1. intern all possible strings or 2. explicitly make a copy of all strings before serializing. It's reasonable in some use cases to want to do #2, but "those who know better" have made this pretty difficult to do. – Paul Hollingsworth May 27 '20 at 11:04
  • @PaulHollingsworth: are we talking about the *optimisation* strategy where `pickle` sometimes uses object identity to minimize data transfer sizes? That's an implementation detail; don't rely on implementation details! – Martijn Pieters May 29 '20 at 20:32
21

I'm just starting some string manipulations and found this question. I was probably trying to do something like the OP, "usual me". The previous answers did not clear up my confusion, but after thinking a little about it I finally "got it".

As long as a, b, c, d, and e are produced by operations that leave the value unchanged, they all refer to the same object, so memory is saved. As soon as a variable is assigned a different value, it refers to a different object. My learning experience came from this code:

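import copy
a = 'hello'
b = str(a)
c = a[:]
d = a + ''
e = copy.copy(a)

# Python 3: map() is lazy, so wrap it in list() to print the ids
print(list(map(id, [a, b, c, d, e])))

print(a, b, c, d, e)

e = a + 'something'  # e is rebound to a new string object
a = 'goodbye'        # a is rebound to a different string object
print(list(map(id, [a, b, c, d, e])))
print(a, b, c, d, e)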

The printed output is:

[4538504992, 4538504992, 4538504992, 4538504992, 4538504992]

hello hello hello hello hello

[6113502048, 4538504992, 4538504992, 4538504992, 5570935808]

goodbye hello hello hello hellosomething
Greenonline
karl s
16

You can copy a string in Python via string formatting:

>>> a = 'foo'  
>>> b = '%s' % a  
>>> id(a), id(b)  
(140595444686784, 140595444726400)  
Richard Urban
  • 7
    Not true in Python 3.6.5. id(a) and id(b) are identical. The results are no different even when I used the modern version of format, viz., `b = '{:s}'.format(a)` – Seshadri R Aug 22 '18 at 09:42
8

To put it a different way, id() is not what you care about. You want to know whether one variable can be modified without affecting the variable it was assigned from.

>>> a = 'hello'
>>> b = a[:]
>>> c = a
>>> b += ' world'
>>> c += ', bye'
>>> a
'hello'
>>> b
'hello world'
>>> c
'hello, bye'

If you're used to C, then these are like pointer variables except you can't de-reference them to modify what they point at, but id() will tell you where they currently point.
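
A minimal sketch of that analogy, using the is operator to compare where two names currently point:

```python
a = 'hello'
c = a                 # c now points at the same object as a
print(c is a)         # True

c += ', bye'          # builds a new string and rebinds c; a is untouched
print(c is a)         # False
print(a)              # hello
```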

The problem for Python programmers comes when you consider deeper structures like lists or dicts:

>>> o = {'a': 10}
>>> x = o
>>> y = o.copy()
>>> x['a'] = 20
>>> y['a'] = 30
>>> o
{'a': 20}
>>> x
{'a': 20}
>>> y
{'a': 30}

Here o and x refer to the same dict, so o['a'] and x['a'] are the same entry, and that dict is "mutable" in the sense that you can change the value for key 'a'. That's why y needs to be a copy, so that y['a'] can refer to something else.
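
The same reasoning extends one level down. As a sketch (the nested list and the names shallow and deep are assumptions added for illustration), dict.copy() copies only the outer dict, while copy.deepcopy() also copies what it contains:

```python
import copy

o = {'a': [1, 2]}
shallow = o.copy()          # new outer dict, but the same inner list
deep = copy.deepcopy(o)     # new outer dict and a new inner list

o['a'].append(3)            # mutate the list that o and shallow share

print(shallow['a'])   # [1, 2, 3]
print(deep['a'])      # [1, 2]
```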

Charles Thayer
5

It is possible, using this simple trick:

a = "Python"
b = a[::-1][::-1]
print("a =", a)
print("b =", b)
a == b  # True
id(a) == id(b)  # False
just4qwerty
2

As others have already explained, there's rarely an actual need for this, but nevertheless, here you go:
(works on Python 3, but there's probably something similar for Python 2)

import ctypes

copy          = ctypes.pythonapi._PyUnicode_Copy
copy.argtypes = [ctypes.py_object]
copy.restype  = ctypes.py_object

s1 = 'xxxxxxxxxxxxx'
s2 = copy(s1)

id(s1) == id(s2) # False
Eli Finkel
1

Copying a string can be done in two ways. Either copy the reference, with a = "a"; b = a, or clone it, so that b isn't affected when a is changed, with a = 'a'; b = a[:].

Thomas Youngson
1

I think I have just solved this with string slicing.

a = "words"
b = a[:len(a) // 2] + a[len(a) // 2:]
b
'words'
id(a),id(b)
(1492873997808, 1492902431216)