0

I have a function that updates data in a cache (implemented as a dictionary).

def updateCache(name, new_data):
    global cache
    info = cache[name]  # is this being passed to me as a val or ref?
    datarows = info['datarows']
    datarows.append(new_data)

    # These will not be necessary if I was dealing with references ...?
    # or in other words, are the following statements redundant or required?
    info['datarows'] = datarows
    cache[name] = info

As the comment in the snippet enquires, do I need to stick the new updated objects back into the cache, or am I dealing with references to the objects stored in the dict - in which case the last two statements are redundant?

Homunculus Reticulli
  • 65,167
  • 81
  • 216
  • 341
  • @DavidZwicker: Because there may be some edge cases which could come back to haunt me, if I carry out a simplistic test - as a Python newbie, I don't want to make any such assumptions. Ofcourse, there is always the Python manuals ... but its much quicker to ask in here ... – Homunculus Reticulli Dec 29 '11 at 16:09
  • For a brief explanation on that matter, see http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#other-languages-have-variables which restricts itself to variables but applies to many other cases as well. –  Dec 29 '11 at 16:14
  • The last two lines are redundant if the objects you are updating are simple dictionaries but there are some rare situations where they could be required. For example if `cache` was a `shelve` object you would need the `cache[name] = info` line to tell the `shelve` that the object has changed. Similarly a persistent object stored in a Zope ZODB cannot tell when a non-persistent object that it contains has been mutated. – Duncan Dec 29 '11 at 16:28

6 Answers6

2

Let's just try this out, shall we:

def updateCache(name, new_data):
    global cache
    info = cache[name]
    datarows = info['datarows']
    datarows.append(new_data)


cache = {'foo': {'datarows': []}, 'bar': {'datarows': []}}

print cache
updateCache('foo', 'bar')
print cache

outputs:

{'foo': {'datarows': []}, 'bar': {'datarows': []}}
{'foo': {'datarows': ['bar']}, 'bar': {'datarows': []}}
Adam Wagner
  • 15,469
  • 7
  • 52
  • 66
1

or in other words, are the following statements redundant or required?

They are redundant, since almost everything in python is a reference. (Unless you don't play with some magic functions like __new__)

Constantinius
  • 34,183
  • 8
  • 77
  • 85
  • I don't see how `__new__`, which customized object creation, would give you something like a values type or otherwise fiddle with the assignment operator. In fact, I can't think of a single place where I *don't* work with references in Python... –  Dec 29 '11 at 16:07
1

There are actually three options. See this link for a description on the different copy flavors. See this post for a nice explanation for each of these.

  1. the variable info is a reference to the same dict object as cache[name]
  2. info holds a shallow copy of that dict
  3. info holds a deep copy of that dict

For the determination of which option applies in this case, you need to investigate the properties of your object. In this case you have a mutable container object (dict). Which implies that you assign the object reference to the new name, as in (equivalent to c = d = {}):

(Note that c = d = [] assigns the same object to both c and d.)

Community
  • 1
  • 1
moooeeeep
  • 31,622
  • 22
  • 98
  • 187
0

info has a value assigned to it. A quick experiment in a Python command line will confirm:

>>> cache = {'ProfSmiles':'LikesPython'}
>>> name = 'ProfSmiles'
>>> cache[name]
'LikesPython'

therefore in this case if we did info = cache[name], then type(info) would evaluate to <type 'str'>
So yes, the last two lines are redundant

Grace B
  • 1,386
  • 11
  • 30
  • 1
    You're showing nothing but the fact dictionaries work with string keys, which has nothing to do with the question. –  Dec 29 '11 at 16:08
0

If cache[name] evaluates to an instance of a value-type (numeric, string, or tuple) it will be passed by value. If it is a reference-type (dict, list, most other types), it will be passed by reference and your last two lines will be redundant. Of course, if the object is of a reference type but immutable, you'll have to do something else, like make a copy of it and replace the copy in the cache.

Edit: this is how I think about it, anyway. A possibly more technically correct explanation is that all types are passed by reference in Python, but some types like numerics, strings, and tuples are immutable (basically all hashable types), and therefore you can't make changes to the dereferenced object, and have to treat it as if it were pass-by-value.

Final edit: mutability is independent of hashability (though it would be odd to design a type that was immutable but unhashable), and it is mutability, not hashability, that matters in the OP's case. (Of course, name must be hashable to be a key in cache, but cache[name] could be anything.)

Maxy-B
  • 2,772
  • 1
  • 25
  • 33
  • There is no distinction between values types and reference types in Python! (Effectively, everything's a reference type if that helps you.) There are immutable types, and for them, the difference rarely matters. Still, thinking of value types clutters the understanding and is completely uncalled for. –  Dec 29 '11 at 16:16
  • @delnan: I edited my post to say exactly that, just as you were posting your comment :) – Maxy-B Dec 29 '11 at 16:17
  • Sorry to complain again, but hashable doesn't mean immutable (nor vice versa, technically). Some built-in - and, by default, *all* user-defined types - are hashed and compared by their identity, but *very* mutable. –  Dec 29 '11 at 16:19
  • @delnan: Don't worry, you haven't spoiled any of my notions. You would have if I had said that (x hashable iff x immutable), but I didn't. I merely offered a useful rule of thumb. – Maxy-B Dec 29 '11 at 16:22
  • Sadly, your rules of thumb are misleading at best. (I find it astonishing you understand how it works but offer these oversimplifications, but obviously, that's subjective.) –  Dec 29 '11 at 16:24
  • @delnan: You're right about user-defined types having a default `__hash__` method, so I guess my rule of thumb isn't so useful after all. – Maxy-B Dec 29 '11 at 16:26
0

As this answer to a related question explains:

  1. All variables contain references to an object
  2. All parameters receive references that are passed by value
  3. Changes will have an effect in the outside world only when a mutable object state is modified

Hence, what has to be taken into account is when some statement changes the state of an object or creates a new one.

Community
  • 1
  • 1
jcollado
  • 39,419
  • 8
  • 102
  • 133