In the following code, I am trying to calculate both the frequency and the sum of a set of vectors (NumPy vectors), grouped by label:

def calculate_means_on(the_labels, the_data):
    freq = dict()
    sums = dict()
    means = dict()
    total = 0
    for index, a_label in enumerate(the_labels):
        this_data = the_data[index]
        if a_label not in freq:
            freq[a_label] = 1
            sums[a_label] = this_data
        else:
            freq[a_label] += 1
            sums[a_label] += this_data

Suppose the_data (a NumPy 'matrix') is originally:

[[ 1.  2.  4.]
 [ 1.  2.  4.]
 [ 2.  1.  1.]
 [ 2.  1.  1.]
 [ 1.  1.  1.]]

After running the above code, the_data becomes:

[[  3.   6.  12.]
 [  1.   2.   4.]
 [  7.   4.   4.]
 [  2.   1.   1.]
 [  1.   1.   1.]]

Why is this? I've narrowed it down to the line sums[a_label] += this_data: when I change it to sums[a_label] = sums[a_label] + this_data, the code behaves as expected, i.e., the_data is not modified.
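A minimal reproduction of the behaviour, assuming a plain 2-D ndarray and hypothetical labels `[0, 0, 1, 1, 2]` (the labels used in the question aren't shown, so the numbers below differ from the output above). After a single call, the first row of each label group has been overwritten with that group's running sum.

```python
import numpy as np

def calculate_means_on(the_labels, the_data):
    # Same logic as the question's code, trimmed to the relevant part
    freq = {}
    sums = {}
    for index, a_label in enumerate(the_labels):
        this_data = the_data[index]          # a view of row `index`
        if a_label not in freq:
            freq[a_label] = 1
            sums[a_label] = this_data        # stores the view itself, not a copy
        else:
            freq[a_label] += 1
            sums[a_label] += this_data       # in-place add writes into the_data
    return freq, sums

the_data = np.array([[1., 2., 4.],
                     [1., 2., 4.],
                     [2., 1., 1.],
                     [2., 1., 1.],
                     [1., 1., 1.]])
labels = [0, 0, 1, 1, 2]  # hypothetical labels for illustration

freq, sums = calculate_means_on(labels, the_data)
print(the_data[0].tolist())  # [2.0, 4.0, 8.0]  (row 0 was overwritten)
print(the_data[2].tolist())  # [4.0, 2.0, 2.0]  (row 2 was overwritten)
```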

Ulad Kasach

This line:

this_data = the_data[index]

takes a view, not a copy, of a row of the_data. The view is backed by the original array, and mutating the view will write through to the original array.
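A small sketch of the view behaviour, assuming a regular 2-D ndarray: indexing a single row returns an array whose `.base` is the original, so writing to it changes the original, while `.copy()` gives an independent array.

```python
import numpy as np

the_data = np.array([[1., 2., 4.],
                     [2., 1., 1.]])

row = the_data[0]                        # integer indexing of a row returns a view
print(row.base is the_data)              # True: the view is backed by the_data
print(np.shares_memory(row, the_data))   # True

row[0] = 99.0                            # writing through the view...
print(the_data[0].tolist())              # ...changes the original: [99.0, 2.0, 4.0]

independent = the_data[1].copy()         # .copy() allocates a separate buffer
independent[0] = -1.0
print(the_data[1].tolist())              # unchanged: [2.0, 1.0, 1.0]
```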

This line:

sums[a_label] = this_data

inserts that view into the sums dict, and this line:

sums[a_label] += this_data

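mutates the original array through the view. For a mutable object such as a NumPy array, += is performed in place (via ndarray.__iadd__) instead of creating a new object. By contrast, sums[a_label] = sums[a_label] + this_data builds a new array and rebinds sums[a_label] to it, which is why that version leaves the_data untouched.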
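A sketch contrasting the two forms on a row view, followed by one possible fix: copying the row when it first enters sums, so that later += operations mutate only the copy. The function name calculate_means_on_fixed and the labels used below are hypothetical, chosen for illustration.

```python
import numpy as np

data = np.array([[1., 2., 4.],
                 [1., 2., 4.]])

# Rebinding: `s + data[1]` builds a new array and `s = ...` rebinds the name
s = data[0]
s = s + data[1]
print(data[0].tolist())   # [1.0, 2.0, 4.0] (unchanged)

# In place: ndarray.__iadd__ writes the result into s's existing buffer,
# which is row 0 of data
s = data[0]
s += data[1]
print(data[0].tolist())   # [2.0, 4.0, 8.0] (data was modified)


def calculate_means_on_fixed(the_labels, the_data):
    """Per-label counts and sums that leave the_data untouched (sketch)."""
    freq = {}
    sums = {}
    for index, a_label in enumerate(the_labels):
        this_data = the_data[index]
        if a_label not in freq:
            freq[a_label] = 1
            sums[a_label] = this_data.copy()  # independent accumulator
        else:
            freq[a_label] += 1
            sums[a_label] += this_data        # now mutates only the copy
    return freq, sums


grid = np.array([[1., 2., 4.],
                 [1., 2., 4.],
                 [2., 1., 1.],
                 [2., 1., 1.],
                 [1., 1., 1.]])
original = grid.copy()
freq, sums = calculate_means_on_fixed([0, 0, 1, 1, 2], grid)  # hypothetical labels
print(np.array_equal(grid, original))  # True
print(sums[0].tolist())                # [2.0, 4.0, 8.0]
```

Copying only on the first occurrence of each label keeps the original structure of the loop; writing `sums[a_label] = sums[a_label] + this_data` instead would also work, at the cost of allocating a new array on every addition.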

user2357112