I have a numpy structured array with a mixed dtype (i.e., floats, ints, and strings). I want to select some of the columns of the array (all of which contain only floats) and then get the sum, by column, of the rows, as a standard numpy array. The initial array takes a form comparable to:
some_data = np.array([('foo', 3.5, 2.15), ('bar', 2.8, 5.3), ('baz', 1.2, 3.7)],
dtype=[('col1', '<U20'), ('A', '<f8'), ('B', '<f8')])
For this example, I'd like to take the sum of columns A and B, yielding np.array([7.5, 11.15])
. With numpy ≤1.13, I could do that as follows:
get_cols = ['A', 'B']
desired_sum = np.sum(some_data[get_cols].view(('<f8', len(get_cols))), axis=0)
With the release of numpy 1.14, this method now fails with ValueError: Changing the dtype to a subarray type is only supported if the total itemsize is unchanged
, which is a result of the changes made in numpy 1.14 to the handling of structured arrays. (User bbengfort commented about the FutureWarning given about this change in this answer.)
In light of these changes to structured arrays, how can I obtain the desired sum from the structured array subset?