
I have two NumPy arrays with the same number of rows, but I want to add specific columns.

I tried the following:

src_array[:, 3] += column_array_to_add[:, 0]

However, that doesn't even run. What is the correct way to do this in NumPy? I want to be able to do it with both integers and strings.

Edit: A short, self-contained script for testing

import numpy
src = numpy.array([["a", "b"], ["c", "d"], ["e", "f"]])
src2 = numpy.array([["x"], ["y"], ["z"]])

src[:, 1] += src2[:, 0]
print(src)
exit()

This script returns the following error:

src[:, 1] += src2[:, 0]
TypeError: unsupported operand type(s) for +=: 'numpy.ndarray' and 'numpy.ndarray'
Jim
  • Is `column_array_to_add` another 2D array, or is it a 1D column array, as the name implies? If it's the former, the problem must be somewhere else in code you haven't shown us, because that line is valid, as Akavall demonstrates. If it's the latter, why are you trying to pass 2D indices into a 1D array? – abarnert Dec 06 '12 at 01:46
  • Either way, show us a [Short, Self Contained, Correct Example](http://sscce.org): that is, give us enough of the code to run it ourselves and see the error. – abarnert Dec 06 '12 at 01:47
  • @abarnert Done, my apologies for not doing it sooner. – Jim Dec 06 '12 at 02:59
  • The data type of `src` is 'S1', which means there is only one byte available for each string in the array. This can not be changed in-place; once a numpy array is created, you can't change the size of the elements. So what you are trying to do will not work. – Warren Weckesser Dec 06 '12 at 05:07

2 Answers


Does something like this work?

import numpy as np

x = np.array([[1,2],[3,4]])

y = np.array([[5,6],[7,8]])

Result:

>>> x
array([[1, 2],
       [3, 4]])
>>> y
array([[5, 6],
       [7, 8]])
>>> x[:,1] + y[:,1]
array([ 8, 12])
>>> x[:, 1] += y[:, 1] # using +=
>>> x[:, 1]
array([ 8, 12])

Update:

I think this should work for you:

src = np.array([["a", "b"], ["c", "d"], ["e", "f"]], dtype='|S8')
src2 = np.array([["x"], ["y"], ["z"]], dtype='|S8')

def add_columns(x, y):
    return [a + b for a,b in zip(x, y)]

def update_array(source_array, col_num, add_col):
    temp_col = add_columns(source_array[:, col_num], add_col)
    source_array[:, col_num] = temp_col  
    return source_array

Result:

>>> update_array(src, 1, src2[:,0])
array([['a', 'bx'],
       ['c', 'dy'],
       ['e', 'fz']], 
      dtype='|S8')
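As a possible alternative to the Python-level loop, NumPy ships element-wise string helpers under `numpy.char`. The sketch below assumes Python 3, where plain string literals produce a Unicode dtype (shown as `<U1` rather than `|S1`). The helper name `concat_into_column` is hypothetical, and it returns a widened copy instead of modifying the input, since the input's cells may be too narrow for the result.

```python
import numpy as np

def concat_into_column(source_array, col_num, add_col):
    """Return a copy of source_array with add_col appended to one column.

    Hypothetical helper, not part of NumPy.
    """
    # np.char.add concatenates element-wise and sizes the result dtype itself.
    joined = np.char.add(source_array[:, col_num], add_col)
    # Promote to a string dtype wide enough for both old and joined cells.
    wide = np.promote_types(source_array.dtype, joined.dtype)
    out = source_array.astype(wide)  # astype returns a new array
    out[:, col_num] = joined
    return out

src = np.array([["a", "b"], ["c", "d"], ["e", "f"]])
src2 = np.array([["x"], ["y"], ["z"]])

result = concat_into_column(src, 1, src2[:, 0])
print(result)
```

The original `src` is left untouched, so callers that need the update "in place" would have to rebind the name to the returned array.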
Akavall
  • +1, but I'd change the `+` to `+=` to show that his exact code is perfectly valid. – abarnert Dec 06 '12 at 01:46
  • @Akavall Are you sure this works with strings as well? I've added an example in my description. Thanks – Jim Dec 06 '12 at 02:59
  • I tried using `numpy.add()`, but it does not work for me. According to answers to this question, string manipulations should be done in pure python: http://stackoverflow.com/questions/9958506/element-wise-string-concatenation-in-numpy – Akavall Dec 06 '12 at 03:16
  • @Akavall Sure, but the data is already in the numpy array and that's out of my hands. – Jim Dec 06 '12 at 03:19
  • @Jim: You can always copy it out of the `numpy` array into pure Python. That's just as easy as copying it into a different `numpy` array with a different fixed-length string dtype. – abarnert Dec 06 '12 at 18:56

When you need to debug this kind of thing, it's useful to break it down into simpler steps. Are you getting the slices wrong, adding two incompatible array types, adding two types but trying to stick the results into an incompatible type (using += when + is OK but = is not), or adding incompatible data values? Any one of those could raise a TypeError, so how do we know which one you're doing?

Well, just do them one at a time and see:

Slicing:

>>> src[:, 1]
array(['b', 'd', 'f'], dtype='|S1')
>>> src[:, 1] = ['x', 'y', 'z']
>>> src
array([['a', 'x'], ['c', 'y'], ['e', 'z']], dtype='|S1')

That's fine. What about adding?

>>> src + src2
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'numpy.ndarray'

So, we've already found the same error as your more complicated case, without the slicing, and without the +=, which makes things much easier to debug. Let's make it even simpler:

>>> s1, s2 = np.array('a'), np.array('b')
>>> s1 + s2
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'numpy.ndarray'

So even adding two 0D arrays fails! Can't get any simpler than that.

Maybe it's the data types. What happens if we use integers?

>>> n1, n2 = np.array(1), np.array(2)
>>> n1 + n2
3

And you can go all the way back to your original example, using integers instead of strings, and it still works fine:

>>> m1 = np.array([[1,2], [3,4], [5,6]])
>>> m2 = np.array([[7], [8], [9]])
>>> m1[:, 1] += m2[:, 0]
>>> m1
array([[ 1,  9],
       [ 3, 12],
       [ 5, 15]])

That should make it obvious that the problem is with data types. So, what is the data type? Just print out the array and see what numpy thinks it is:

>>> src = numpy.array([["a", "b"], ["c", "d"], ["e", "f"]])
>>> src
array([['a', 'b'], ['c', 'd'], ['e', 'f']], dtype='|S1')

That '|S1' isn't one of the friendly data types you see in the User Guide section on Data types. It's an array-protocol type string: the '|' means byte order doesn't apply, the 'S' means a byte string, and the 1 is the length in bytes. In other words, it's a 1-character fixed-length string.
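The parts of that dtype can also be inspected directly. This is a small sketch assuming Python 3, where byte-string literals are needed to reproduce the `|S1` dtype from the question (plain `str` literals would give a Unicode dtype instead).

```python
import numpy as np

# Byte strings reproduce the '|S1' dtype on Python 3.
src = np.array([[b"a", b"b"], [b"c", b"d"], [b"e", b"f"]])

print(src.dtype.str)       # '|S1'
print(src.dtype.kind)      # 'S', i.e. a fixed-length byte string
print(src.dtype.itemsize)  # 1 byte reserved for every element
```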

And that makes the problem obvious: You can't add two 1-character fixed-length strings, because the result isn't a 1-character fixed-length string.
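To see why the fixed width matters, here is a minimal sketch (again assuming Python 3 and byte strings): a plain assignment of a longer value into a one-byte cell does not grow the cell, it silently drops the extra byte.

```python
import numpy as np

cells = np.array([b"b", b"d", b"f"])  # one byte per element
cells[0] = b"bx"                      # two bytes offered to a one-byte cell

print(cells.dtype.itemsize)  # still 1
print(cells[0])              # b'b': the 'x' was truncated
```

So even if the addition itself were allowed, the concatenated result could not be stored back into the original column without losing data.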

If you really want to make this work as-is, the simple solution is to leave them as Python strings:

>>> src = numpy.array([["a", "b"], ["c", "d"], ["e", "f"]], dtype=object)
>>> src2 = numpy.array([["x"], ["y"], ["z"]], dtype=object)    
>>> src[:, 1] += src2[:, 0]

No more TypeError.
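Since the question asks for something that handles both integers and strings, the same in-place column update can be wrapped once and reused. This is only a sketch: `add_column_in_place` is a hypothetical helper name, and the string arrays are assumed to be created with `dtype=object` as above.

```python
import numpy as np

def add_column_in_place(dst, dst_col, src, src_col):
    """Add one column of src onto one column of dst (hypothetical helper)."""
    # For object arrays this falls back to Python's + on each pair of cells;
    # for integer arrays it is ordinary element-wise addition.
    dst[:, dst_col] += src[:, src_col]
    return dst

words = np.array([["a", "b"], ["c", "d"], ["e", "f"]], dtype=object)
suffixes = np.array([["x"], ["y"], ["z"]], dtype=object)
add_column_in_place(words, 1, suffixes, 0)

nums = np.array([[1, 2], [3, 4], [5, 6]])
extra = np.array([[7], [8], [9]])
add_column_in_place(nums, 1, extra, 0)

print(words)
print(nums)
```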

Alternatively, if you explicitly give src a dtype of |S2, numpy will allow that, and the second character will just be blank. It won't let you add another |S1 into it, but you can loop in Python, or maybe find a complicated way to trick numpy into doing it for you. Either way, you're not getting any of the usual time performance benefits of numpy, but you are still getting the space performance benefits of using packed fixed-size cells.

But you might want to step back and ask what you're trying to get out of numpy here. What is your actual higher-level goal? Most of the benefit of numpy comes from using strict C/Fortran-style data types that numpy knows how to work with: it can pack them in tightly, access them without an extra dereference (and without refcounting), operate on them in various ways from multiplying to copying to printing without any help from Python, etc. But string manipulation is not among the things it does well. If you're trying to vectorize string manipulation, you're using the wrong library to do it. If you're just using numpy because someone said it's fast, well, that's true in many cases, but not in this one. If you're using numpy because some other code is handing you numpy data, but you don't want to treat it in a numpy way, there's nothing stopping you from converting it to pure Python data.
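Converting to pure Python data can be as short as a call to `tolist()`. The sketch below assumes Python 3 and that nested lists are an acceptable result; the variable names are only illustrative.

```python
import numpy as np

src = np.array([["a", "b"], ["c", "d"], ["e", "f"]])
src2 = np.array([["x"], ["y"], ["z"]])

# tolist() hands back nested lists of ordinary Python strings.
rows = src.tolist()
suffixes = src2[:, 0].tolist()

# Plain Python string concatenation, with no fixed-width limit.
for row, suffix in zip(rows, suffixes):
    row[1] += suffix

print(rows)
```

If an array is needed again afterwards, `np.array(rows)` will pick a string dtype wide enough for the longest cell.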

abarnert